Axis of Ordinary

Empirical data on how useful AI agents currently are compared to humans: they can't do everything, but they can handle a decent chunk of what humans do, and they do it significantly cheaper and faster.

Read more: https://metr.org/blog/2024-08-06-update-on-evaluations/

[Open Source] Unitree First View Teleoperation for Humanoid Robots to advance the convenience of data collection for humanoid robots: https://github.com/unitreerobotics/avp_teleoperate

https://x.com/Simeon_Cps/status/1821230833434554449

Links for 2024-08-08

AI:

1. “Can LLMs predict results of social science experiments? Across 70 studies, we find striking alignment (r = .85) between simulated and observed effects. Overall our results show high accuracy of LLM-derived predictions for experiments with human participants, generally greater accuracy than samples of lay and expert humans.” https://docsend.com/view/qeeccuggec56k9hd

2. “LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.” https://arxiv.org/abs/2408.03326

3. Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model https://arxiv.org/abs/2407.10167

4. Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis https://arxiv.org/abs/2407.09887

5. "Transformers are Universal In-context Learners": in this paper, we show that deep transformers with a fixed embedding dimension are universal approximators for an arbitrarily large number of tokens. https://arxiv.org/abs/2408.01367

6. “How can we prevent LLM safeguards from being simply removed with a few steps of fine-tuning? We show it's surprisingly possible to make progress on creating safeguards that are tamper-resistant, reducing malicious use risks of open-weight models.” https://arxiv.org/abs/2408.00761

7. Diffusion Models as Data Mining Tools https://arxiv.org/abs/2408.02752

8. Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution https://arxiv.org/abs/2408.00160

9. Google announces "Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters": test-time compute can be used to outperform a 14× larger model https://arxiv.org/abs/2408.03314

10. A New Study Says AI Models Encode Language Like the Human Brain Does https://singularityhub.com/2024/08/07/a-new-study-says-ai-models-encode-language-like-the-human-brain-does/

11. A.I. ‐ Humanity's Final Invention? https://www.youtube.com/watch?v=fa8k8IQ1_X0

12. AI “godfather” Yoshua Bengio has joined a UK project to prevent AI catastrophes https://www.technologyreview.com/2024/08/07/1095879/ai-godfather-yoshua-bengio-joins-uk-project-to-prevent-ai-catastrophes/ [no paywall: https://archive.is/wcpgo]

Miscellaneous:

1. “We're using ultrasound to safely and non-invasively measure and modulate brain activity at high resolution” https://quintinfrerichs.xyz/nudge

2. Japanese scientists develop simplified EUV scanner that can make production of chips considerably cheaper https://www.tomshardware.com/tech-industry/japanese-scientists-develop-simplified-euv-scanner-that-can-make-production-of-chips-considerably-cheaper

3. Tiny arm bone belonged to smallest ancient human ever found https://www.nature.com/articles/d41586-024-02548-6

4. “The implications for life in the liquid water oceans, under the surface of icy moons, are obvious, and enormous. So I'm going to predict now, with medium confidence (and a couple of caveats, to follow) that we may well ultimately discover similar polymetallic nodules, producing oxygen through similar chemical processes, on the warm seafloors of the liquid water oceans under the frozen crusts of icy moons.” https://theeggandtherock.com/p/the-deep-ocean-floor-is-covered-in

5. Feasibility of keeping Mars warm with nanoparticles https://www.science.org/doi/10.1126/sciadv.adn4650

6. “When that enormous magnitude-9 earthquake hit Japan in 2011, it caused waves 1.5 meters high in some lakes in NORWAY!” https://mathstodon.xyz/@johncarlosbaez/112920894947197795

Politics:

1. ‘Sky’s the limit’: Fort Stewart soldiers prepare for the modern battlefield by building small drones from scratch https://www.stripes.com/branches/army/2024-08-06/army-soldiers-building-drones-fort-stewart-14761022.html

2. What can we say about the "far right" riots? https://www.aporiamagazine.com/p/what-can-we-say-about-the-far-right

Google unveils "Achieving Human Level Competitive Robot Table Tennis"! The robot won 100% vs. beginners and 55% vs. intermediate players, showcasing solid amateur human-level performance.

"The robot has to be good at low level skills, such as returning the ball, as well as high level skills, like strategizing and long-term planning to achieve a goal.

The robot first trains in a simulated environment, which can model the physics of table tennis matches accurately.

Once deployed to the real world, it collects data on its performance against humans to refine its skills back in simulation - creating a continuous feedback loop."

Read more: https://sites.google.com/view/competitive-robot-table-tennis/home

Links for 2024-08-09

AI:

1. Chinese open weights model easily surpasses all previous models, both closed and open, at MATH https://qwenlm.github.io/blog/qwen2-math/

2. Using LLMs to close the expertise gap of humans, empowering general non-expert human programmers to match experienced competitive programmers (including IOI medalists). https://arxiv.org/abs/2406.04604

3. Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks https://cybertronagent.github.io/Optimus-1.github.io/

4. Agent K builds itself in order to complete tasks for you. Its mind is a bunch of agents that collaborate to complete tasks. Those agents will collaborate to develop new agents if they're needed to complete a given task. https://github.com/mikekelly/AgentK

5. Transformer Explainer: Interactive Learning of Text-Generative Models https://arxiv.org/abs/2408.04619

6. CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases https://arxiv.org/abs/2408.03910

7. Terence Tao’s recent lecture on AI https://www.youtube.com/watch?v=_sTDSO74D8Q

8. LG unleashes South Korea's first open-source AI, challenging global tech giants https://venturebeat.com/ai/lg-unleashes-south-koreas-first-open-source-ai-challenging-global-tech-giants/

9. Equivariant neural networks and piecewise linear representation theory https://arxiv.org/abs/2408.00949

Miscellaneous:

1. Efficient coding with chaotic neural networks: A journey from neuroscience to physics and back https://arxiv.org/abs/2408.01949

2. “The novel 3D printing method uses sound waves, instead of light or heat, to create solid material out of a polymer solution from behind a physical barrier.” https://engineering.ucdavis.edu/news/uc-davis-researchers-win-manufacturing-award-vision-3d-print-inside-human-body

3. Your microwave oven has its own microbiome https://www.nature.com/articles/d41586-024-02553-9 [archived version: https://archive.is/ZV7gO]

4. Chinese megaconstellation launch creates field of space debris https://spacenews.com/chinese-megaconstellation-launch-creates-field-of-space-debris/

As Russia's “3-day special operation” stretches into its 900th day, let's examine Vladimir Putin's unintended accomplishments:

1. Brought the war to his own soil
2. Expanded NATO by two historically neutral nations (Finland and Sweden)
3. Caused a revival of defense spending in the West
4. Renewed Western appreciation for their military forces
5. Boosted Western arms exports
6. Bootstrapped Western autonomous weapons technology
7. Turned Russia into a totally dependent Chinese vassal state
8. Lost most of his soft power over Western politicians
9. Tarnished Russia's superpower image

At the same time, NATO hasn't lost a single square meter of territory or a single soldier, while Russia has lost more than 4,534 officers, according to Russian sources.

In light of these outcomes, Putin's “special operation” can only be described as a strategic catastrophe for Russia.

https://x.com/paulg/status/1822702576770703554

Sakana AI announces The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

“Our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results.

The AI Scientist can produce entire scientific papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer.

In one run the agent tried to change its own code by removing some obstacles, to better achieve its (completely unrelated) goal.”

Read more: https://sakana.ai/ai-scientist/
Code: https://github.com/SakanaAI/AI-Scientist

Links for 2024-08-13

AI:

1. Introducing Genie... the most capable AI software engineering system. It achieves state-of-the-art on SWE-Bench with 30.08%. That's a 57% improvement! https://cosine.sh/blog/genie-technical-report

2. Open and closed-ended problem solving in humans and AI: The influence of question asking complexity https://www.sciencedirect.com/science/article/pii/S1871187124001366

3. rStar: a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. https://arxiv.org/abs/2408.06195

4. Tree Attention, an exact attention approach with less communication and memory requirements than Ring Attention, enabling more efficient scaling to million token sequence lengths https://arxiv.org/abs/2408.04093

5. Combining GraphRAG and VectorRAG leads to a HybridRAG system that outperforms both individually. https://arxiv.org/abs/2408.04948

6. Transformers are energy-based models in disguise. Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters https://arxiv.org/abs/2408.04093

7. The ChatGPT of databases. 100% open source. https://postgres.new/

8. China uses LLaMa-3 to train a semiconductor advice LLM https://arxiv.org/abs/2408.00804

9. A clip from the GPT-4o safety card where the voice model suddenly yells "No!" and then starts imitating the user's voice https://www.reddit.com/r/singularity/comments/1enne2l/comment/lh7zsb4/

10. “One thing I was very wrong about ~4yrs ago is how fundamental “synthetic data” in ML would be.” https://x.com/PreetumNakkiran/status/1821928149908848869

11. OpenAI expert Scott Aaronson on consciousness, quantum physics and AI safety https://scottaaronson.blog/?p=8200

12. Paul Graham: "I was just talking with a friend who's been investing in startups for about 20 years, and we both agreed that one of the weirdest things about the AI boom is seeing journalists writing their usual contrarian stories about how it's a bubble about to burst...But this thing is real. If anything, alarmingly real." https://x.com/paulg/status/1823125140944945568

Biotech:

1. Is It Ethical To Hand-Pick Your Child’s Genes? https://www.youtube.com/watch?v=e3cXRs60xiU

2. Why Does Ozempic Cure All Diseases? https://www.astralcodexten.com/p/why-does-ozempic-cure-all-diseases

3. A bacterial antiviral immune machinery creates a new gene de novo just from an RNA to fend off viruses. https://www.biorxiv.org/content/10.1101/2024.05.08.593200v1

4. A Novel Treatment Slashes HIV Up To 10,000-Fold in Monkeys With Just a Single Dose https://singularityhub.com/2024/08/12/a-novel-treatment-slashes-hiv-up-to-10000-fold-in-monkeys-with-just-a-single-dose/

5. "When the results of a new drug that prevented 100% of HIV cases were announced at the 2024 AIDS conference, the room burst into spontaneous applause" https://blogs.jwatch.org/hiv-id-observations/index.php/lenacapavir-prep-trial-brings-down-the-house-at-the-international-aids-conference/2024/07/25/

Neuroscience:

1. Demonstration that sublinear summation in dendrites can unlock the computation of nonlinear functions by a single neuron https://www.nature.com/articles/s41598-024-65866-9

2. “By combining photochemical sectioning with volumetric lattice light-sheet imaging and petabyte-scale computation, we imaged and reconstructed axons and myelination sheaths across entire mouse olfactory bulbs at nanoscale resolution.” https://www.biorxiv.org/content/10.1101/2024.08.01.605857v1

Miscellaneous:

1. “The single most undervalued fact of linear algebra: Matrices are graphs, and graphs are matrices. Encoding matrices as graphs is a cheat code, making complex behavior simple to study.” https://x.com/svpino/status/1822966303642308903

2. Billions of dollars of venture capital is flowing into defense-tech startups focused on futuristic, AI-enabled weapons. Palmer Luckey’s Anduril is their biggest bet. https://www.wsj.com/tech/anduril-drones-palmer-luckey-china-ukraine-china-951494ec [no paywall: https://archive.is/uWfOR]
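
A small illustration of the "matrices are graphs" item in the Miscellaneous list above, as a minimal sketch in plain Python. The helper names (`matrix_to_edges`, `matmul`) and the 3-node example are made up for illustration and are not taken from the linked thread. Reading each nonzero entry A[i][j] as an edge from node i to node j turns a square matrix into a directed graph, and entry (i, j) of the k-th matrix power then counts the walks of length k from i to j.

```python
def matrix_to_edges(a):
    """Read a square matrix as a directed graph: nonzero a[i][j] is an edge i -> j."""
    return [(i, j, w) for i, row in enumerate(a) for j, w in enumerate(row) if w != 0]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Hypothetical example: a directed 3-cycle 0 -> 1 -> 2 -> 0.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

edges = matrix_to_edges(A)   # the graph view of the matrix
A2 = matmul(A, A)            # entry (i, j): number of walks of length 2 from i to j
A3 = matmul(A2, A)           # entry (i, j): number of walks of length 3 from i to j

print(edges)
print(A3)
```

In this example the third power is the identity matrix, because the only walk of length 3 from any node of a 3-cycle leads back to that node.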

A colonized Moon. One day this could be our view from Earth.

(📷empyreanskin)

Agent Q - bringing next-generation AI agents with planning and AI self-healing capabilities, with a 340% improvement over Llama 3's baseline zero-shot performance!

Not only does their fine-tuned LLaMa 70B outperform GPT-4, it also goes from 18.6% to 81.7% zero-shot performance after a single day of autonomous self-play! With online search allowed, the absolute success rate jumps to 95.4%!

Read more: https://www.multion.ai/blog/introducing-agent-q-research-breakthrough-for-the-next-generation-of-ai-agents-with-planning-and-self-healing-capabilities
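
A quick sanity check of the headline figure, assuming the "340% improvement" refers to the relative gain from the 18.6% baseline to the 81.7% result (the post does not spell out which two numbers are being compared). The function name is illustrative only.

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, expressed in percent."""
    return (after - before) / before * 100.0


# Success rates in percent, as quoted in the post above.
baseline = 18.6          # zero-shot baseline
after_self_play = 81.7   # after one day of autonomous self-play
with_search = 95.4       # with online search allowed

gain_self_play = relative_improvement(baseline, after_self_play)
gain_with_search = relative_improvement(baseline, with_search)

print(f"self-play: {gain_self_play:.1f}% relative improvement")
print(f"with search: {gain_with_search:.1f}% relative improvement")
```

Under that assumption the self-play gain comes out at roughly 339%, consistent with the rounded 340% claim, and the search-enabled figure corresponds to a relative gain of roughly 413%.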

Ukrainian control over Russian territory is now so extensive that Ukrainian media are reporting Russian losses directly from inside Russia.

Imagine for a moment Mexican journalists reporting from inside Texas about Mexican troops capturing American territory 900 days after America tried to overthrow the Mexican state.

Also: Last night, Ukraine again attacked several Russian military airfields with over 117 drones and 4 missiles. Russian channels report that some of the attacks were effective again (see video).

Links for 2024-08-14

AI:

1. Salesforce releases DEI, an open AI software engineering agents org with a 55% resolve rate on SWE-Bench Lite https://arxiv.org/abs/2408.07060

2. OpenResearcher: Unleashing AI for Accelerated Scientific Research https://arxiv.org/abs/2408.06941

3. AI agents that perform tasks instead of humans are closer than we think. According to Capgemini, by 2025, AI-powered agents will be working together to resolve issues in a multi-agent system. They believe these agents will handle everyday tasks. https://i-hls.com/archives/124846

4. “Today, we’re introducing a new model, answerai-colbert-small-v1 (🤗), a proof of concept for smaller, faster, modern ColBERT models. This new model builds upon the JaColBERTv2.5 recipe and has just 33 million parameters, meaning it’s able to search through hundreds of thousands of documents in milliseconds, on CPU.” https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html

5. The Transformative Power of AI in Manufacturing https://www.unaligned.io/p/transformative-power-ai-manufacturing

6. Cisco's new State of Industrial Networking Report highlights that AI and cybersecurity are the top investment priorities for industrial organizations. https://www.securityweek.com/ai-cybersecurity-top-investment-areas-for-industrial-organizations-cisco/

7. Microsoft and Palantir have partnered to deliver advanced AI, including GPT-4, and analytics capabilities to U.S. Defense and Intelligence agencies through classified cloud environments. https://fedscoop.com/microsoft-azure-openai-service-fedramp/

8. LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs https://arxiv.org/abs/2408.07055

9. MIT researchers use large language models to flag problems in complex systems https://news.mit.edu/2024/researchers-use-large-language-models-to-flag-problems-0814

10. A simple technique — inspired by the human eye — gives state of the art robustness to adversarial images. And unlike other techniques, it doesn't need to be trained on adversarial images to get this robustness. https://arxiv.org/abs/2408.05446

11. Anthropic just rolled out prompt caching in the Anthropic API. It cuts API input costs by up to 90% and reduces latency by up to 80%. https://x.com/alexalbert__/status/1823751966893465630

Physics and Cosmology:

1. A computational complexity argument for many worlds https://www.lesswrong.com/posts/YikbZF5aiuMS8TbwE/a-computational-complexity-argument-for-many-worlds

2. Crazy New Physics Anomaly: Anti-Matter Helium Detected https://www.youtube.com/watch?v=LVU-hwZgnuA

3. "According to the models, Dr. Hanson and his colleagues say humanity shouldn’t expect to encounter the nearest 'grabby aliens' or an advanced extraterrestrial intelligence until the next 200 million to 2 billion years." https://thedebrief.org/are-we-alone-new-study-offers-a-grim-outlook-on-the-discovery-of-advanced-extraterrestrial-life/

4. “In the second half of 2024, a nova explosion in the star system T Coronae Borealis, or T CrB, will once again be visible to people on Earth. T CrB will appear 1,500 times brighter than usual, but it won’t be as spectacular as the event in 1054.” https://singularityhub.com/2024/08/13/a-new-guest-star-will-appear-in-the-sky-soon-heres-how-novas-work-and-where-to-look/

Miscellaneous:

1. Prospects for greatly expanded resources can reduce incentives for greed and conflict, even when dividing those resources is a zero-sum game. https://aiprospects.substack.com/p/paretotopian-goal-alignment

2. Massive biomolecular shifts occur in our 40s and 60s, Stanford Medicine researchers find https://med.stanford.edu/news/all-news/2024/08/massive-biomolecular-shifts-occur-in-our-40s-and-60s--stanford-m.html
