"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.

Episode February 19. 2024

Foundation models for embodied decision making agents. Part 5. Discussing two papers about generative AI and its power to generate data sets, with additional comments on Sora to illustrate the potential of generative AI for simulation in robot learning. 1/3 Solving math problems: generate problem/solution tuples with code and then use the data set to train a model for math problem solving. 2/3 Extreme video data compression by using snippets of the original video as a prompt to generate the rest of the video. The key here is a learned decision process that strategically excludes large chunks of the original video and generates video sequences with generative AI. That way you can compress original videos with orders of magnitude more compression than is possible with existing compression techniques: take the original, compress it, and then in reconstruction use only parts of the compressed original to generate video sequences with diffusion. 3/3 Nathan Lambert on Sora: use video generation to create data sets for robot training.
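The keep-then-regenerate compression scheme described above can be sketched in a few lines. This is a toy illustration under loud assumptions: the fixed stride stands in for the learned decision process, and linear interpolation stands in for the diffusion model; all names are illustrative.

```python
import numpy as np

def compress(frames, keep_every=10):
    # Strategically drop frames, keeping only sparse "prompt" frames.
    # A learned policy would pick which chunks to drop; a fixed stride
    # is a stand-in for that policy.
    kept = {i: frames[i] for i in range(0, len(frames), keep_every)}
    kept[len(frames) - 1] = frames[-1]  # always keep the final frame
    return kept

def reconstruct(kept, total):
    # Regenerate the dropped frames from the kept ones. A video diffusion
    # model would generate plausible content here; linear interpolation
    # is a toy stand-in that shows the interface, not the quality.
    idx = sorted(kept)
    out = []
    for t in range(total):
        if t in kept:
            out.append(kept[t])
            continue
        prev = max(i for i in idx if i < t)
        nxt = min(i for i in idx if i > t)
        w = (t - prev) / (nxt - prev)
        out.append((1 - w) * kept[prev] + w * kept[nxt])
    return np.stack(out)

frames = np.linspace(0.0, 1.0, 100)[:, None] * np.ones((100, 4))  # toy "video"
kept = compress(frames)
recon = reconstruct(kept, len(frames))
ratio = len(frames) / len(kept)  # roughly 9x fewer frames stored
```

On this artificially smooth toy video the reconstruction is exact; on real video the generative model would hallucinate plausible but not identical content, which is exactly the trade that buys the extra compression.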

Episode February 16. 2024

Foundation models for embodied decision making agents. Part 4. Google introduces Gemini 1.5: multimodal reasoning and a long context window. Research questions on multimodality: 1/2 How to concatenate different modalities? One large embedding, or delegation of tasks. 2/2 The Mixture of Experts architecture is key to multimodality and long context windows. A long context window enables one-shot reasoning on complex prompts. Gemini 1.5 can reason from text, video, and images. Some examples of Gemini multimodality: 1/3 Show graphs and let the model generate corresponding code. 2/3 Show a football player kicking a ball and let the model suggest improvements; this can be used for automated coaching. 3/3 Show a dish and generate a recipe.
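The Mixture of Experts routing mentioned above can be illustrated with a minimal sketch. This is not Gemini's architecture (which is not public in detail); it is a generic top-k MoE layer with random linear maps as hypothetical "experts":

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    # Top-k Mixture-of-Experts layer: a router scores all experts per
    # token, but only the k best actually run, which is how MoE grows
    # parameter count without growing per-token compute.
    def __init__(self, dim=16, n_experts=8, k=2):
        self.k = k
        self.router = rng.normal(size=(dim, n_experts))
        # each "expert" is just a random linear map in this sketch
        self.experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]

    def __call__(self, x):
        gates = softmax(x @ self.router)
        top = np.argsort(gates)[-self.k:]      # indices of the top-k experts
        w = gates[top] / gates[top].sum()      # renormalize their weights
        return sum(wi * (x @ self.experts[i]) for wi, i in zip(w, top)), top

moe = TinyMoE()
out, used = moe(rng.normal(size=16))
```

With 8 experts and k=2, each token touches only a quarter of the parameters per forward pass, which is the property that makes long-context multimodal models economical.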

Episode February 12. 2024

Discussing “ai.economics”, an essay by Ben Spector. Foundation models are like the chip industry: large investment, huge value, rapid depreciation. Tesla can become the x86 of multimodal foundation models, with application to FSD, robots, and other embodied decision making agents. That’s what Musk means when he calls Dojo low probability, high return.

Episode February 11. 2024

Book discussion. “Everyone Here Is Lying” by Shari Lapena. 1/3 Kids’ innocence is a bug, not a feature. A society obsessed with kids reflects the insecurity of adults. Humans come with agency and accountability. 2/3 The lie is a character in itself, impersonated by different people. Lying is more difficult than you think; in fact, lying is an oxymoron. There is something akin to a conservation of truth with lying. 3/3 What does it take to write a good novel? A good story. Good stories are built on characters with deep relationships.

Episode February 9. 2024

Event discussion. 1/2 Fusion reaction experiment shows for the first time that fusion energy could work in principle. 2/2 Flash attention reduces the wall-clock time of transformer training and inference. Ideas with impact: fusion is an illusion because of high iteration costs, high error-correction costs, and long lead times. Flash attention is a great example of how first-principles thinking helps solve problems: find ways to reduce reads and writes to high-bandwidth memory on GPUs with mathematical adjustments.
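The mathematical adjustment at the heart of flash attention is the online softmax: process keys in blocks and keep only a running max, normalizer, and accumulator, so the full score vector never needs to sit in fast memory. A single-query numpy sketch of that rescaling trick (not the real kernel, which fuses this into tiled GPU code):

```python
import numpy as np

def naive_attention(q, K, V):
    # materializes the full score vector, then softmax-weights the values
    s = K @ q
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

def online_attention(q, K, V, block=4):
    # Streaming softmax attention for one query: scan keys block by block,
    # rescaling the running accumulator whenever a new max appears, so
    # memory use is independent of sequence length.
    m, l = -np.inf, 0.0
    acc = np.zeros(V.shape[1])
    for i in range(0, len(K), block):
        s = K[i:i + block] @ q
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)        # rescale previous partial results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(1)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
```

Both functions return the same result; the streaming version simply never holds all 16 scores at once, which is the read/write reduction the episode refers to.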

Episode February 7. 2024 II

Event discussion. Synfuel Symposium at PSI (Paul Scherrer Institute). Methane - Methanol - Olefins - Hydrocarbons. Chemical engineering applied to synthetic fuel production: how to design reactors, and with what kinds of materials? Physics parameters such as temperature, viscosity, pressure, and CO2 content matter. Catalyst design. Very interesting presentation by Vladimir Paunovic: using various forms of spectroscopy to measure operando drift, and designing catalysts for the synfuel value chain. Computational methods could be key here; more in silico design would increase the rate of iteration and deliver faster results at lower cost. Vladimir’s PhD thesis is about how to design better catalysts for hydrocarbon production out of natural gas.

Episode February 7. 2024

In silico design, or computational design, refers to using computers and software to design materials, products, molecules, and even processes entirely within the virtual world. Jensen Huang at JPM 24 talks about how chip design is almost exclusively in silico. The same is coming to pharmaceuticals. Apple and Tesla sell software-defined products; Tesla is the software-defined car. Software makes the car, manages the car, and drives the car: faster iteration, better safety, more convenience, and lower cost. Moore’s Law is driving the car and car design. Production and design of the Next Gen will be predominantly in silico.

Episode February 4. 2024

Discussing events of the week. 1/2 GNNs and physics: taking advantage of symmetries in computer vision. Things get interesting when symmetries break. The CNNs-to-transformers progression is a sequence of models based on similar principles. 2/2 Multimodal text-and-video masked model. Words have strength and thus provide compact information about scenes. Take advantage of the intricacies of text in multimodal models; the sum is more than the parts.

Episode February 2. 2024

Foundation models for embodied decision making agents. Part 3. Discussing Neural Radiance Fields (Nerfs), Occupancy Networks, and Transformers. History of computer vision: supervised learning after ImageNet; Convolutional Neural Nets exploit symmetries. Nerfs manage depth, perspective, and precise rendering. Occupancy Networks deliver lightweight object detection. Transformers are Nerfs on steroids. Nerfs are a good representation through compression: low compute budget at inference but high training cost. Transformers offer more precision and general inference but a high compute budget. There is a trade-off between training budget and inference budget.
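One concrete ingredient behind the Nerf rendering quality discussed above is the frequency (positional) encoding: a raw 3-D coordinate is lifted into sines and cosines at doubling frequencies so a small MLP can represent fine scene detail. A minimal sketch, with the frequency count chosen arbitrarily for illustration:

```python
import numpy as np

def positional_encoding(x, n_freqs=6):
    # Nerf-style frequency encoding: concatenate the raw coordinate with
    # sin/cos of the coordinate at frequencies 2^0 ... 2^(n_freqs-1).
    out = [x]
    for i in range(n_freqs):
        out.append(np.sin((2.0 ** i) * np.pi * x))
        out.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(out, axis=-1)

p = np.array([0.1, -0.4, 0.7])   # a toy 3-D point along a camera ray
enc = positional_encoding(p)     # 3 * (1 + 2*6) = 39 dimensions
```

The encoding grows the input from 3 to 39 dimensions here; that extra width at the input is part of why inference stays cheap while training the implicit representation is expensive.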

Episode February 1. 2024 II

Concluding remarks on Neurips 2023. Prepare: observation is theory-laden, and the better you’re prepared, the more you get out of it. Creativity is when you define your own problems, and intelligence is when you’re able to solve them. Foundation models for embodied decision making agents will propel robotics: learning from video with reward function design, generating video through diffusion and teaching robots based on observed skills, and gaining experience through offline RL. What does learning actually mean? Memory versus credit: what do I know, and which knowledge leads to higher rewards? Inference based on maximum likelihood methods (which distribution might the observed data belong to?) versus score matching (a Markov process with sequential decision making). At the limit, both methods converge.

 

Episode February 1. 2024

Neurips Part 9. 1/5 Learning to Influence Human Behavior with Offline Reinforcement Learning. In a collaborative game you take the other agent’s latent strategy into account when updating your own policy; you learn how to influence the other’s actions to optimize for your given task. 2/5 COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning. This is an attempt to teach a robot how to discover its own problems and solve them: gather experience through training on offline RL data. 3/5 HIQL: Offline Goal-Conditioned RL with Latent States as Actions. Extract a value function from observed states and thus create a latent representation of subgoals; use those subgoals to train a policy to actually reach the goal. 4/5 Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents. Use LLMs for high-level planning while grounding the robot in affordances (what the robot can actually do); together they generate a combined text that is semantically and physically sound. 5/5 Learning Universal Policies via Text-Guided Video Generation. Use YouTube videos to train a video diffusion model, then use text-prompt instructions to generate a video with specific tasks (kick the ball, arrange blocks by color, etc.). Use inverse modeling to extract kinematic and other actuator information, and use model-based RL to train the robot.
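The inverse-modeling step in the last paper above (recover actions from consecutive generated frames, then train a policy on them) can be shown with a deliberately tiny toy. Assume a hypothetical 1-D point-mass world where the action is the applied velocity, so the action is exactly the state delta:

```python
import numpy as np

def inverse_dynamics(s, s_next):
    # Toy inverse model: in a 1-D point-mass world where the action is the
    # applied velocity, the action is recoverable as the state delta.
    # A learned inverse-dynamics network plays this role for real video.
    return s_next - s

# a "generated video" of states moving steadily toward a goal at 1.0
states = np.linspace(0.0, 1.0, 11)
actions = [inverse_dynamics(states[t], states[t + 1]) for t in range(10)]
```

The resulting (state, action) pairs are exactly the kind of dataset the paper then feeds into model-based RL; the generative video model supplies the states, the inverse model supplies the labels.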

Episode January 31. 2024 II

Foundation models for embodied decision making agents. Part 2. 1/3 FSD shadow mode: what shadow mode is, and why it is so important for the engineering of FSD. 2/3 End-to-end training of FSD means a single optimization pipeline. This allows for more expression, and creativity: the robot finds its own problems to solve. The FSD problem statement is “drive as close to human-like as possible”. 3/3 The most scarce resource in Tesla AI is compute. That’s a good thing: it means Moore’s Law is working for FSD.

Episode January 31. 2024

Discussing Tesla Q4 earnings and thoughts. 1/3 Volume and pricing stabilizing, margins higher, and EPS growth. 2/3 FSD is not at an inflection point in terms of adoption, but all ingredients are in place for FSD to become more accretive to earnings. 3/3 Corporate: we believe Musk should step down as CEO and remain Chief Architect, focusing on exciting projects such as next-gen production technology, FSD, and Optimus. Day-to-day operations are better run by dedicated people with skin in the game.

Episode January 30. 2024

Discussing “Inventor of the Future”, a Buckminster Fuller biography. Spaceship Earth. Deutsch meets Jobs, meets Musk, meets Feynman. “Nature makes a huge effort to help us succeed. But nature doesn’t need us. We are not the only experiment.” Structure determines nature and function: use isolated components and connect them in space so the whole is more stable, robust, flexible, and stronger. 3D chemistry: the difference between a carbon nanotube and a carbon atom. How to scale a large company: build synergetic pieces that fit together (Cybertruck helps single-cast tech and enables next-gen single casting). The Silicon Valley mindset of problem solving must be accompanied by problem solving in society, the environment, and politics.

Episode January 28. 2024

Problems are at the core of knowledge and creation. “A problem well stated offers its own solution.” What is a good problem? What is a well-stated problem? Problems must be discovered, like everything else. Knowledge starts with problems, and wealth is created through the discovery and solution of problems. A better statement of the Tesla FSD problem: “Self-driving that feels human.”

 

Episode January 26. 2024

Neurips Part 8. 1/4 DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models. Diffusion + physics: generate simulations for robots that incorporate geometry and physics. Use Monte Carlo sampling techniques with differentiable physics to replicate real-world behavior and train robots on differing tasks while taking physical behavior into account. 2/4 MULTIREACT: Multimodal Tools Augmented Reasoning-Acting Traces for Embodied Agent Planning. Imagine Tesla Optimus seeing a video of clothes folding. The video gets spliced up and each spliced frame gets a language caption. Now the robot learns how to fold based on this dataset and uses the captioning as a measurement for the reward function; the text helps evaluate whether the robot is on the right path. 3/4 How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies. This paper attempts to make LLMs more useful in long-term robotic planning. The key problem here is context length. You can tell a robot, “please clean my kitchen”, but it’s not clear to the robot what that means; in fact, even a human might misunderstand. But if you compartmentalize the problem, use step-by-step instructions, and add information about intermediate states, you can get better results. Think of this like a job description for a robot using LLMs. 4/4 Exploitation-Guided Exploration for Semantic Embodied Navigation. Knowing that there are specialists gives you more confidence in tackling more specific problems. Let the agent explore until it gets close to a landmark, then let it exploit. The separation of explore and exploit should in principle improve exploration and yield better results.
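The compartmentalization idea in the PromptBook paper can be sketched as plain control flow. Everything here is a hypothetical stand-in: the step list plays the role of the LLM's decomposed plan, and the loop body plays the role of real robot skills with intermediate state checks:

```python
def run_promptbook(task, steps):
    # Instead of one underspecified instruction ("clean my kitchen"),
    # the robot receives compartmentalized step-by-step instructions and
    # records the intermediate state after each one. The "skill" here is
    # a stub that always succeeds; a real system would execute code-as-
    # policies and verify the world state before moving on.
    completed, log = [], []
    for i, step in enumerate(steps, start=1):
        completed.append(step)  # pretend the low-level skill succeeded
        log.append(f"{task} [{i}/{len(steps)}]: {step} done")
    return completed, log

completed, log = run_promptbook(
    "clean the kitchen",
    ["clear the counter", "load the dishwasher", "wipe the surfaces"],
)
```

The point of the structure is that each step is short enough to fit comfortably in context and each intermediate state gives the planner something concrete to check, which is the "job description" framing in the summary above.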

Episode January 25. 2024

Tesla Q4 2023 earnings call discussion. Three drivers of the stock: 1/3 FSD monetization. 2/3 Lower cost per mile driving demand. 3/3 The robot that builds the robot. Positive: FSD, Next Gen, Energy, no more pushing price to drive volume, margin stabilization, interest-rate tailwind, and Optimus about 18-24 months from first deliveries. Negative: 3/Y demand growth less than expected, China demand/supply imbalance, and a lack of clear margin guidance. The CFO must improve his communication skills; not as smooth as Zach. What is the purpose of Tesla chip manufacturing for Dojo inference? Not clear.

Episode January 23. 2024

Tesla sum of the parts (SOTP): $420. Valuation based on four businesses: car 50%, FSD 35%, Energy 5%, and Robot 10%. The market values the car business and nothing for the rest. FSD value creation is fundamentally the lowering of cost per mile, which will come down because of lower cost and more miles driven. The robot replaces high industrial labor cost and opens new avenues for scale manufacturing. Ford didn't invent the car, he invented the assembly line. Musk will go down as having invented not the EV, but the scale manufacturing of highly complex robots that are built by robots. In principle, Tesla's value depends on how fast they can iterate and improve.
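The sum-of-the-parts arithmetic from the episode can be checked in a few lines; the weights are the episode's stated splits, nothing more:

```python
# The episode's stated splits of a $420 per-share sum-of-the-parts value.
target = 420.0
weights = {"car": 0.50, "fsd": 0.35, "energy": 0.05, "robot": 0.10}
parts = {name: round(target * w, 2) for name, w in weights.items()}
# car -> 210.0, fsd -> 147.0, energy -> 21.0, robot -> 42.0
```

Note that the market pricing only the car business corresponds, in this framing, to valuing the stock at the $210 "car" slice alone.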

Episode January 22. 2024 II

Discussing DeepMind's paper "Solving olympiad geometry without human demonstrations". Iteration and recursive updates on the policy. The key idea is to generate random math transitions and then recursively cross-check the insight from those transitions against the actual problem. If you iterate enough and generate an adequate update rule, you will eventually find a proof. This paper has a similar flavor to FunSearch: generate random samples in function space and then recursively search for interesting semantics until you find the solution. Intelligence = iteration and recursive policy update at scale and speed.
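The generate-and-verify loop common to both papers can be reduced to a toy skeleton. The "candidate" here is just an integer pair and the "proof check" a simple relation; both are hypothetical stand-ins for the sampled constructions and symbolic verification in the real systems:

```python
import random

random.seed(0)

def propose():
    # Sample a random candidate, standing in for the random constructions
    # or programs sampled in AlphaGeometry / FunSearch.
    return random.randint(-20, 20), random.randint(-20, 20)

def verify(a, b):
    # The "proof check": does the candidate satisfy the target relation?
    return a + b == 7 and a * b == 12

solution = None
for _ in range(100_000):
    cand = propose()
    if verify(*cand):
        solution = cand
        break
```

The real systems add the crucial ingredient this toy omits: each verified candidate feeds back into the proposal distribution (the recursive policy update), which is what turns blind sampling into search at scale.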

Episode January 22. 2024

Neurips Part 7

1/4 Video Prediction Models as Rewards for Reinforcement Learning. Use video data to train a generative video model, then compute rewards based on actions and transitions in the video. Thus, build a dataset of action/reward tuples that can be used for offline RL. 2/4 Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration. The main idea is to prevent the distribution of high-value states from affecting exploration around low-value states, and vice versa, by filtering out states whose value estimates significantly differ from each other when computing the intrinsic bonus. Use state-space similarity as an intrinsic reward and make sure the agent explores different states; introduce a value associated with each state, make sure that high value deltas get eliminated, and let the agent explore in the rest of the state/value tuples. 3/4 Where Are We in the Search for an Artificial Visual Cortex for Embodied Intelligence? Can a model be built that generalizes vision to action? This paper studies various models and concludes that, so far, there isn’t one. Tesla is working on such a model; they call it the World Model. Data, model, architecture, and hardware all must interplay. 4/4 MAViL: Masked Audio-Video Learners. The idea is to predict raw inputs in heterogeneous modalities; the model with masked-view inputs jointly predicts contextualized audio-video representations in a homogeneous (aligned) latent space.
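The value-conditional filter in the second paper can be sketched with 1-D states. This is a toy under loud assumptions: states and values are scalars, novelty is distance to the k-th nearest neighbor, and the value band is an arbitrary hypothetical threshold:

```python
import numpy as np

def intrinsic_bonus(states, values, query_idx, k=3, value_band=0.5):
    # Value-conditional state entropy sketch: the novelty bonus for a
    # state is its distance to the k-th nearest neighbor, but neighbors
    # whose value estimates differ too much are filtered out first, so
    # high-value and low-value regions don't distort each other's bonus.
    q_s, q_v = states[query_idx], values[query_idx]
    mask = np.abs(values - q_v) <= value_band
    mask[query_idx] = False
    neighbors = states[mask]
    if len(neighbors) == 0:
        return 0.0
    k = min(k, len(neighbors))
    d = np.sort(np.abs(neighbors - q_s))
    return d[k - 1]

# two clusters: low-value states near 0, high-value states near 5
states = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
values = np.array([1.0, 1.0, 1.0, 9.0, 9.0])
bonus_low = intrinsic_bonus(states, values, query_idx=0)  # uses only the low-value cluster
```

Without the value filter, the faraway high-value states would dominate the nearest-neighbor distances and inflate the bonus; with it, exploration pressure is computed within each value regime separately.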

Episode January 20. 2024

Paper discussion:

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities. A pipeline for robot end-to-end training. Foundation models offer general pre-training with potential applications to diverse robotics domains. Simulation: Nerfs solve for depth; Masked Auto-Encoders for the world model. The world model is defined as predicting future states conditioned on actions and past observations. Pipeline = Track, Map, Motion, Occupancy, Planning. End to end = take the input, process it through the pipeline in a coherent way, and produce low-level actuator actions as output. Optimize throughout the entire pipeline.
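The world-model definition quoted above (predict future states conditioned on action and past observations) pins down an interface more than an architecture. A minimal sketch of that interface, with toy averaging dynamics standing in for a learned model:

```python
import numpy as np

class ToyWorldModel:
    # Interface sketch matching the paper's definition: the model keeps a
    # window of past observations and predicts the next state conditioned
    # on an action. A learned network would replace the toy rule below.
    def __init__(self, history=3):
        self.history = history
        self.past = []

    def observe(self, obs):
        self.past = (self.past + [obs])[-self.history:]

    def predict(self, action):
        # toy dynamics: next state = mean of past observations + action
        return np.mean(self.past, axis=0) + action

wm = ToyWorldModel()
for obs in [np.array([0.0]), np.array([1.0]), np.array([2.0])]:
    wm.observe(obs)
pred = wm.predict(np.array([0.5]))
```

Keeping the interface this narrow is what lets the Track/Map/Motion/Occupancy/Planning stages be optimized end to end: each stage only has to agree on states, actions, and observations.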

 
