"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.


❤️ Episode, March 27 2024

Andrej Karpathy's comments on AI


  1. AI eventually converges to RL and closed loop self learning.

  2. Psychology of AI is different from that of humans. AI can learn on its own terms. Move 37 by AlphaGo could only happen because the agent was allowed to learn on its own terms.

  3. Sparsity, Precision, Quantization.

  4. Autoregressive models and diffusion models have different use cases. Why? Can they converge?

  5. Energy Efficiency of AI must improve.

  6. The Von Neumann architecture is not well suited for AI workloads. Solutions for high I/O are needed.
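A minimal sketch of the precision/quantization point in 3: symmetric int8 post-training quantization in NumPy. The function names and error bound are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric quantization: map the largest-magnitude weight to +/-127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# int8 storage is 4x smaller than float32; reconstruction error per weight
# is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

Lower precision cuts both memory traffic and energy per operation, which is why the three items in 3 tend to travel together.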


❤️ Episode, March 26 2024

The role of Art in the era of Generative AI


  1. Art is the weights of a model, what you see is an illustration. 

  2. Modern Art (Picasso) = Theory defines what you see

  3. Postmodern Art (Beuys) = Art evolves unpredictably with the spectator 

  4. Generative AI Art =  Creating the AI is the art and what you see is an illustration

  5. The artistic process is the design of the data pipeline, the model, reward function design, training and inference. Creativity is applied across the AI stack

  6. Generative AI is natively non-linear. Function approximation. Art is balancing between data distributions and finding new paths, patterns and relationships between them. 

Episode, March 25 2024 II

Events week 3/18-3/21


  1. Davide Scaramuzza. Uni. Zurich. Autonomous Drones. Solving for perception and planning end to end. Event cameras for faster reaction, lower compute budget. Use prediction capabilities more effectively. Reward function design. 

  2. Sophia Shao, Berkeley, BEARS 23. This slide shows important papers. The key problem is how to design heterogeneous computing systems with various accelerators, specialized memory and networking equipment to increase the throughput and performance of systems, and how to make them flexible and adaptive to new model and software architectures. Gemmini is an open-source design framework that enables flexible design of accelerators. Similar to Mark Horowitz's work.

  3. Kate Saenko, Boston University. Two problems. One paper solves the problem of using LLMs for image classification. Generative video AI is trained to generate, not to classify. What's the difference? Generation extrapolates patterns; classification is based on some sort of nearest-neighbor estimation. The other paper looks at long-video reasoning. The key is to bridge short segments with a context token that binds them. These context tokens are trained in an unsupervised manner.

  4. Peter Koo. Cold Spring Harbor Laboratory. The relationship between DNA and the transcriptome. Model exploration. A DNN works, but a simple model is just as good; it is not clear where exactly the information sits. The transformer architecture should be interesting because, similar to language, the relative position of nucleotides might matter.

Episode, March 25 2024

Foundation Models for embodied decision making agents. Episode 10

Sherry Yang. Berkeley.


Video as a new language for embodied decision making agents. It works great for simulation because it is a unified interface, like text and keyboard. With simulated video, RL can perform well because 1/2 the large search space can be explored and 2/2 reward function design is more efficient.

Challenges for video vs. LLMs

  • Data coverage. Video data is out of distribution, text is not.

  • Lack of labels.

  • Exploring architectures. Tokenization and then diffusion, like at Waabi.


An AlphaGo Zero approach to self driving via generated video simulation and then sim-to-real transfer.

A generative video model is a reasoning model because it’s reasoning about physics.

❤️ Episode, March 23 2024

Essay: Nvidia should buy Tesla 


Deep Learning + Hard Engineering = Deep Engineering 

At GTC Jensen said Nvidia wants to become the foundry for robotics. For that to happen they must innovate across the whole stack. Robotics is a distributed computing problem requiring innovation across the stack. Nvidia lacks hard engineering skills. 


  1. Robotics is a distributed computing problem

  2. Both Nvidia and Tesla have holes in the stack that each company can fill for the other

  3. A Hard Engineering mindset means the willingness to iterate, fail, correct errors and learn from them rapidly. At Tesla (SpaceX) engineering means try and fail, fix errors, repeat. It's not about geniuses solving hard problems. It's about geniuses ready to fail and to solve problems by learning from mistakes.

Episode, March 21 2024

Foundation Models for embodied decision making agents. Episode 9 

Waabi. Foundation model for self driving. Tokenize everything based on the Lidar point cloud. From the paper: We ask: what makes it difficult to learn an unsupervised world model that directly predicts future observations? Two problems. 1/2 The real world is complex; how do we represent that complexity? 2/2 Token prediction has to happen for multiple tokens in real time. Solution: mix infilling and diffusion.


The approach: tokenize each observation frame and apply discrete diffusion on each frame, and autoregressively predict the future.


The complexity of the real world makes token prediction harder than in LLMs. Not just one token has to be predicted but multiple tokens simultaneously. 
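The control flow of the approach (tokenize each frame, refine each future frame with a few discrete-diffusion-style denoising steps, roll frames out autoregressively) can be sketched with toy stand-ins. The "denoiser" below is random and the tokenizer is a uniform quantizer; this only illustrates the loop structure, not Waabi's actual model.

```python
import numpy as np

VOCAB = 256  # token ids per grid cell (illustrative)

def tokenize(frame: np.ndarray) -> np.ndarray:
    # Frame values assumed in [0, 1]; quantize to discrete tokens.
    return (frame * (VOCAB - 1)).astype(np.int32)

def denoise_step(tokens, context, rng):
    # A real model would predict token logits from the context frames;
    # here we resample a random subset of positions to mimic refinement.
    mask = rng.random(tokens.shape) < 0.5
    proposal = rng.integers(0, VOCAB, size=tokens.shape)
    return np.where(mask, proposal, tokens)

def rollout(first_frame, horizon, steps=4, seed=0):
    rng = np.random.default_rng(seed)
    frames = [tokenize(first_frame)]
    for _ in range(horizon):
        nxt = rng.integers(0, VOCAB, size=frames[-1].shape)  # start from noise
        for _ in range(steps):            # discrete diffusion within one frame
            nxt = denoise_step(nxt, frames, rng)
        frames.append(nxt)                # autoregressive: condition on the past
    return frames

frames = rollout(np.random.rand(8, 8), horizon=3)
assert len(frames) == 4 and all(f.shape == (8, 8) for f in frames)
```

The inner loop is where all tokens of one frame are predicted simultaneously, which is exactly the real-time difficulty noted above.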

Episode, March 19 2024

GTC Keynote discussion 

  1. Datacenter is the AI factory producing tokens.

  2. When generating tokens, optimize for $/token and individual response. 

  3. Simulation. Omniverse is operating system for robotics. Omniverse is a 3D rendering and simulation engine using generative AI. Train foundation models for robots. Become the foundry for robotics. Nvidia wants to do what Dojo might be.

  4. RLPF = Reinforcement Learning with Physics Feedback. Robot transformers, fine-tuned with real physics. Science is already on this path; Dieter Fox and Pieter Abbeel are both on a similar path.

Episode, March 16 2024

Events 3/11-3/15

  1. Do embodied agents need maps? Are maps emerging in agents even if they are not explicitly designed? Do embodied agents require a world model for navigation? Is learning from data enough? Humans don't understand physics but are still able to move around. If we need maps, we also might need physics. Can end-to-end learning be done without those things?

  2. What causes superconductivity to disappear at Tc in quantum materials? Measure electron pairs. The finding: electron pairs have superconducting properties in one dimension but insulate in another when doped.

  3. Large-scale production deployments of lightwave fabrics used for both datacenter networking and machine-learning (ML) applications. Reconfigurable data center with optical networking components. Key is reconfigurability. Software defined data center. Good example of large corporates taking big bets to generate wealth.  

Episode, March 15 2024

Why are we invested in Tesla? 1/3 The innovation stack around manufacturing driving Wright's Law. 2/3 The software defined car, and FSD in particular, lowers cost per mile, which is the main driver of adoption. 3/3 Decarbonizing energy. Social necessity and acceptance will prevail.
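Wright's Law in 1/3 can be written down directly: unit cost falls by a constant fraction with each doubling of cumulative production. The 15% learning rate below is an illustrative assumption, not a Tesla figure.

```python
import math

def wrights_law_cost(c0: float, cumulative_units: float,
                     learning_rate: float = 0.15) -> float:
    # b is the progress exponent implied by the per-doubling learning rate.
    b = -math.log2(1 - learning_rate)
    return c0 * cumulative_units ** (-b)

# Doubling cumulative output cuts cost by exactly the learning rate:
c1 = wrights_law_cost(100.0, 1.0)
c2 = wrights_law_cost(100.0, 2.0)
assert abs(c2 / c1 - 0.85) < 1e-12
```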

Episode, March 13 2024

Discussing essay "The Painted Word" by Tom Wolfe. 1/3 Observations are theory laden. We perceive art through the lens of theory. This applies to art, science and business; to all of culture. The fingerprint of the 20th century. Einstein: "Theory decides what we observe." A similar comment by Clayton Christensen. Investment requires theory. 2/3 The risk of a narrow elite deciding what and how we see. 3/3 The solution is democratic decision making: replace existing ideas with better ideas. No violence. Freedom of idea proliferation. Popper. Anti-elitist decision making. Crypto solves the Byzantine Generals' Problem and can help solve the elite problem.

Episode, March 11 2024 II

Book discussion “The Secret Life of Sunflowers” by Marta Molnar. 1/4 Great story boxed into shallow identity novel. 2/4 Art reflects society and society reflects art. This process is not well developed in the novel. 3/4 Lack of depth in characters leaves reader empty. 4/4 Identity is what you do, not what you want.


Episode, March 11 2024

Thoughts on arXiv. The best platform for the proliferation of knowledge. Fast, low cost of iteration, and standardization. Best for areas such as AI where iteration and ideation are progressing rapidly. Platforms are best for areas that organically grow with them. arXiv for AI, for thoughts, Facebook for images.


Episode, March 10 2024

Notes from the book "The Secret Life of Sunflowers". Design reflects the Zeitgeist and the Zeitgeist reflects design. The story of Van Gogh's paintings and their struggle to be accepted depicts the general evolution of ridicule, tolerance and idealization. Tesla and Apple are examples of design and functionality that transcend hierarchies and set a new tone in the Zeitgeist. Interestingly, with Tesla it's the emerging Asian middle class that resonates best. Van Gogh was ridiculed at home but revered in the US.

Episode, March 9 2024 II

Events 3/4-3/8

1/3 Wenbo Zhang. Battery technology.

Rechargeable Li-metal batteries have the potential to more than double the specific energy of state-of-the-art rechargeable Li-ion batteries. The problem is that during cycling a layer of SEI isolates the lithium metal and thus reduces Coulombic efficiency. To increase energy density, this process can be reversed with charge resting. Research is being done to understand why and how this process works.

2/3 Towards hybrid event/image vision. Andreas Suess (Omnivision). Neuromorphic imaging, also known as Event-based Vision Sensing (EVS), uses smart pixels that create events upon changes in light intensity. This saves bandwidth and power in portions of the scene that remain static and dedicates them to where things are actually happening.

3/3 RL for better human performance. Park Sinchaisri, UC Berkeley Haas School of Business. The inverse of RLHF: use RL to improve humans. Any interaction between AI and humans can be viewed that way. Estimate the optimal trajectory from a large dataset and then help humans follow that trajectory. RL is particularly good at optimizing non-linear functions where adaptive decision making is necessary.
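The event-camera idea in 2/3 can be sketched in a few lines: a pixel emits an event only when its log intensity changes beyond a threshold, so static regions produce no data. The threshold and frames below are illustrative.

```python
import numpy as np

def events(prev: np.ndarray, curr: np.ndarray, threshold: float = 0.2):
    # Event cameras respond to changes in log intensity per pixel.
    d = np.log(curr + 1e-6) - np.log(prev + 1e-6)
    on = np.argwhere(d > threshold)     # brightness increased
    off = np.argwhere(d < -threshold)   # brightness decreased
    return on, off

static = np.full((4, 4), 0.5)
moved = static.copy()
moved[1, 2] = 0.9                       # only one pixel changed
on, off = events(static, moved)
assert len(on) == 1 and len(off) == 0   # bandwidth spent only where change happened
```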

Episode, March 9 2024

Foundation models for embodied decision making agents. Part 8

1/2 Dragomir Anguelov. An overview of current research at Waymo. The key goals are to better detect objects, do motion planning, handle multi-agent traffic scenarios and use model distillation to run complex models within lower compute and memory budgets. Here are some of the key ideas Waymo is working on:

  1. Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

  2. Motion planning using imitation learning and RL. Finding better ways to detect objects, decide whether they move and then predict their motion. 

  3. Auto-labelling infrastructure  

  4. Model distillation 

  5. Future problems to work on:

  • VLM architectures. Architectures optimized for video and other modalities. 

  • Scene description formats. Reconcile the current car scene description with the LLM scene description. Enable chain of thought.

  • Language conditioned planners. Follow human instructions

  • Preserve LLM/VLM capabilities with limited compute budget. 

2/2 Vladlen Koltun. Model-free, on-policy deep RL for drone racing. No world model. The policy is adjusted while trying, with a parametrized deep RL model approximating optimal trajectory planning. Simulation is used for most of the learning; only a small part is trained in the real world to adjust for physics and real-world conditions such as vision blurring at high speeds. During training, the policy maximizes a reward that combines progress towards the next racing gate with a perception objective that rewards keeping the next gate in the field of view of the camera.

Episode, March 7 2024

Rivian is pivoting from truck to crossover/compact car, competing for share in the software defined car. In order to scale and build a profitable, cash-flow-generating business, Rivian management must build innovation stacks along at least four vectors. 1/4 A low cost, reduced capital intensity and high performing battery. 2/4 Integrated manufacturing, in particular design and engineering focused on cost and value (think of Henry Ford, who said a car that lasts longer than the engine is waste). 3/4 An operating system for the software defined car. 4/4 Self driving robot software, training and hardware infrastructure. The highest probability of success would be if the company recaps, sheds the legacy R1 business and replaces management with people who are willing and capable of building the necessary innovation stacks.

Episode, March 6 2024 II

Discussing a lecture by Michael Jordan, Berkeley, on how AI can help advance microeconomics. While the current era of machine learning is focused on recognizing patterns, the future will be all about making decisions.

Discussing lecture by Park Sinchaisri. Using Reinforcement learning to improve human decision making. RLMF (Reinforcement Learning with Machine Feedback), inverse of RLHF. Both lectures discuss possibilities of using large data sets and RL to improve human interactions. 

Episode, March 6 2024

Discussing a lecture on shadow mode for A/B testing, by David Tagliamonti. A/B testing real-time event detection systems, particularly in mission-critical environments, requires a unique approach. Shadow mode testing is one variant: shadow mode for event detection, data collection, model testing. A similar process is applied at Tesla.

Episode, March 5 2024 II

Discussing a paper on approximate attention for increased context windows in LLMs. 1/4 The problem is the increased compute and memory I/O budget for a large context window. 2/4 The solution is approximation. 3/4 Gauge the entropy and/or information content in the query and quantize, i.e. only focus on the relevant cells of the attention matrix. 4/4 This even works with high-entropy prompts, since then you can sample stochastically and avoid large compute and memory I/O as well.
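The idea of focusing only on the relevant cells of the attention matrix can be sketched as top-k attention: keep only the k highest scores per query and renormalize. Keeping all keys recovers exact attention; the sizes and k are illustrative.

```python
import numpy as np

def full_attention(q, k_mat, v):
    s = q @ k_mat.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)) @ v

def topk_attention(q, k_mat, v, k=2):
    scores = q @ k_mat.T / np.sqrt(q.shape[-1])
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, row in enumerate(scores):
        keep = np.argsort(row)[-k:]           # only the k most relevant keys
        w = np.exp(row[keep] - row[keep].max())
        out[i] = (w / w.sum()) @ v[keep]      # softmax over kept cells only
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k_mat, v = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
approx = topk_attention(q, k_mat, v, k=8)     # half the keys per query
assert approx.shape == (4, 8)
# With k equal to all keys the approximation is exact:
assert np.allclose(topk_attention(q, k_mat, v, k=16), full_attention(q, k_mat, v))
```

Only the kept cells need to be read from memory, which is where the compute and I/O savings come from.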


Episode March 5 2024

Reaction to a discussion about the alignment of AI models: RLHF, DPO and other methods to better match AI with human preferences. 1/5 AI models will only do what they're told to do. For that we need to know what we want. 2/5 Two vectors: one is how to align systems once we know what we want, the other is what it is we actually want. 3/5 Humans have preferences and finding them is an interesting problem. RLHF is very useful; DPO can solve the problem more efficiently by systematically reducing the search space with adequate pairwise contrasting. 4/5 An appeal to Silicon Valley: AI totalitarianism is the sum of all fears. Avoid feeding this monster. 5/5 The solution is to find ways to natively incorporate error correction and checks and balances. Accountability: paying a price for screwing up. Design a system that rewards alignment with a diverse set of views. Diversity weighted by ethical, moral and personal values.
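The DPO idea in 3/5 can be sketched with scalars standing in for full sequence log-likelihoods: with no separate reward model, directly increase the policy's log-probability margin on the preferred response relative to a frozen reference policy.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Margin: how much more the policy prefers the chosen answer than the
    # reference policy does, on the pairwise contrast.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# The loss falls as the policy learns the preference:
weak = dpo_loss(-5.0, -5.0, -5.0, -5.0)    # no preference learned yet
strong = dpo_loss(-4.0, -6.0, -5.0, -5.0)  # clear preference for the chosen answer
assert strong < weak
```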

Episode March 3 2024

Week 2/26-3/1 Part 3

1/3 Andrea Morello: use silicon-based technology to create spin-based quantum information processing chips. 2/3 Carlton Caves: quantum measurements, a new perspective. Treat measurement independent of state; it is like a dynamic process itself. 3/3 Yanchao Bi: perception and language and how they relate to knowledge in the human brain. Where does low-level perception form into more abstract, higher-level representation? What is higher-level representation? Where in the brain does it happen, and how?

Episode March 2 2024 II

Week 2/26-3/1 Part 2

1/3 Thierry Tambe, new frontiers in chip design for AI. Quantization, precision adaptation to workloads. eDRAM as on-chip memory for transient data. Sentence-level granularity and voltage/frequency adjustment: look at the entropy of the workload and adjust energy consumption accordingly. Implicit refresh in DRAM.

2/3 Renee Zhang. Magnetically actuated origami robots that can crawl and swim for effective locomotion and targeted drug delivery in severely confined spaces and aqueous environments.

3/3 Emily Alsentzer: create datasets for few-shot learning in rare disease statistics. Data augmentation using real-world examples and then generating more data based on them. Similar to the paper on using generated video for robot learning.

Episode March 2 2024

Week of 2/26-3/1 Part 1

Foundation models for embodied decision making agents. Part 7

1/3 Quan Vuong, robot scaling. The journey from RT-1 to RT-X. Create an internet of data for robot learning. Unify data structures and generalize models across modalities: RT-1 text, RT-2 text and video, RT-X all kinds of modalities. Use foundation models as the base. Q-Transformer: use a pretrained model to compute the Q-value in RL. How can robotics take advantage of the internet like text did? By creating its own internet of robot data.

2/3 Dorsa Sadigh, robot scaling. Learning from humans. How to represent data so it can be used at scale for transformer architectures. Visual-text transformers for robots. Pre-train, then fine-tune. Fine-tuning with pairwise preference models requires a good understanding of efficient question/answer tuples. Bootstrap robotics by deploying a fleet and collecting data for transformer training. 3/3 David Blei. Variational Inference (VI). A fundamental statistical concept underlying modern generative AI. Pre-trained models are essentially massive matrices of distributions of groups of tokens. At inference time, prompts and/or observations get matched to the most likely distribution they belong to. VI is a statistical tool to efficiently find distributions for data inputs given a certain amount of information. That's why it's called Bayesian inference.
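A minimal sketch of the VI idea in 3/3: pick the member of a simple family q that best matches the posterior p, which amounts to minimizing KL(q || p). For two Gaussians the KL divergence is closed-form; the candidates and target below are illustrative.

```python
import math

def kl_gauss(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ), closed form for 1-D Gaussians.
    return 0.5 * (var_q / var_p + (mu_p - mu_q) ** 2 / var_p
                  - 1 + math.log(var_p / var_q))

# Search over candidate variational distributions (mean, variance):
candidates = [(-1.0, 1.0), (0.0, 1.0), (2.0, 1.0)]
best = min(candidates, key=lambda q: kl_gauss(q[0], q[1], mu_p=0.0, var_p=1.0))
assert best == (0.0, 1.0)  # KL is zero exactly when q equals the posterior
```

In practice the search runs over continuous parameters by gradient ascent on the ELBO, but the objective is the same matching of q to p.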

❤️ Episode February 29. 2024 II

Foundation models for embodied decision making agents. Part 6. Paper discussion: "Video as the New Language for Real-World Decision Making." Elevate video generation models to the level of autonomous agents, simulation environments, and computational engines, similar to language models, so that applications requiring visual modalities such as robotics, self-driving, and science can more directly benefit from internet visual knowledge and pre-trained video models. In order to build foundation models for robots, data is required. Video generation is one path towards that goal. The other is a fleet of robots, whose actions can be turned into a dataset for a physical


Episode February 29. 2024

Book discussion “The House across the Lake” by Riley Sager. 1/5 Not worth the money and time. 2/5 Shallow characters, inconsistent story and messy emotional trajectories. 3/5 Vanity Fest. 4/5 Feels like watching a B-movie with bad booze. 5/5 Good depiction of addiction.

Episode February 27. 2024

Corporate activism is ok but it has to be transparent and align with mission. Google is not. Nor are many others such as Disney, Oil, Tobacco and most food companies. Investors must scrutinize and guard against creeping activism.

Episode February 26. 2024

Discussing a paper on hand-object planning and a video by Andrej Karpathy on tokenization. Both projects emphasize the importance of representations. The central doctrine of AI (data, architecture, model, algorithm) doesn't apply in a linear fashion; it goes back and forth. In particular, data representation plays a crucial role. Multimodal transformers will work for robotics, but representations and/or tokenization must be carefully designed and engineered.

Episode February 24. 2024

Nvidia and Rivian earnings calls. 1/4 Nvidia cements its leadership in software defined hardware for AI workloads. 2/4 How can Nvidia do this? Jensen's leadership takes notes from the Tesla playbook: scale by launching products as R&D, like CUDA and Omniverse (a platform for video processing). 3/4 The risk to Nvidia is ARM; low-end, edge inference can be disruptive. 4/4 "Rivian not viable stand-alone," says Wall Street. This applies to most other competitors. Tesla, BYD and maybe a third competitor will remain as the sole suppliers of transport hardware.

Episode February 21. 2024

Book discussion “Hello Beautiful” by Ann Napolitano. 1/4 Love is not what you think. Darwinistic feeling that serves an evolutionary purpose like fear or hope. 2/4 Love is an imaginary state of mind; selfish and devoid of empathy. 3/4 Life, career and love happen to people. Agency is navigating this process. 4/4 Literary technique; show success in characters raises their importance and amplifies their emotions and relationships.

Episode February 19. 2024

Foundation models for embodied decision making agents. Part 5. Discussing two papers about generative AI and its power to generate datasets, with additional comments on Sora to illustrate the potential of generative AI for simulation in robot learning. 1/3 Solving math problems: generate problem/solution tuples with code and then use the dataset to train a model for math problem solving. 2/3 Extreme video data compression by using snippets of the original video as a prompt to generate the rest of the video. The key here is a learned decision process which strategically excludes large chunks of the original video and generates video sequences with generative AI. That way you can compress original videos with orders of magnitude more compression than is possible with existing techniques: take the original, compress it, and then in reconstruction use only parts of the compressed original to generate video sequences with diffusion. 3/3 Nathan Lambert on Sora: use video generation to create datasets for robot training.

Episode February 16. 2024

Foundation models for embodied decision making agents. Part 4. Google introduces Gemini 1.5: multimodal reasoning and a long context window. Research questions on multimodality: 1/2 How to concatenate different modalities? One large embedding or delegation of tasks. 2/2 The Mixture of Experts architecture is key to multimodality and long context windows. A long context window enables one-shot reasoning on complex prompts. Gemini 1.5 can reason from text, video and images. Some examples of Gemini multimodality: 1/3 Show graphs and let the model generate corresponding code. 2/3 Show a football player kicking a ball and let the model suggest improvements; this can be used for automated coaching. 3/3 Show a dish and generate


Episode February 12. 2024

Discussing "ai.economics", an essay by Ben Spector. Foundation models are like the chip industry: large investment, huge value, rapid depreciation. Tesla can become the x86 of multimodal foundation models, applied to FSD, robots and other embodied decision making agents. That's what Musk means when saying Dojo is low probability, high return.

Episode February 11. 2024

Book discussion. "Everyone here is Lying" by Shari Lapena. 1/3 Kids' innocence is a bug, not a feature. A society obsessed with kids reflects the insecurity of adults. Humans come with agency and accountability. 2/3 The lie is a character in itself, impersonated by different people. Lying is more difficult than you think. In fact, lying is an oxymoron; there is something akin to the conservation of truth with lying. 3/3 What does it take to write a good novel? A good story. Good stories are built on characters with deep relationships.

Episode February 9. 2024

Event discussion. 1/2 A fusion reaction experiment shows for the first time that fusion energy could work in principle. 2/2 Flash attention reduces the wall-clock time of transformer training and inference. Ideas with impact: fusion is an illusion because of high iteration cost, high error-correction cost and long lead times. Flash attention is a great example of how first-principles thinking helps solve problems: find ways to reduce reads and writes to high-bandwidth memory in GPUs with mathematical adjustments.
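The memory trick behind flash attention can be illustrated with the online softmax: process scores one tile at a time while tracking a running maximum and a rescaled running sum, so a full score row never has to sit in high-bandwidth memory at once. Tile size and scores below are illustrative.

```python
import math

def online_softmax_denominator(scores, tile=2):
    # Streams the scores in tiles; m is the running max, s the running
    # sum of exponentials rescaled whenever the max changes.
    m, s = float("-inf"), 0.0
    for i in range(0, len(scores), tile):
        block = scores[i:i + tile]
        m_new = max(m, max(block))
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in block)
        m = m_new
    return s, m

scores = [0.5, 2.0, -1.0, 3.0]
s, m = online_softmax_denominator(scores)
full = sum(math.exp(x - max(scores)) for x in scores)  # one-pass reference
assert abs(s - full) < 1e-12 and m == 3.0
```

The same rescaling applies to the weighted sum of values, which is what lets flash attention tile the whole computation.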

Episode February 7. 2024 II

Event discussion. Synfuel Symposium at PSI (Paul Scherrer Institute). Methane - Methanol - Olefins - Hydrocarbons. Chemical engineering applied to synthetic fuel production. How to design reactors, and with what kind of materials? Physics parameters such as temperature, viscosity, pressure, CO2 content etc. matter. Catalyst design. A very interesting presentation by Vladimir Paunovic: using various forms of spectroscopy to measure operando drift, and designing catalysts for the synfuel value chain. Computational methods could be key here; more in silico design would increase the rate of iteration and deliver faster results at lower cost. Vladimir's PhD thesis is about how to design better catalysts for hydrocarbon production out of natural gas.

Episode February 7. 2024

In silico design, or computational design, refers to using computers and software to design materials, products, molecules, and even processes entirely within the virtual world. Jensen Huang at JPM 24 talks about how chip design is almost exclusively in silico. The same is coming to pharmaceuticals. Apple, Tesla sell software defined products. Tesla is software defined car. Software makes the car, manages the car and drives the car. Faster iteration, better safety, more convenience and lower cost. Moore’s Law driving car and car design. Production and design of Next Gen predominantly in silico.

Episode February 4. 2024

Discussing events of the week. 1/2 GNNs and physics. Taking advantage of symmetries in computer vision; the interesting stuff is where symmetries break. The CNNs-to-transformers progression of models is based on similar principles. 2/2 Multimodal text-and-video masked model. Words have strength and thus provide compact information about scenes. Take advantage of the intricacies of text in multimodal models. The sum is more than the parts.

Episode February 2. 2024

Foundation models for embodied decision making agents. Part 3. Discussing Neural Radiance Fields (NeRFs), occupancy networks and transformers. History of computer vision: supervised learning after ImageNet. Convolutional neural nets exploit symmetries. NeRFs manage depth, perspective and precise rendering. Occupancy networks deliver lightweight object detection. Transformers are NeRFs on steroids. NeRFs are a good representation through compression: low compute budget at inference but high training cost. Transformers offer more precision and general inference but a high compute budget. There is a trade-off between training budget and inference budget.

Episode February 1. 2024 II

Concluding remarks Neurips 2023. Prepare. Observation is theory laden. The better you’re prepared, the more you get out of it. Creativity is when you define your own problems and intelligence is when you’re able to solve them. Foundation models for embodied decision making agents will propel robotics. Learning from video with reward function design. Generate video through diffusion and teach robots based on observed skills. Experience through offline RL. What does learning actually mean? Memory versus credit. What do I know and which knowledge leads to higher rewards? Inference based on maximum likelihood methods (which distribution might the observed data belong to?) versus score matching (Markov process with sequential decision making). At the limit both methods converge.


Episode February 1. 2024

Neurips Part 9. 1/5 Learning to Influence Human Behavior with Offline Reinforcement Learning. In a collaborative game you take the other agent's latent strategy into account when updating your own policy. You learn how to influence the other's actions to optimize for your given task. 2/5 COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning. This is an attempt to teach a robot how to discover its own problems and solve them. Gather experience through training on offline RL data. 3/5 HIQL: Offline Goal-Conditioned RL with Latent States as Actions. Extract a value function from observed states and thus create a latent representation of subgoals. Use those subgoals to train a policy to actually reach the goal. 4/5 Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents. Use LLMs for high-level training and ground the robot in affordances (what the robot can actually do). Both generate a combined text that is semantically and physically sound. 5/5 Learning Universal Policies via Text-Guided Video Generation. Use YouTube videos to train a video diffusion model. Then prompt with text instructions to generate a video of specific tasks (kick the ball, arrange blocks by color, etc.). Use inverse modeling to extract kinematic and other actuator information and use model-based RL to train the robot.

Episode January 31. 2024 II

Foundation models for embodied decision making agents. Part 2. 1/3 FSD shadow mode: what is shadow mode, and why is it so important for the engineering of FSD? 2/3 End-to-end training of FSD means a single optimization pipeline. It allows for more expression. Creativity: the robot finds its own problems to solve. The FSD problem statement is "drive as close to human-like as possible." 3/3 The most scarce resource in Tesla AI is compute. That's a good thing: it means that Moore's Law is working for FSD.

Episode January 31. 2024

Discussing Tesla Q4 earnings and thoughts. 1/3 Volume and pricing are stabilizing, margins are higher, and EPS is growing. 2/3 FSD is not at an inflection point in terms of adoption, but all ingredients are in place for FSD to become more accretive to earnings. 3/3 Corporate. We believe Musk should step down as CEO and remain Chief Architect, focusing on exciting projects such as next-gen production technology, FSD and Optimus. Day-to-day is better run by dedicated people with skin in the game.

Episode January 30. 2024

Discussing "Inventor of the Future", a Buckminster Fuller biography. Spaceship Earth. Deutsch meets Jobs meets Musk meets Feynman. "Nature makes a huge effort to help us succeed. But nature doesn't need us. We are not the only experiment." Structure determines nature and function. Use isolated components and connect them in space so the whole is more stable, robust, flexible and stronger. 3D chemistry: the difference between a carbon nanotube and a carbon atom. How to scale a large company: build synergetic pieces that fit together (Cybertruck helps single-cast tech and enables next-gen single casting). The Silicon Valley mindset of problem solving must be accompanied by problem solving in society, environment and politics.

Episode January 28. 2024

Problem is at the core of knowledge and creation. “A problem well stated offers its own solution”. What is a good problem? What is a well stated problem? Problems must be discovered like everything else. Knowledge starts with problems and wealth is created through the discovery and solution of problems. Better statement for Tesla FSD problem: “Self-driving that feels human.”


Episode January 26. 2024

Neurips Part 8. 1/4 DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models. Diffusion + physics. Generate simulation for robots that incorporates geometry and physics. Use Monte Carlo sampling techniques with differentiable physics to replicate real-world behavior and train robots on differing tasks while taking physics into account. 2/4 MULTIREACT: Multimodal Tools Augmented Reasoning-Acting Traces for Embodied Agent Planning. Imagine Tesla Optimus seeing a video of clothes folding. The video gets spliced up and each spliced frame gets a language caption. The robot then learns how to fold based on this dataset and uses the captioning as the measurement for the reward function. The text helps evaluate whether the robot is on the right path. 3/4 How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies. This paper attempts to make LLMs more useful in long-term robotic planning. The key problem here is context length. You can't just tell a robot "please clean my kitchen": it's not clear to the robot what that means; in fact, even a human might misunderstand. But if you compartmentalize the problem, use step-by-step instructions and add information about intermediate states, you get better results. Think of this like a job description for a robot using LLMs. 4/4 Exploitation-Guided Exploration for Semantic Embodied Navigation. Knowing that there are specialists gives you more confidence in tackling more specific problems. Let the agent explore until it gets close to a landmark, then let it exploit. The separation of explore and exploit should in principle improve exploration and yield better results.

Episode, January 25 2024

Tesla Q4 2023 earnings call discussion. Three drivers of the stock: 1/3 FSD monetization. 2/3 Lower cost per mile driving demand. 3/3 The robot that builds the robot. Positives: FSD, next-gen platform, Energy, no more pushing price to drive volume, margin stabilization, interest-rate tailwind, Optimus about 18-24 months from first deliveries. Negatives: 3/Y demand growth less than expected, China demand/supply imbalance, lack of clear margin guidance. The CFO must improve his communication skills; not as smooth as Zach. What is the purpose of Tesla's own chip manufacturing for Dojo and inference? Not clear.

Episode, January 23 2024

Tesla sum-of-the-parts (SOP): $420. Valuation based on four businesses: car 50%, FSD 35%, Energy 5%, and Robot 10%. The market currently values only the car business, nothing for the rest. FSD value creation is fundamentally the lowering of cost per mile, which will come down through lower cost and more miles driven. The robot replaces high industrial labor cost and opens new avenues for scale manufacturing. Ford didn't invent the car, he invented the assembly line. Musk will go down not as having invented the EV, but as having invented the scale manufacturing of highly complex robots that are built by robots. In principle, Tesla's value depends on how fast they can iterate and improve.
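The sum-of-the-parts arithmetic is simple enough to make explicit. Only the $420 target and the segment weights come from the episode; the `sum_of_parts` helper is illustrative.

```python
def sum_of_parts(target_price, weights):
    """Split a target share price into per-segment dollar contributions.
    `weights` maps segment name -> fraction of total value (must sum to 1)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return {seg: round(target_price * w, 2) for seg, w in weights.items()}

parts = sum_of_parts(420, {"Car": 0.50, "FSD": 0.35, "Energy": 0.05, "Robot": 0.10})
# Car $210, FSD $147, Energy $21, Robot $42
```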

Episode, January 22 2024 II

Discussing DeepMind's paper "Solving olympiad geometry without human demonstrations". Iteration and recursive update of the policy. The key idea is to generate random mathematical constructions and then recursively cross-check the insights from those constructions against the actual problem. If you iterate enough and generate an adequate update rule, you will eventually find a proof. This paper has a similar flavor to FunSearch: generate random samples in function space and then recursively search for interesting semantics until you find the solution. Intelligence = iteration and recursive policy update at scale and speed.
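The generate-and-cross-check loop described here can be sketched with a toy problem. The linear-rule candidates below stand in for the paper's randomly generated constructions, and the cross-check is simply validation against known instances; all names are hypothetical.

```python
import random

def random_candidate(rng):
    """Sample a random linear rule y = a*x + b (a toy stand-in for a
    randomly generated mathematical construction)."""
    return rng.randint(-5, 5), rng.randint(-5, 5)

def check(rule, examples):
    """Cross-check a candidate rule against the actual problem instances."""
    a, b = rule
    return all(a * x + b == y for x, y in examples)

def search(examples, seed=0, max_iters=10_000):
    """Iterate: propose a random candidate, cross-check it, and stop when
    one survives every check."""
    rng = random.Random(seed)
    for _ in range(max_iters):
        rule = random_candidate(rng)
        if check(rule, examples):
            return rule
    return None
```

Brute random sampling only works because the candidate space is tiny here; the papers' contribution is precisely the recursive update that makes the search tractable at scale.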

Episode, January 22 2024

Neurips Part 7

1/4 Video Prediction Models as Rewards for Reinforcement Learning. Use video data to train a generative video-prediction model, then compute rewards from the actions and transitions in the video. This builds a dataset of action-reward tuples that can be used for offline RL.

2/4 Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration. From the paper: “Our main idea is to prevent the distribution of high-value states from affecting exploration around low-value states, and vice versa, by filtering out states whose value estimates significantly differ from each other for computing the intrinsic bonus.” Use state-space similarity as intrinsic reward so the agent explores different states, but associate a value with each state and eliminate comparisons with large value deltas. Let the agent explore within the remaining (state, value) tuples.

3/4 Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? Can a model be built that generalizes from vision to action? This paper studies various models and concludes that, so far, there isn't one. Tesla is working on such a model; they call it the World Model. Data, model, architecture, and hardware must all interplay.

4/4 MAViL: Masked Audio-Video Learners. The idea is to predict raw inputs in heterogeneous modalities: from masked-view inputs, the model jointly predicts contextualized audio-video representations in a homogeneous (aligned) latent space.

Episode, January 20 2024

Paper discussion:

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities. A pipeline for end-to-end robot training. Foundation models offer general pre-training with potential applications to diverse robotics domains. Simulation. NeRFs solve for depth. Masked auto-encoders for the world model. The world model is defined as predicting future states conditioned on actions and past observations. Pipeline = Track, Map, Motion, Occupancy, Planning. End to end = take the input, process it through the pipeline in a coherent way, and produce low-level actuator actions as output. Optimize throughout the entire pipeline.
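The world-model definition above (predict future states conditioned on actions and past observations) can be sketched as an interface. The additive placeholder dynamics are hypothetical; a real model would be a learned network over the full observation history.

```python
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Toy world model: predicts future states from past observations and
    actions. The 'dynamics' here are a placeholder sum over scalar states."""
    history: list = field(default_factory=list)

    def observe(self, obs):
        """Record a past observation."""
        self.history.append(obs)

    def predict(self, action):
        """Predict the next state conditioned on the action and history."""
        last = self.history[-1] if self.history else 0.0
        return last + action

    def rollout(self, actions):
        """Imagine a trajectory under a sequence of actions (for planning)."""
        state = self.history[-1] if self.history else 0.0
        trajectory = []
        for a in actions:
            state = state + a
            trajectory.append(state)
        return trajectory
```

The `rollout` method is where the pipeline connects to planning: imagined trajectories can be scored by the downstream occupancy and planning stages without touching the real actuators.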


See Podcast Site continued
