"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.
Neurips 2024 Part 14. Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models. UC Berkeley. Distill a problem into its principal components. For example, a specific task can be distilled into a quadratic equation. Meta problem identification, buffering and solving. What's the appropriate chain of thought? Explore with Tree of Thought (explore different prompts) or Chain of Thought (different prompts yield different results with fixed weights). Find structure in the problem. Train a foundation model for inference. Create a meta-buffer. A pre-trained LLM is a compressed version of knowledge. A pre-trained inference engine is a compressed version of problem-solving approaches.
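A minimal sketch of the meta-buffer idea: distill a concrete task into a problem class, retrieve a stored thought-template, and instantiate it. The template names and the keyword-based distiller below are invented for illustration; in the paper an LLM does both steps.

```python
# Toy meta-buffer: problem class -> reusable thought template.
META_BUFFER = {
    "quadratic_equation": "Rewrite as ax^2 + bx + c = 0, then apply "
                          "x = (-b ± sqrt(b^2 - 4ac)) / 2a.",
    "unit_conversion": "Express both quantities in a common base unit, then divide.",
}

def distill(problem: str) -> str:
    """Map a concrete task to an abstract problem class (toy keyword matcher)."""
    if "x^2" in problem or "squared" in problem:
        return "quadratic_equation"
    return "unit_conversion"

def solve_with_buffer(problem: str) -> str:
    template = META_BUFFER[distill(problem)]
    # In the paper, an LLM instantiates the template; here we just attach it.
    return f"Problem: {problem}\nThought template: {template}"

print(solve_with_buffer("Solve x^2 - 5x + 6 = 0"))
```

The point of the sketch is the compression claim in the notes: the buffer stores problem-solving approaches, not answers.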
Reaction to Marc Andreessen in Hoover interview. 1/3 China's strong economy is not a threat, it's the solution. 2/3 Technology optimism is a moral obligation because it fosters real economic growth, which is the only thing that matters for improving the human condition. 3/3 DOGE: government spending must be reined in. The onus is on Congress to prove that spending is constitutional, not the other way around. It can be done. There are legal ways, economic arguments and political hacks to get this done.
Neurips 2024 Part 13. DreamDrive: Generative 4D Scene Modeling from Street View Images
Take 2D images from ego cameras and synthesize 4D generations. Self-driving simulation and execution. Take a 2D image. Generate priors with diffusion models. Estimate a 3D representation through Gaussian splatting; in particular, learn the depth parameters for 3D. Use static-versus-dynamic decomposition for 4D. Gaussians that move together belong to the same dynamic object. Diffusion + Gaussian splatting + dynamic decomposition.
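The decomposition rule ("Gaussians that move together belong to the same dynamic object") can be sketched with plain clustering on Gaussian centers between two frames. Thresholds and data here are made up for illustration; the paper learns this jointly with splatting.

```python
import numpy as np

def decompose(centers_t0: np.ndarray, centers_t1: np.ndarray, eps: float = 0.05):
    """Label near-motionless Gaussians static; group movers by shared displacement."""
    disp = centers_t1 - centers_t0                    # per-Gaussian motion
    speed = np.linalg.norm(disp, axis=1)
    static = speed < eps                              # static background mask
    labels = -np.ones(len(disp), dtype=int)           # -1 = static / unassigned
    next_label = 0
    for i in np.where(~static)[0]:
        if labels[i] >= 0:
            continue
        close = np.linalg.norm(disp - disp[i], axis=1) < eps
        labels[close & ~static] = next_label          # movers with same motion
        next_label += 1
    return static, labels

# Four Gaussians: two move together (one car), two stay put (background).
t0 = np.array([[0, 0, 0], [1, 0, 0], [1.1, 0, 0], [5, 5, 0]])
t1 = np.array([[0, 0, 0], [2, 0, 0], [2.1, 0, 0], [5, 5, 0]])
static, labels = decompose(t0, t1)
print(static)   # [ True False False  True]
print(labels)   # [-1  0  0 -1]
```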
Neurips 2024 Part 12. Memorize What Matters: Emergent Scene Decomposition from Multitraverse, Marco Pavone. Self-supervised and camera-only 3D mapping approach, reducing the need for human annotations and LiDAR to enable selective retention. Goal: differentiate between static and moving objects when driving down a road. Camera-based images as input. Gaussian splatting to segment pixels into learnable parameters. Unsupervised learning of what's static and what isn't. Further work: learn what matters, pose estimation, semantic understanding of the environment (a flag is moving but static; a dog might be facing away from the street; a construction worker is waving).
Neurips 2024 Part 11. Vision Foundation Models for pose estimation. Pieter Abbeel.
Goal is to generalize the model, such that if an object is seen that wasn't present during training, the model still predicts its pose. Pose estimation is related to the semantics discussed in Part 9 ("The glass is upside down"). 1. Match: a point cloud transformer identifies the object and matches it with similar categories. 2. Relate the image to the identified category and apply NOCS (Normalized Object Coordinate Space). Use NOCS to predict pose. 3. Use synthetic data to fine-tune the foundation model.
Book discussion "The Noise of Time" by Julian Barnes. 1/3 Historical novel about how it felt to live in the Soviet Union, from the perspective of a composer. Gulag Archipelago, Solzhenitsyn. Society derails because truth is fabricated. His life felt like he was always off the metronome. (Lots of allusions to music and life in the book.) "When you chop wood, chips fly. But what if you lay down the axe and all you've done is reduce the timber to chips." "We thought we were in charge of liberty. Turns out, liberty is in charge of us." 2/3 Value of integrity. You lose integrity, you die. "Integrity is like virginity. Once lost, you can't recoup it." Natural progression of human life is from optimism to pessimism. Irony creeps in to deal with the lack of clarity and truth. Irony keeps you human. "Conscience lost its evolutionary value in the Soviet Union. But did it ever have one?" 3/3 Art and music. "The only music that matters is the inner music. Some people turn it into real music. And if it is strong enough and withstands the noise of time, it eventually turns into the whisper of history."
Lessons from Peter Thiel, the most value-creating investor. Based on Zero to One. "All happy companies are happy in their unique way. All unhappy companies are unhappy in the same way." The latter are caught in incrementalism and lack of differentiation (Uber, the melted-cheese store, banks). Competition destroys returns. Not the fittest but the niche. However, wealth requires both zero to one and scaling from one to N. Thiel should write a sequel on one to N. What is zero to one? It's the execution of a secret with relentless conviction and a Columbus-style journey into the unknown. Difference between a mystery (unachievable) and a secret (achievable). Examples of the past: Walt Disney drawing cartoons on film. Steve Jobs creating the App Store. Examples today: Cybercab, real-world AI enabling new industrial value chains.
George Soros - the master of financial jiu-jitsu
Lean on events that happen anyway. Don't force it - go with the forces. For example, when the British pound was pegged to the D-Mark, the Brits couldn't sustain the forces. Macro investing is top-down (Soros) or bottom-up (Druckenmiller). Lean on strong global economic tectonic movements. Since politicians typically fight against them, timing is of the essence. Examples of current global forces: EU and euro break-up. Chinese renminbi weakening due to a debt crisis. US budget deficit not sustainable - Bitcoin demand surging.
Neurips 2024 Part 10. Physically Compatible 3D Object Modeling from a Single Image, Kaiming He.
Computational framework transforms single images into 3D physical objects. Three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Reconstruct physics from images. Fundamentally, an image is more than a visual representation of an object: it captures a physical snapshot of the object in a state of static equilibrium, under the influence of real-world forces. Two ways to define the internal physics of a material: use priors or learn them (Sutton). What about when materials interact, i.e. when one material is not just exposed to gravity but to forces from other materials that are themselves constrained by internal forces? Technique: take an image of the object. Dissect it into small tetrahedra. Define transformation matrices between the edges and vertices of the tetrahedral sections.
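The per-tetrahedron bookkeeping is standard finite-element machinery: the linear map (deformation gradient) that takes a tetrahedron's rest-shape edge vectors to its deformed edge vectors. This is a generic sketch of that transformation matrix, not the paper's full inverse-physics pipeline.

```python
import numpy as np

def deformation_gradient(rest: np.ndarray, deformed: np.ndarray) -> np.ndarray:
    """rest, deformed: (4, 3) vertex positions of one tetrahedron."""
    Dm = (rest[1:] - rest[0]).T          # rest-shape edge matrix (3x3)
    Ds = (deformed[1:] - deformed[0]).T  # deformed edge matrix (3x3)
    return Ds @ np.linalg.inv(Dm)        # F maps rest edges to deformed edges

rest = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
stretched = rest * np.array([2.0, 1.0, 1.0])   # stretch x by a factor of 2
F = deformation_gradient(rest, stretched)
print(np.round(F, 3))   # diag(2, 1, 1): pure stretch along x
```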
Book Discussion "For Whom the Bell Tolls", Ernest Hemingway.
Tesla Q4 deliveries below expectations due to the Osborne effect (Model Y).
Book discussion. 1/5 Existentialism. What you do is who you are. 2/5 Life is fight and love. Not Darwinian but human. Life happens when those things happen. Madrid (the outside world) doesn’t have that. Parallel to La Peste and Oran under quarantine. 3/5 Good and Evil not clear. Does it even matter? Yes. You know it when you see it. 4/5 Why do Fascism and Communism rise in the 1930s? Nietzsche and/or Prometheus. 5/5 Answer is liberty, knowledge. But how to preserve liberty? Gödel self referential paradox. With inquiry and Popperian dialogue.
Politics. Why Elon Musk is important for Democracy.
Democracy is a system where power changes without violence. It's predicated on freedom of speech, freedom of thought and freedom of voluntary transaction. Musk is neither a threat to democracy nor does he impede freedom. To the contrary, he is pushing for more of it. 1/3 Free speech. Musk has spent billions defending the right to free speech. 2/3 DOGE. Eliminating the fourth branch of government - unelected bureaucrats making their own laws. 3/3 Entrepreneurs are harbingers of freedom. Wealth and freedom are two sides of the same coin.
Neurips 2024 Part 9. Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making, Percy Liang
Foundation Models for Decision Making Agents Part 24. What does it mean to do something? Tasks differ. LLMs can help through semantic knowledge, but they aren't trained for physics. There is a knowledge gap. Should we use LLMs as priors or train end to end? (Bitter Lesson!)
Sparse rewards, i.e. the path from state A to the final state is a curvy path with lots of subgoals. Standardize some of those curvy paths. Define a task either through state ("open fridge") or through actions ("go to door, pull, etc."). LLMs miss out on trajectories. For example, "empty the fridge of all outdated items" (state-oriented) versus "take everything out that is outdated" (action-oriented). The formal language is LTL (Linear Temporal Logic). It connects states with actions and has temporal instructions such as "next" or "when done".
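A toy evaluator for two LTL-flavored temporal operators over a finite state trace shows how such a formal spec connects states and ordering. The state names for the fridge example are invented; real LTL semantics are defined over infinite traces and richer operators.

```python
def eventually(trace, pred):
    """F pred: pred holds at some point in the trace."""
    return any(pred(s) for s in trace)

def then(trace, pred_a, pred_b):
    """pred_a holds, and pred_b holds at some strictly later step ('when done')."""
    for i, s in enumerate(trace):
        if pred_a(s):
            return any(pred_b(t) for t in trace[i + 1:])
    return False

trace = [
    {"fridge_open": False, "outdated_removed": False},
    {"fridge_open": True,  "outdated_removed": False},
    {"fridge_open": True,  "outdated_removed": True},
]
print(eventually(trace, lambda s: s["fridge_open"]))   # True
print(then(trace,
           lambda s: s["fridge_open"],
           lambda s: s["outdated_removed"]))           # True: opened, then emptied
```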
Favorite books read in 2024
1/5 "The Painted Word" by Tom Wolfe. Observation is theory-laden. 2/5 "Girl in a Band" by Kim Gordon. Zero to one in the New York 70s art scene. 3/5 "Doctor Zhivago" by Boris Pasternak. Revolutions typically replace one Leviathan with another. 4/5 "The Whole Story" by John Mackey. Zero to one in real life. 5/5 "East of Eden" by John Steinbeck. Good and evil exist. And money flows to the good.
What works and what doesn’t for 2025
Works: 1/3 Trump, DOGE, less regulation. 2/3 AI and space. 3/3 Bitcoin
Doesn’t Work:
1/3 High debt and inflation in healthcare, housing and education. 2/3 AI displacing jobs at unprecedented speed. 3/3 US dollar loses global reserve currency status.
Neurips 2024 Part 8. Smoothie: Label Free Language Model Routing.
Chris Re. Which LLM is optimal for which task? An LLM-powered chatbot may be asked to write code, answer questions about different domains, summarize documents, perform extraction, and more. Train an LLM router with unsupervised learning: statistical similarities, nearest neighbors and Gaussian approximation for LLM quality estimation. Also, develop prompt discriminators. The model chooses optimal prompts for problem solving. Self-assembling optimal chain of thought. There are thousands of LLMs and SLMs to choose from.
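A toy version of label-free quality estimation: with no gold answers, score each model by how close its output embedding sits to the other models' outputs, and route to the consensus. Random vectors stand in for real sentence embeddings; the actual Smoothie method fits a latent-variable graphical model rather than this simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_scores(outputs: np.ndarray) -> np.ndarray:
    """outputs: (n_models, dim) embeddings of each model's answer to one query."""
    normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)
    return sim.sum(axis=1) / (len(outputs) - 1)  # mean similarity to the others

# Three models: two agree, one is an outlier; the router should avoid it.
good = rng.normal(size=8)
outputs = np.stack([good + 0.01 * rng.normal(size=8),
                    good + 0.01 * rng.normal(size=8),
                    rng.normal(size=8)])
scores = consensus_scores(outputs)
print(int(np.argmax(scores)))   # one of the two agreeing models, never the outlier
```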
Three forces of wealth creation in the coming years. 1/3 Technology: AI and space. What matters is innovation driving value. 2/3 DOGE and Trump. Reduce waste in the US government. Positive spillover to the economy. 3/3 US dollar losing reserve currency status, forcing everybody to either save the dollar or transition to Bitcoin. Interesting interview with Stan Druckenmiller. Wealth creation is about sizing when you're right.
Neurips 2024 Part 7
LLMs for smell. "You give people a substance and they label how it smells. Then you give an AI a chemical structure corresponding to the substance and the AI predicts the substance and/or the smell." Binary evaluation is easier than continuous. Limitations: intensity/concentration and available datasets. Mapping airborne odorants with AI. Cybercab smells. Labelling chemical structures. Mapping to human perception. Data representations: how can chemical structures be represented as vectors for machine learning?
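One simple answer to the representation question: hash character n-grams of a SMILES string into a fixed-size fingerprint vector. Real work uses learned graph embeddings or Morgan fingerprints (e.g. via RDKit); this is a dependency-free toy stand-in, and the molecules are just familiar examples.

```python
import numpy as np

def smiles_fingerprint(smiles: str, n: int = 2, dim: int = 64) -> np.ndarray:
    """Count hashed character n-grams of a SMILES string into a dim-sized vector."""
    vec = np.zeros(dim)
    for i in range(len(smiles) - n + 1):
        vec[hash(smiles[i:i + n]) % dim] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ethanol = smiles_fingerprint("CCO")
acetic_acid = smiles_fingerprint("CC(=O)O")
benzene = smiles_fingerprint("c1ccccc1")

# Structurally closer molecules (shared substrings) tend to score higher.
print(cosine(ethanol, acetic_acid), cosine(ethanol, benzene))
```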
Neurips 2024 Part 6
LLMs for math and how to grok.
Math depends on the problem: calculate, prove, describe. Physics, for example, requires math to describe. Building AI for math depends on what you need: problem (regression, addition, etc.), task (different types of regression) and data (different data). The key question of the paper is when a model transitions from memorization to generalization. What is generalization? It means skill composition. How? Train the model with lots of data and diverse tasks and problems. The more diverse, the better the generalization. Attention-head design determines mapping ability. The MLP (multi-layer perceptron) determines the ability to calculate (a sequential task). At the limit, every model can be thought of as memorizing if you train on enough data, tasks and problems. Sample efficiency. Tokenization matters. How to tokenize numbers?
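A tiny illustration of why number tokenization matters: BPE-style vocabularies may merge "1234" into one opaque token, while digit-level tokenization exposes place value, which tends to help arithmetic generalization. The tokenizer below is a deliberately minimal sketch.

```python
def digit_tokenize(text: str) -> list[str]:
    """Split every number into single digits; keep other words whole."""
    tokens = []
    for word in text.split():
        if word.isdigit():
            tokens.extend(list(word))   # "1234" -> "1", "2", "3", "4"
        else:
            tokens.append(word)
    return tokens

print(digit_tokenize("add 1234 and 56"))
# ['add', '1', '2', '3', '4', 'and', '5', '6']
```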
Neurips 2024 Part 5. Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control, Dhruv Batra. Vision language models are useful for robotics. Why? Because language is a pre-trained prior that helps with semantic understanding. This paper discusses using diffusion models for better segmentation of scene. Fine grained control.
Design space for diffusion models: latent concatenation, denoising time steps, connecting the diffusion model with text. The design choice depends on the task. For example, if your task is to find a hotel in Shanghai, rough segmentation is enough. If your task is to find an excellent pizza place, you need more granular segmentation. Diffusion lends itself to more fine-grained segmentation because the de-noising process is a gradual uncovering of ground truth.
Humans are the only animal with the obligation to create their own narrative. From J. Steinbeck's "East of Eden". You have a choice. You have an obligation to design your own narrative. Zero to one. Wealth is only created with zero to one. Life is zero to one. The problem is, it's a high-rope dance between good and evil, between purity and sin. Both extremes are dangerous. Find the path in between. There is no such thing as purity, neither in morals, nor in math (Gödel), nor in science (Popper). Solutions to problems (Popper). The measure of success in life is love (Sam Hamilton). Wall Street likes to make its own narratives and often fails to adapt.
Three presentations that support arguments for Omniverse: Hinton, Xiao, Sutskever. Geoff Hinton says that intelligence can be learned. Before, the AI community coalesced around the statement: "The essence of intelligence is reasoning." Hinton says: "Wrong. The essence of intelligence is learning." The same applies to physics. Ted Xiao mentions three fundamental areas where robotic foundation models can be improved: scaling, context (actuators), evaluation. Ilya Sutskever: pre-training will end. No, it won't, because of physics. But there is a very important thing that he says in this presentation: "We dared to believe that this will work." The same applies to Jensen and physics.
Broadcom versus Nvidia. It's the software, stupid! AI XPU: semiconductor technology for AI workloads - focused, predictable workloads. Nvidia's key innovation is CUDA in conjunction with hardware that allows for parallelization - flexible workloads. Broadcom's focus is on the AI XPU, i.e. ASICs for AI. Developers are accustomed to CUDA, and the ecosystem includes tools and libraries that are deeply integrated with Nvidia's hardware. Broadcom's software offerings, while growing, do not yet match Nvidia's in terms of AI-specific toolsets.
The key bet on Nvidia is Omniverse, which according to Jensen is the CUDA of today.
Book discussion "East of Eden" by John Steinbeck. 1/4 Good and evil. Allusions to the Bible. 2/4 Humans have the choice and burden to define their own narrative. 3/4 Money is made by sin and floats to virtue. 4/4 Logos: Lee is the role of logic and knowledge. Right versus wrong is deeply embedded in Western culture, inspired by the Bible. What is good? What is evil? You know it when you see it. Abra's quote: "Aron lives in his own story. He can't deal with reality. Caleb can." Lee says: "It's ironic that Adam, the Bible-toter, lives off his father's stolen money. And Aron, another pure soul, should have inherited a prostitute's money." Lee says: "Money is made by the sinners and floats to the good souls."
Neurips 2024 Part 4. DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning, Sergey Levine.
Develop a VLM-based device-control agent that navigates the internet and executes via GUI. Pre-train with offline RL using Advantage-Weighted Regression (AWR). Represent action-oriented knowledge about the real world.
Fine-tune with online RL. Use an LLM as evaluator. What is failure? What is reward? Sparse, less sparse. Offline RL is the equivalent of GPT for text. Ilya Sutskever talk on the future of AI: Agents, Reasoning, Understanding.
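The AWR objective mentioned above turns offline RL into supervised learning: each logged action is reweighted by exp(advantage / beta), so high-advantage actions dominate the regression. The numbers below are illustrative, and the clipping constant is a common stabilization trick, not a value from the paper.

```python
import numpy as np

def awr_weights(advantages: np.ndarray, beta: float = 1.0,
                max_weight: float = 20.0) -> np.ndarray:
    """Exponential advantage weights; beta controls greediness, clipping adds stability."""
    w = np.exp(advantages / beta)
    return np.minimum(w, max_weight)

advantages = np.array([-1.0, 0.0, 2.0])   # A(s, a) for three logged actions
w = awr_weights(advantages)
print(np.round(w, 3))   # [0.368 1.    7.389]

# The policy loss is then a weighted negative log-likelihood:
#   L = -mean( w_i * log pi(a_i | s_i) )
log_probs = np.log(np.array([0.2, 0.5, 0.7]))
loss = -np.mean(w * log_probs)
print(loss)
```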
Neurips 2024 Part 3. Learning to Assist Humans without Inferring Rewards.
Sergey Levine et al. How to assist a human with an AI when the human's reward function is not known? Two approaches:
- Inverse RL. Humans are not consistent.
- Empowerment. Effect a larger degree of change over future outcomes.
ESR - Empowerment via Successor Representation. Maps the current state to future states. Mutual information estimates how much a current action of the agent affects future states of the human. RL optimizes this trajectory so the current action increases the action space of the human.
DOGE will be much more effective and boost real growth. To say that government can't be fixed because of complexity is nonsense. Tautology. Fixing government waste starts with getting rid of complexity. Government waste compounds into the real economy. Same applies to savings. Three vectors DOGE will attack to boost growth. 1/3 Cut spending and reform procurement. The latter has more impact. 2/3 Impose term and spending limits on Congress, particularly Senate. Congress is part of the problem and must be fixed. 3/3 Cutting government waste and corruption compounds into the real economy.
Expect high real economic growth due to innovation and deflation.
Neurips 2024 Part 2
Fine-tuning VLMs with RL agents. Sergey Levine et al. CoT as a path towards reasoning. Use a VLM to understand the semantics of the agent's environment and an LLM to reason towards goal-oriented action. CoT reasoning is prompt design; RL takes over the prompt design.
Neurips 2024 Part 1
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness. Solving two key problems: the memory wall and efficiency. A visual approach to understanding memory consumption, plus formal models to automate I/O-aware hardware solutions. Exploit parallelism in deep learning. Signal processing concepts. How to use as little, but enough, information to train models, by either eliminating data (quantizing) or parallelizing.
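The IO-awareness argument can be made concrete with the asymptotic HBM-access counts from the original FlashAttention analysis: standard attention materializes and re-reads the N x N score matrix, while the tiled kernel keeps score blocks in SRAM. Constants are dropped, and the SRAM size below is a made-up figure.

```python
def naive_hbm_accesses(n: int, d: int) -> int:
    # Theta(N*d + N^2): load Q, K, V and materialize/reload the score matrix.
    return n * d + n * n

def flash_hbm_accesses(n: int, d: int, sram: int) -> int:
    # Theta(N^2 * d^2 / M): K and V are re-streamed once per block of Q rows.
    return n * n * d * d // sram

n, d, sram = 4096, 64, 100_000   # sequence length, head dim, SRAM size in floats
print(naive_hbm_accesses(n, d))
print(flash_hbm_accesses(n, d, sram))   # far fewer HBM accesses when d^2 << M
```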
❤️ AI and innovation. A study. How does AI help material science? Inverse material design: determine the desired property and let AI figure out the optimal structure. AI increases novelty (as measured by the difference in compounds discovered). AI drives innovation (new technical terms used in papers). AI changes products (novelty of product lines introduced). AI-driven productivity growth in science compounds. Top scientists are better at evaluating potential compounds. AI helps solve the multi-dimensionality problem. Pre-train on an existing dataset. Fine-tune. Use RL to discover new materials. Evaluation?
Chris Re on axiomatic knowledge. Make an effort to understand and explain. What's the value of axiomatic knowledge? There is no science without theory, because we have to know where to look. Re's essay on axiomatic knowledge opens the door for foundation models to be a solution to high dimensionality. Axioms are dimensionality reduction. New paradigm for computer science: instead of focusing on one narrow problem and solving it with precision, you train a general model and use that model to solve the original problem. Foundation models solve the 'death by 1000 cuts' problem. Foundation models are kind of like a classical approach towards quantum computing. Nature operates in Hilbert space.
Reaction to David Precht interview with Robert Menasse. What's wrong with Europe? Three pillars of a state: purse, sword and vote. Europe has none of them. Hence, the only political forces that fight for purse, sword and votes are the conservative parties. That's why they are winning. The EU makes sense but must be defined around purse, sword and vote. Curb the power grab by unelected officials.
Richard Sutton. How I really feel. The AI community focuses on "can" instead of "can't". Plasticity in learning. Transient learning (learn, then deploy a finished model) is wrong. Normalization and backpropagation are problematic objectives. Gradient descent is designed for transient learning: once you minimize the loss, that's it. Small rational goals lead to large, abstract concepts. ("I want to get good grades" leads to "I want to get a PhD".) "Understanding intelligence will make us uncomfortable." Work on your thoughts. "If you want other people to care about what you think, you should start caring about it yourself."
❤️ Foundation models for robotics. Sergey Levine. Foundation models for embodied decision-making agents, part 22. Cross-embodiment models perform better than specialized models, same as in NLP. Take data from different embodiments and train one large generalized VLA model. A VLA model works like an NLP model: it predicts the next action. OpenVLA is an open-source VLA model trained on diverse, cross-embodiment data. This project accomplishes several things:
- open source: visibility into the model, parameters and architecture
- transparent onboarding of new robot types, particularly low-cost commodity hardware
- low-rank compute adaptable to low-cost compute
Chain of thought in robotics. Another interesting idea is using diffusion models to imagine scenarios and have the robot train itself on such ‘synthetic’ scenarios.
❤️ High- and low-level modeling for humanoid robots. Xiaolong Wang (UCSD). The high level focuses on training Vision-Language-Action (VLA) models with human video data for both navigation and manipulation. The low level involves developing low-level robot manipulation skills through teleoperation. Combining human VLA with low-level robot skills is a pathway toward realizing general-purpose humanoid robots. Paper on a teleoperation system based on VR headsets, like the Tesla Optimus presentations. Specific example of fine dexterity: pen manipulation trained in simulation. Dieter Fox, at the AI symposium in Michigan, talks about using VLA models for higher-level reasoning and then lower-level task calls. A problem with VLA is that it's 2D, not 3D; translation to 3D is important. Continuous learning: borrow from Richard Sutton's idea of plasticity. Instead of one-shot training, continuous training and learning.
Nvidia Q3 earnings. Risk to the upside. Three vectors going their way: 1/3 demand, 2/3 supply, 3/3 competition. 1. Demand for tokens is growing exponentially. 2. Production is growing at Jensen's law, an annual doubling of performance and halving of watts per token. 3. Lack of competition, because company-defining decisions have been made to build a large infrastructure to produce tokens efficiently. Omniverse is the next CUDA. 40% CAGR projection. AI is driven by Jensen's law. General models with more training continuously improve AI.
❤️ Richard Sutton. Plasticity in AI. Models lose plasticity, i.e. the ability to learn. Not the same as forgetting. From the paper: "..it is usually not effective to simply continue training on new data. The effect of the new data is either too large or too small and not properly balanced with old data." How to induce plasticity? 1/2 Keep weights small. 2/2 Induce variability by reinitializing weights with small contribution. Don't let certain weights overwhelm. Induce variability. Traditional AI algorithms are optimized for one-time learning, not for continual learning. Problem in robotics: how do I know the next model is better? Wrong question. Induce plasticity in the model and make sure it improves with training.
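The two-part recipe (keep weights small; reinitialize low-contribution weights) can be sketched directly. The decay factor, utility measure and reinit fraction below are simplified stand-ins for the full continual-backprop machinery, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def maintain_plasticity(weights: np.ndarray, utilities: np.ndarray,
                        decay: float = 0.99, reinit_frac: float = 0.1) -> np.ndarray:
    weights = weights * decay                          # 1/2: keep weights small
    k = max(1, int(reinit_frac * len(weights)))
    lowest = np.argsort(utilities)[:k]                 # least-useful units
    weights[lowest] = rng.normal(scale=0.01, size=k)   # 2/2: reinitialize them
    return weights

w = np.array([5.0, 0.2, -3.0, 0.01])
u = np.array([0.9, 0.05, 0.8, 0.01])   # toy per-unit utility estimates
w2 = maintain_plasticity(w, u, reinit_frac=0.25)
print(np.round(w2, 3))   # all weights shrink; the lowest-utility one is reset near 0
```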
Book discussion "The Pole" by J. M. Coetzee. Feels like Beckett: absurd, automaton. 1/2 What does it mean to be human? Emergence through extremes. The Pole is Polish and the other side of a "pole". 2/2 What is art? Poetry is a wedge of conciseness between human logic and reality. Emotion. Excitement.
Jitendra Malik at the AI summit in Michigan. When will we have home robots? Like home computers in 1980. 1/3 Locomotion. Humanoid motion as next-token prediction. From the paper: train a general transformer model to auto-regressively predict shifted input sequences. In contrast to language, the nature of data in robotics is different: sensorimotor trajectories, which we view as the sentences of the physical world. 2/3 Navigation. Use memory of observed instances. Multimodal input (image, text): "Go find the cup and put it on the sink." 3/3 Manipulation. Hand-object interaction. Take a video. Translate object and hand into 3D virtual space. Map actions in the virtual space. Use physics-based simulation to learn a general model of hand-object interaction.