"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.
Neurips 2024 Part 14. Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models. UC Berkeley. Distill a problem into its principal components. For example, a specific task can be distilled into a quadratic equation. Meta problem identification, buffering and solving. What's the appropriate chain of thought? Explore with Tree of Thought (explore different prompts) or Chain of Thought (different prompts yield different results with fixed weights). Find structure in the problem. Train a foundation model for inference. Create a meta-buffer. A pre-trained LLM is a compressed version of knowledge. A pre-trained inference engine is a compressed version of problem-solving approaches.
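A minimal sketch of the meta-buffer idea: distill a concrete task into a problem class, retrieve a stored thought-template, and instantiate it. The template names and the keyword-based distiller below are invented for illustration; in the paper an LLM does both steps.

```python
# Toy meta-buffer: problem class -> reusable thought template.
META_BUFFER = {
    "quadratic_equation": "Rewrite as ax^2 + bx + c = 0, then apply "
                          "x = (-b ± sqrt(b^2 - 4ac)) / 2a.",
    "unit_conversion": "Express both quantities in a common base unit, then divide.",
}

def distill(problem: str) -> str:
    """Map a concrete task to an abstract problem class (toy keyword matcher)."""
    if "x^2" in problem or "squared" in problem:
        return "quadratic_equation"
    return "unit_conversion"

def solve_with_buffer(problem: str) -> str:
    template = META_BUFFER[distill(problem)]
    # In the paper, an LLM instantiates the template; here we just attach it.
    return f"Problem: {problem}\nThought template: {template}"

print(solve_with_buffer("Solve x^2 - 5x + 6 = 0"))
```

The point of the sketch is the compression claim in the notes: the buffer stores problem-solving approaches, not answers.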
Reaction to Marc Andreessen in Hoover interview. 1/3 China's strong economy is not a threat, it's the solution. 2/3 Technology optimism is a moral obligation because it fosters real economic growth, which is the only thing that matters for improving the human condition. 3/3 DOGE: government spending must be reined in. The onus is on Congress to prove that spending is constitutional, not the other way around. It can be done. There are legal ways, economic arguments and political hacks to get this done.
Neurips 2024 Part 13. DreamDrive: Generative 4D Scene Modeling from Street View Images
Take 2D images from ego cameras and synthesize 4D generations. Self-driving simulation and execution. Take a 2D image. Generate priors with diffusion models. Estimate a 3D representation through Gaussian splatting; in particular, learn the depth parameters for 3D. Use static-versus-dynamic decomposition for 4D. Gaussians that move together belong to the same dynamic object. Diffusion + Gaussian splatting + dynamic decomposition.
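The decomposition rule ("Gaussians that move together belong to the same dynamic object") can be sketched with plain clustering on Gaussian centers between two frames. Thresholds and data here are made up for illustration; the paper learns this jointly with splatting.

```python
import numpy as np

def decompose(centers_t0: np.ndarray, centers_t1: np.ndarray, eps: float = 0.05):
    """Label near-motionless Gaussians static; group movers by shared displacement."""
    disp = centers_t1 - centers_t0                    # per-Gaussian motion
    speed = np.linalg.norm(disp, axis=1)
    static = speed < eps                              # static background mask
    labels = -np.ones(len(disp), dtype=int)           # -1 = static / unassigned
    next_label = 0
    for i in np.where(~static)[0]:
        if labels[i] >= 0:
            continue
        close = np.linalg.norm(disp - disp[i], axis=1) < eps
        labels[close & ~static] = next_label          # movers with same motion
        next_label += 1
    return static, labels

# Four Gaussians: two move together (one car), two stay put (background).
t0 = np.array([[0, 0, 0], [1, 0, 0], [1.1, 0, 0], [5, 5, 0]])
t1 = np.array([[0, 0, 0], [2, 0, 0], [2.1, 0, 0], [5, 5, 0]])
static, labels = decompose(t0, t1)
print(static)   # [ True False False  True]
print(labels)   # [-1  0  0 -1]
```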
Neurips 2024 Part 12. Memorize What Matters: Emergent Scene Decomposition from Multitraverse, Marco Pavone. Self-supervised and camera-only 3D mapping approach, reducing the need for human annotations and LiDAR to enable selective retention. Goal: differentiate between static and moving objects when driving down a road. Camera-based images as input. Gaussian splatting to segment pixels into learnable parameters. Unsupervised learning of what's static and what isn't. Further work: learn what matters, pose estimation, semantic understanding of the environment (a flag is moving but static; a dog might be facing away from the street; a construction worker is waving).
Neurips 2024 Part 11. Vision Foundation Models for pose estimation. Pieter Abbeel.
Goal is to generalize the model, such that if an object is seen that wasn't present during training, the model still predicts its pose. Pose estimation is related to the semantics discussed in Part 9 ("The glass is upside down"). 1. Match: a point cloud transformer identifies the object and matches it with similar categories. 2. Relate the image to the identified category and apply NOCS (Normalized Object Coordinate Space). Use NOCS to predict pose. 3. Use synthetic data to fine-tune the foundation model.
Book discussion "The Noise of Time" by Julian Barnes. 1/3 Historical novel about how it felt to live in the Soviet Union, from the perspective of a composer. Gulag Archipelago, Solzhenitsyn. Society derails because truth is fabricated. His life felt like he was always off the metronome. (Lots of allusions to music and life in the book.) "When you chop wood, chips fly. But what if you lay down the axe and all you've done is reduce the timber to chips." "We thought we were in charge of liberty. Turns out, liberty is in charge of us." 2/3 Value of integrity. You lose integrity, you die. "Integrity is like virginity. Once lost, you can't recoup it." Natural progression of human life is from optimism to pessimism. Irony creeps in to deal with the lack of clarity and truth. Irony keeps you human. "Conscience lost its evolutionary value in the Soviet Union. But did it ever have one?" 3/3 Art and music. "The only music that matters is the inner music. Some people turn it into real music. And if it is strong enough and withstands the noise of time, it eventually turns into the whisper of history."
Lessons from Peter Thiel, the most value-creating investor. Based on Zero to One. "All happy companies are happy in their unique way. All unhappy companies are unhappy in the same way." The latter are caught in incrementalism and lack of differentiation (Uber, the melted-cheese store, banks). Competition destroys returns. Not the fittest but the niche. However, wealth requires both zero to one and scaling from one to N. Thiel should write a sequel on one to N. What is zero to one? It's the execution of a secret with relentless conviction and a Columbus-style journey into the unknown. Difference between a mystery (unachievable) and a secret (achievable). Examples of the past: Walt Disney drawing cartoons on film. Steve Jobs creating the App Store. Examples today: Cybercab, real-world AI enabling new industrial value chains.
George Soros - the master of financial jiu-jitsu
Lean on events that happen anyway. Don't force it - go with the forces. For example, when the British pound was pegged to the D-Mark, the Brits couldn't sustain the forces. Macro investing is top-down (Soros) or bottom-up (Druckenmiller). Lean on strong global economic tectonic movements. Since politicians typically fight against them, timing is of the essence. Examples of current global forces: EU and euro break-up. Chinese renminbi weakening due to a debt crisis. US budget deficit not sustainable - Bitcoin demand surging.
Neurips 2024 Part 10. Physically Compatible 3D Object Modeling from a Single Image, Kaiming He.
Computational framework transforms single images into 3D physical objects. Three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Reconstruct physics from images. Fundamentally, an image is more than a visual representation of an object: it captures a physical snapshot of the object in a state of static equilibrium, under the influence of real-world forces. Two ways to define the internal physics of a material: use priors or learn them (Sutton). What about when materials interact, i.e. when one material is not just exposed to gravity but to forces from other materials that are themselves constrained by internal forces? Technique: take an image of the object. Dissect it into small tetrahedra. Define transformation matrices between the edges and vertices of the tetrahedral sections.
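The per-tetrahedron bookkeeping is standard finite-element machinery: the linear map (deformation gradient) that takes a tetrahedron's rest-shape edge vectors to its deformed edge vectors. This is a generic sketch of that transformation matrix, not the paper's full inverse-physics pipeline.

```python
import numpy as np

def deformation_gradient(rest: np.ndarray, deformed: np.ndarray) -> np.ndarray:
    """rest, deformed: (4, 3) vertex positions of one tetrahedron."""
    Dm = (rest[1:] - rest[0]).T          # rest-shape edge matrix (3x3)
    Ds = (deformed[1:] - deformed[0]).T  # deformed edge matrix (3x3)
    return Ds @ np.linalg.inv(Dm)        # F maps rest edges to deformed edges

rest = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
stretched = rest * np.array([2.0, 1.0, 1.0])   # stretch x by a factor of 2
F = deformation_gradient(rest, stretched)
print(np.round(F, 3))   # diag(2, 1, 1): pure stretch along x
```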
Book Discussion "For Whom the Bell Tolls", Ernest Hemingway.
Tesla Q4 deliveries below expectations due to the Osborne effect (Model Y).
Book discussion. 1/5 Existentialism. What you do is who you are. 2/5 Life is fight and love. Not Darwinian but human. Life happens when those things happen. Madrid (the outside world) doesn’t have that. Parallel to La Peste and Oran under quarantine. 3/5 Good and Evil not clear. Does it even matter? Yes. You know it when you see it. 4/5 Why do Fascism and Communism rise in the 1930s? Nietzsche and/or Prometheus. 5/5 Answer is liberty, knowledge. But how to preserve liberty? Gödel self referential paradox. With inquiry and Popperian dialogue.
Politics. Why Elon Musk is important for Democracy.
Democracy is a system where power changes without violence. It's predicated on freedom of speech, freedom of thought and freedom of voluntary transaction. Musk is neither a threat to democracy nor does he impede freedom. To the contrary, he is pushing for more of it. 1/3 Free speech. Musk has spent billions defending the right to free speech. 2/3 DOGE. Eliminating the fourth branch of government - unelected bureaucrats making their own laws. 3/3 Entrepreneurs are harbingers of freedom. Wealth and freedom are two sides of the same coin.
Neurips 2024 Part 9. Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making, Percy Liang
Foundation Models for Decision Making Agents Part 24. What does it mean to do something? Tasks differ. LLMs can help through semantic knowledge, but they aren't trained for physics. There is a knowledge gap. Should we use LLMs as priors or train end to end? (Bitter Lesson!)
Sparse rewards, i.e. the path from state A to the final state is a curvy path with lots of subgoals. Standardize some of those curvy paths. Define a task either through state ("open fridge") or through actions ("go to door, pull, etc."). LLMs miss out on trajectories. For example, "empty the fridge of all outdated items" (state-oriented) versus "take everything out that is outdated" (action-oriented). The formal language is LTL (Linear Temporal Logic). It connects states with actions and has temporal instructions such as "next" or "when done".
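A toy evaluator for two LTL-flavored temporal operators over a finite state trace shows how such a formal spec connects states and ordering. The state names for the fridge example are invented; real LTL semantics are defined over infinite traces and richer operators.

```python
def eventually(trace, pred):
    """F pred: pred holds at some point in the trace."""
    return any(pred(s) for s in trace)

def then(trace, pred_a, pred_b):
    """pred_a holds, and pred_b holds at some strictly later step ('when done')."""
    for i, s in enumerate(trace):
        if pred_a(s):
            return any(pred_b(t) for t in trace[i + 1:])
    return False

trace = [
    {"fridge_open": False, "outdated_removed": False},
    {"fridge_open": True,  "outdated_removed": False},
    {"fridge_open": True,  "outdated_removed": True},
]
print(eventually(trace, lambda s: s["fridge_open"]))   # True
print(then(trace,
           lambda s: s["fridge_open"],
           lambda s: s["outdated_removed"]))           # True: opened, then emptied
```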
Favorite books read in 2024
1/5 "The Painted Word" by Tom Wolfe. Observation is theory-laden. 2/5 "Girl in a Band" by Kim Gordon. Zero to one in the New York 70s art scene. 3/5 "Doctor Zhivago" by Boris Pasternak. Revolutions typically replace one Leviathan with another. 4/5 "The Whole Story" by John Mackey. Zero to one in real life. 5/5 "East of Eden" by John Steinbeck. Good and evil exist. And money flows to the good.
What works and what doesn’t for 2025
Works: 1/3 Trump, DOGE, less regulation. 2/3 AI and space. 3/3 Bitcoin
Doesn’t Work:
1/3 High debt and inflation in healthcare, housing and education. 2/3 AI displacing jobs at unprecedented speed. 3/3 US dollar loses global reserve currency status.
Neurips 2024 Part 8. Smoothie: Label Free Language Model Routing.
Chris Re. Which LLM is optimal for which task? An LLM-powered chatbot may be asked to write code, answer questions about different domains, summarize documents, perform extraction, and more. Train an LLM router with unsupervised learning: statistical similarities, nearest neighbors and Gaussian approximation for LLM quality estimation. Also, develop prompt discriminators. The model chooses optimal prompts for problem solving. Self-assembling optimal chain of thought. There are thousands of LLMs and SLMs to choose from.
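A toy version of label-free quality estimation: with no gold answers, score each model by how close its output embedding sits to the other models' outputs, and route to the consensus. Random vectors stand in for real sentence embeddings; the actual Smoothie method fits a latent-variable graphical model rather than this simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_scores(outputs: np.ndarray) -> np.ndarray:
    """outputs: (n_models, dim) embeddings of each model's answer to one query."""
    normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)
    return sim.sum(axis=1) / (len(outputs) - 1)  # mean similarity to the others

# Three models: two agree, one is an outlier; the router should avoid it.
good = rng.normal(size=8)
outputs = np.stack([good + 0.01 * rng.normal(size=8),
                    good + 0.01 * rng.normal(size=8),
                    rng.normal(size=8)])
scores = consensus_scores(outputs)
print(int(np.argmax(scores)))   # one of the two agreeing models, never the outlier
```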
Three forces of wealth creation in the coming years. 1/3 Technology: AI and space. What matters is innovation driving value. 2/3 DOGE and Trump. Reduce waste in the US government. Positive spillover to the economy. 3/3 US dollar losing reserve currency status, forcing everybody to either save the dollar or transition to Bitcoin. Interesting interview with Stan Druckenmiller. Wealth creation is about sizing when you're right.
Neurips 2024 Part 7
LLMs for smell. "You give people a substance and they label how it smells. Then you give an AI a chemical structure corresponding to the substance and the AI predicts the substance and/or the smell." Binary evaluation is easier than continuous. Limitations: intensity/concentration and available datasets. Mapping airborne odorants with AI. Cybercab smells. Labelling chemical structures. Mapping to human perception. Data representations: how can chemical structures be represented as vectors for machine learning?
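One simple answer to the representation question: hash character n-grams of a SMILES string into a fixed-size fingerprint vector. Real work uses learned graph embeddings or Morgan fingerprints (e.g. via RDKit); this is a dependency-free toy stand-in, and the molecules are just familiar examples.

```python
import numpy as np

def smiles_fingerprint(smiles: str, n: int = 2, dim: int = 64) -> np.ndarray:
    """Count hashed character n-grams of a SMILES string into a dim-sized vector."""
    vec = np.zeros(dim)
    for i in range(len(smiles) - n + 1):
        vec[hash(smiles[i:i + n]) % dim] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ethanol = smiles_fingerprint("CCO")
acetic_acid = smiles_fingerprint("CC(=O)O")
benzene = smiles_fingerprint("c1ccccc1")

# Structurally closer molecules (shared substrings) tend to score higher.
print(cosine(ethanol, acetic_acid), cosine(ethanol, benzene))
```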
Neurips 2024 Part 6
LLMs for math and how to grok.
Math depends on the problem: calculate, prove, describe. Physics, for example, requires math to describe. Building AI for math depends on what you need: problem (regression, addition, etc.), task (different types of regression) and data (different data). The key question of the paper is when a model transitions from memorization to generalization. What is generalization? It means skill composition. How? Train the model with lots of data and diverse tasks and problems. The more diverse, the better the generalization. Attention-head design determines mapping ability. The MLP (multi-layer perceptron) determines the ability to calculate (a sequential task). At the limit, every model can be thought of as memorizing if you train on enough data, tasks and problems. Sample efficiency. Tokenization matters. How to tokenize numbers?
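A tiny illustration of why number tokenization matters: BPE-style vocabularies may merge "1234" into one opaque token, while digit-level tokenization exposes place value, which tends to help arithmetic generalization. The tokenizer below is a deliberately minimal sketch.

```python
def digit_tokenize(text: str) -> list[str]:
    """Split every number into single digits; keep other words whole."""
    tokens = []
    for word in text.split():
        if word.isdigit():
            tokens.extend(list(word))   # "1234" -> "1", "2", "3", "4"
        else:
            tokens.append(word)
    return tokens

print(digit_tokenize("add 1234 and 56"))
# ['add', '1', '2', '3', '4', 'and', '5', '6']
```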
Neurips 2024 Part 5. Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control, Dhruv Batra. Vision language models are useful for robotics. Why? Because language is a pre-trained prior that helps with semantic understanding. This paper discusses using diffusion models for better segmentation of scene. Fine grained control.
Design space for diffusion models: latent concatenation, denoising time steps, connecting the diffusion model with text. The design choice depends on the task. For example, if your task is to find a hotel in Shanghai, rough segmentation is enough. If your task is to find an excellent pizza place, you need more granular segmentation. Diffusion lends itself to more fine-grained segmentation because the de-noising process is a gradual uncovering of ground truth.
Humans are the only animal with the obligation to create their own narrative. From J. Steinbeck's "East of Eden". You have a choice. You have an obligation to design your own narrative. Zero to one. Wealth is only created with zero to one. Life is zero to one. The problem is, it's a high-rope dance between good and evil, between purity and sin. Both extremes are dangerous. Find the path in between. There is no such thing as purity, neither in morals, nor in math (Gödel), nor in science (Popper). Solutions to problems (Popper). The measure of success in life is love (Sam Hamilton). Wall Street likes to make its own narratives and often fails to adapt.
Three presentations that support arguments for Omniverse: Hinton, Xiao, Sutskever. Geoff Hinton says that intelligence can be learned. Before, the AI community coalesced around the statement: "The essence of intelligence is reasoning." Hinton says: "Wrong. The essence of intelligence is learning." The same applies to physics. Ted Xiao mentions three fundamental areas where robotic foundation models can be improved: scaling, context (actuators), evaluation. Ilya Sutskever: pre-training will end. No, it won't, because of physics. But there is a very important thing that he says in this presentation: "We dared to believe that this will work." The same applies to Jensen and physics.
Broadcom versus Nvidia. It's the software, stupid! AI XPU: semiconductor technology for AI workloads - focused, predictable workloads. Nvidia's key innovation is CUDA in conjunction with hardware that allows for parallelization - flexible workloads. Broadcom's focus is on the AI XPU, i.e. ASICs for AI. Developers are accustomed to CUDA, and the ecosystem includes tools and libraries that are deeply integrated with Nvidia's hardware. Broadcom's software offerings, while growing, do not yet match Nvidia's in terms of AI-specific toolsets.
The key bet on Nvidia is Omniverse, which according to Jensen is the CUDA of today.
Book discussion "East of Eden" by John Steinbeck. 1/4 Good and evil. Allusions to the Bible. 2/4 Humans have the choice and burden to define their own narrative. 3/4 Money is made by sin and floats to virtue. 4/4 Logos: Lee is the role of logic and knowledge. Right versus wrong is deeply embedded in Western culture, inspired by the Bible. What is good? What is evil? You know it when you see it. Abra's quote: "Aron lives in his own story. He can't deal with reality. Caleb can." Lee says: "It's ironic that Adam, the Bible-toter, lives off his father's stolen money. And Aron, another pure soul, should have inherited a prostitute's money." Lee says: "Money is made by the sinners and floats to the good souls."
Neurips 2024 Part 4. DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning, Sergey Levine.
Develop a VLM-based device-control agent that navigates the internet and executes via GUI. Pre-train with offline RL using Advantage-Weighted Regression (AWR). Represent action-oriented knowledge about the real world.
Fine-tune with online RL. Use an LLM as evaluator. What is failure? What is reward? Sparse, less sparse. Offline RL is the equivalent of GPT for text. Ilya Sutskever talk on the future of AI: Agents, Reasoning, Understanding.
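The AWR objective mentioned above turns offline RL into supervised learning: each logged action is reweighted by exp(advantage / beta), so high-advantage actions dominate the regression. The numbers below are illustrative, and the clipping constant is a common stabilization trick, not a value from the paper.

```python
import numpy as np

def awr_weights(advantages: np.ndarray, beta: float = 1.0,
                max_weight: float = 20.0) -> np.ndarray:
    """Exponential advantage weights; beta controls greediness, clipping adds stability."""
    w = np.exp(advantages / beta)
    return np.minimum(w, max_weight)

advantages = np.array([-1.0, 0.0, 2.0])   # A(s, a) for three logged actions
w = awr_weights(advantages)
print(np.round(w, 3))   # [0.368 1.    7.389]

# The policy loss is then a weighted negative log-likelihood:
#   L = -mean( w_i * log pi(a_i | s_i) )
log_probs = np.log(np.array([0.2, 0.5, 0.7]))
loss = -np.mean(w * log_probs)
print(loss)
```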
Neurips 2024 Part 3. Learning to Assist Humans without Inferring Rewards.
Sergey Levine et al. How to assist a human with an AI when the human's reward function is not known? Two approaches:
- Inverse RL. Humans are not consistent.
- Empowerment. Effect a larger degree of change over future outcomes.
ESR - Empowerment via Successor Representation. Maps the current state to future states. Mutual information estimates how much a current action of the agent affects future states of the human. RL optimizes this trajectory so the current action increases the action space of the human.
DOGE will be much more effective and boost real growth. To say that government can't be fixed because of complexity is nonsense. Tautology. Fixing government waste starts with getting rid of complexity. Government waste compounds into the real economy. Same applies to savings. Three vectors DOGE will attack to boost growth. 1/3 Cut spending and reform procurement. The latter has more impact. 2/3 Impose term and spending limits on Congress, particularly Senate. Congress is part of the problem and must be fixed. 3/3 Cutting government waste and corruption compounds into the real economy.
Expect high real economic growth due to innovation and deflation.
Neurips 2024 Part 2
Fine-tuning VLMs with RL agents. Sergey Levine et al. CoT as a path towards reasoning. Use a VLM to understand the semantics of the agent's environment and an LLM to reason towards goal-oriented action. CoT reasoning is prompt design; RL takes over the prompt design.
Neurips 2024 Part 1
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness. Solving two key problems: the memory wall and efficiency. A visual approach to understanding memory consumption, plus formal models to automate I/O-aware hardware solutions. Exploit parallelism in deep learning. Signal processing concepts. How to use as little, but enough, information to train models, by either eliminating data (quantizing) or parallelizing.
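The IO-awareness argument can be made concrete with the asymptotic HBM-access counts from the original FlashAttention analysis: standard attention materializes and re-reads the N x N score matrix, while the tiled kernel keeps score blocks in SRAM. Constants are dropped, and the SRAM size below is a made-up figure.

```python
def naive_hbm_accesses(n: int, d: int) -> int:
    # Theta(N*d + N^2): load Q, K, V and materialize/reload the score matrix.
    return n * d + n * n

def flash_hbm_accesses(n: int, d: int, sram: int) -> int:
    # Theta(N^2 * d^2 / M): K and V are re-streamed once per block of Q rows.
    return n * n * d * d // sram

n, d, sram = 4096, 64, 100_000   # sequence length, head dim, SRAM size in floats
print(naive_hbm_accesses(n, d))
print(flash_hbm_accesses(n, d, sram))   # far fewer HBM accesses when d^2 << M
```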
❤️ AI and innovation. A study. How does AI help material science? Inverse material design: determine the desired property and let AI figure out the optimal structure. AI increases novelty (as measured by the difference in compounds discovered). AI drives innovation (new technical terms used in papers). AI changes products (novelty of product lines introduced). AI-driven productivity growth in science compounds. Top scientists are better at evaluating potential compounds. AI helps solve the multi-dimensionality problem. Pre-train on an existing dataset. Fine-tune. Use RL to discover new materials. Evaluation?
Chris Re on axiomatic knowledge. Make an effort to understand and explain. What's the value of axiomatic knowledge? There is no science without theory, because we have to know where to look. Re's essay on axiomatic knowledge opens the door for foundation models to be a solution to high dimensionality. Axioms are dimensionality reduction. New paradigm for computer science: instead of focusing on one narrow problem and solving it with precision, you train a general model and use that model to solve the original problem. Foundation models solve the 'death by 1000 cuts' problem. Foundation models are kind of like a classical approach towards quantum computing. Nature operates in Hilbert space.
Reaction to David Precht interview with Robert Menasse. What's wrong with Europe? Three pillars of a state: purse, sword and vote. Europe has none of them. Hence, the only political forces that fight for purse, sword and votes are the conservative parties. That's why they are winning. The EU makes sense but must be defined around purse, sword and vote. Curb the power grab by unelected officials.
Richard Sutton. How I really feel. The AI community focuses on "can" instead of "can't". Plasticity in learning. Transient learning (learn, then deploy a finished model) is wrong. Normalization and backpropagation are problematic objectives. Gradient descent is designed for transient learning: once you minimize the loss, that's it. Small rational goals lead to large, abstract concepts. ("I want to get good grades" leads to "I want to get a PhD".) "Understanding intelligence will make us uncomfortable." Work on your thoughts. "If you want other people to care about what you think, you should start caring about it yourself."
❤️ Foundation models for robotics. Sergey Levine. Foundation models for embodied decision-making agents, part 22. Cross-embodiment models perform better than specialized models, same as in NLP. Take data from different embodiments and train one large generalized VLA model. A VLA model works like an NLP model: it predicts the next action. OpenVLA is an open-source VLA model trained on diverse, cross-embodiment data. This project accomplishes several things:
- open source: visibility into the model, parameters and architecture
- transparent onboarding of new robot types, particularly low-cost commodity hardware
- low-rank compute adaptable to low-cost compute
Chain of thought in robotics. Another interesting idea is using diffusion models to imagine scenarios and have the robot train itself on such ‘synthetic’ scenarios.
❤️ High- and low-level modeling for humanoid robots. Xiaolong Wang (UCSD). The high level focuses on training Vision-Language-Action (VLA) models with human video data for both navigation and manipulation. The low level involves developing low-level robot manipulation skills through teleoperation. Combining human VLA with low-level robot skills is a pathway toward realizing general-purpose humanoid robots. Paper on a teleoperation system based on VR headsets, like the Tesla Optimus presentations. Specific example of fine dexterity: pen manipulation trained in simulation. Dieter Fox, at the AI symposium in Michigan, talks about using VLA models for higher-level reasoning and then lower-level task calls. A problem with VLA is that it's 2D, not 3D; translation to 3D is important. Continuous learning: borrow from Richard Sutton's idea of plasticity. Instead of one-shot training, continuous training and learning.
Nvidia Q3 earnings. Risk to the upside. Three vectors going their way: 1/3 demand, 2/3 supply, 3/3 competition. 1. Demand for tokens is growing exponentially. 2. Production is growing at Jensen's law, an annual doubling of performance and halving of watts per token. 3. Lack of competition, because company-defining decisions have been made to build a large infrastructure to produce tokens efficiently. Omniverse is the next CUDA. 40% CAGR projection. AI is driven by Jensen's law. General models with more training continuously improve AI.
❤️ Richard Sutton. Plasticity in AI. Models lose plasticity, i.e. the ability to learn. Not the same as forgetting. From the paper: "..it is usually not effective to simply continue training on new data. The effect of the new data is either too large or too small and not properly balanced with old data." How to induce plasticity? 1/2 Keep weights small. 2/2 Induce variability by reinitializing weights with small contribution. Don't let certain weights overwhelm. Induce variability. Traditional AI algorithms are optimized for one-time learning, not for continual learning. Problem in robotics: how do I know the next model is better? Wrong question. Induce plasticity in the model and make sure it improves with training.
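The two-part recipe (keep weights small; reinitialize low-contribution weights) can be sketched directly. The decay factor, utility measure and reinit fraction below are simplified stand-ins for the full continual-backprop machinery, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def maintain_plasticity(weights: np.ndarray, utilities: np.ndarray,
                        decay: float = 0.99, reinit_frac: float = 0.1) -> np.ndarray:
    weights = weights * decay                          # 1/2: keep weights small
    k = max(1, int(reinit_frac * len(weights)))
    lowest = np.argsort(utilities)[:k]                 # least-useful units
    weights[lowest] = rng.normal(scale=0.01, size=k)   # 2/2: reinitialize them
    return weights

w = np.array([5.0, 0.2, -3.0, 0.01])
u = np.array([0.9, 0.05, 0.8, 0.01])   # toy per-unit utility estimates
w2 = maintain_plasticity(w, u, reinit_frac=0.25)
print(np.round(w2, 3))   # all weights shrink; the lowest-utility one is reset near 0
```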
Book discussion "The Pole" by J. M. Coetzee. Feels like Beckett: absurd, automaton. 1/2 What does it mean to be human? Emergence through extremes. The Pole is Polish and the other side of a "pole". 2/2 What is art? Poetry is a wedge of conciseness between human logic and reality. Emotion. Excitement.
Jitendra Malik at the AI summit in Michigan. When will we have home robots? Like home computers in 1980. 1/3 Locomotion. Humanoid motion as next-token prediction. From the paper: train a general transformer model to auto-regressively predict shifted input sequences. In contrast to language, the nature of data in robotics is different: sensorimotor trajectories, which we view as the sentences of the physical world. 2/3 Navigation. Use memory of observed instances. Multimodal input (image, text): "Go find the cup and put it on the sink." 3/3 Manipulation. Hand-object interaction. Take a video. Translate object and hand into 3D virtual space. Map actions in the virtual space. Use physics-based simulation to learn a general model of hand-object interaction.