"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.
Previous site
All happy countries are alike in their happiness. All unhappy countries are unhappy in their unique ways. It's about property rights.
General Artificial Intelligence. Define it, measure it, build it. Generalization is the central concept in AI. Not skill. Measure and maximize generalization. Relationship between past experiences and the potential operational area. From the paper: “General intelligence is the power to adapt to change, acquire skills, and solve previously unseen problems – not skill itself.” To learn from data, one must make assumptions about it (Gödel's limits apply). Intelligence = ability to mine the experience, the bits that came in, and use them for abstractions to solve previously unseen problems. Information efficiency.
Why is the bond market tolerating frivolous deficit spending and Biden/Harris economic policy? Because there is no real bond market. The Fed backstops the bond market with Standing Repo Facility. 70% of US bond market is traded by hedge funds. If they need cash they repo Treasuries to Wall Street and Wall Street has permanent access to Fed balance sheet. The Fed is de facto financing deficits. Harris can spend and will not be punished by bond market. No Clinton scenario.
If Harris wins, more deficits, gold up, stocks fine but economic growth lower. If Trump wins less deficit, dollar stronger, economy stronger, stocks up because laissez faire.
How can physics improve machine learning and how can machine learning help physics? Machine learning can be compared to physics before quantum theory was developed. What is data, good data, bad data? What is the optimal dimensionality of neural nets? Depth, width, hyperparameters such as learning rate, tokenizer, etc. What is the minimum amount of data to learn? What does learning mean? How can ML help physics? Statistical physics, dimensionality reduction. In cosmology: gravitational bending (dark matter detection), redshift (expansion); particle physics (fundamental interactions within the nucleus). Quantum many-body systems. Quantum materials. Quantum computing helps better understand ML, and ML can help quantum computation. In principle, ML and physics work best together in likelihood-free inference, when the distribution is not known. Score-based methods, with iteration. Real, sim, real, sim… etc.
Multi Agent Interaction. Discussing two presentations. 1/2 Deep Reinforcement Learning for multi agent interaction. Global value function based on Graph Neural Net, deep RL for individual decision making. Pareto optimality for network. Robinson Crusoe omnipotent robot vs. commodity robots working as complex networks. Warehouse robotics. Do we need rules? Does the network converge faster to Pareto Optimality with or without rules? 2/2 Ant colony is a network of commodity intelligence with complex goals. Create robot ants to disturb colony and study robustness of network.
Book discussion “Flying Blind” by Peter Robison. Inside view on how Boeing’s engineering culture eroded. 1/3 Corporate governance matters. Long-term winning is important and it means focus on what it takes to win, not to enrich managers. 2/3 Regulatory governance is important. Like the generator/discriminator model in AI. Government must be efficient. 3/3 Engineering companies create wealth by focusing on rapid iteration and long-term value to customers. Finding the right reward function when iterating is difficult. Governance (both corporate and regulatory) is important to find the right path of innovation.
Kurt Gödel, Tucker Carlson and Karl Popper. Science is a method, not the truth. This applies to the science of science, which is epistemology. Kurt Gödel showed the limits of rational explanations and science as we know it. Tucker Carlson suggests this could be used as a case for falling back to creationism. Wrong. Karl Popper taught us to replace current knowledge with better explanations. If science has limits, find a better explanation of what science is.
Market discussion. Crash in Japan. We believe a prime broker and/or hedge fund blew up. Driven by ongoing devaluation of Yen. Central bank intervention. Lower rates in US will follow as response. Tesla IR call. 1/2 Cybertruck not necessarily a utility vehicle but urban status symbol. Hermes bag on wheels. Margins structurally lower. Battery must get better and/or cheaper. More storage capacity per dollar. 2/2 FSD penetration will follow AWS penetration. First, nice to have, then indispensable because native products enabled.
Sim to Real training, real physics and scene generation. Two papers out of the University of Washington tackle the Sim to Real problem. Use real robot data to train the simulator. Paper 1: How to best approximate real physics in the simulator (friction, mass, etc.)? Fisher information. Iterate. First in sim, develop policy. Apply in real, estimate relevant parameters (Fisher information), then back to sim for high volume iteration.
Paper 2. Generate scenes from RGB images. Create paired dataset from images to scenes. Learn how to generate. Then use generated scenes to train robot. (See kitchen, generate scenes in kitchen, train robot in sim.)
Iterative approach towards robot learning. Start with small sample size in sim. Learn policy. Then apply in real to estimate physics parameters. Then go back to sim and train through volume iteration.
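The iterate loop described above can be sketched as a toy identification routine. The one-parameter dynamics, the grid search standing in for a Fisher-information-guided estimate, and all function names are illustrative assumptions, not the method from the papers:

```python
def real_rollout(action, true_friction=0.3):
    # Stand-in for the real robot: outcome of one step under friction.
    return action * (1.0 - true_friction)

def sim_rollout(action, friction):
    # Same toy dynamics, but with the simulator's current friction estimate.
    return action * (1.0 - friction)

def estimate_friction(real_pairs, candidates):
    # Pick the friction value whose simulated outcomes best match real data.
    # (Grid search here; the papers use Fisher information to decide which
    # parameters are worth estimating.)
    def loss(f):
        return sum((sim_rollout(a, f) - v) ** 2 for a, v in real_pairs)
    return min(candidates, key=loss)

def sim_to_real_loop(iterations=3):
    friction = 0.0  # initial guess
    for _ in range(iterations):
        # 1) Train policy in sim at high volume (omitted; probe actions only).
        actions = [0.5, 1.0, 1.5]
        # 2) Apply in real, collect (action, outcome) pairs.
        real_pairs = [(a, real_rollout(a)) for a in actions]
        # 3) Re-estimate physics parameters, then go back to sim.
        friction = estimate_friction(real_pairs, [i / 100 for i in range(100)])
    return friction
```

With the toy dynamics the loop recovers the true friction after the first real-world pass; the point is the sim, real, sim cadence, not the estimator.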
Market discussion. Increased chances of Kamala Harris win drives volatility, recession fears. Stagflation fears. Government spending out of control. Expected to get worse with Harris.
RLHF versus Imitation Learning. Discussion. Complex tasks are hard to find an optimal solution to but easy to evaluate. Like NP complete problems. Driving. What is good driving? It’s hard to drive optimally. But it’s easy to evaluate option A versus option B. It’s hard to write a good poem. But it’s easy to evaluate poem A versus poem B. Fine tuning is reducing the space of possible trajectories so inference can be done more efficiently.
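The evaluate-A-versus-B idea can be made concrete with the Bradley–Terry model commonly used for RLHF reward-model training; this is a minimal sketch, not a specific library's API:

```python
import math

def preference_prob(score_a, score_b):
    # Bradley-Terry: probability that option A is preferred over option B,
    # given scalar scores from a reward model.
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def pairwise_loss(score_preferred, score_rejected):
    # Negative log-likelihood of the observed human preference; minimizing
    # this teaches the reward model to rank A above B without ever needing
    # an "optimal" answer.
    return -math.log(preference_prob(score_preferred, score_rejected))
```

The asymmetry in the episode's argument lives here: the model only has to score two finished poems (or trajectories), never generate the best one.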
Agentic AI. Ruslan Salakhutdinov presentation on how to design agents that can perform tasks specified by natural language instructions, perform efficient exploration and long-term planning, build and utilize 3D semantic maps, while generalizing across domains and tasks. Imagine a Tesla FSD. You tell it to take you to a nice beach to enjoy the sunset. Understand, explore and execute.
Tesla Earnings Call Q2 2024. 1/3 Margins low. Small volume growth, ASP decline, cost not declining as fast. Automotive GM below 15%, total GM better due to credits and energy gross profit. Margin improvements through ASP stabilization, cost decline (in particular dry coating of anode and cathode, 4680), more FSD attachment (higher ASP). 2/3 AI strategy. Fully integrated end to end stack solving robotics. Neither task- nor embodiment-constrained. Generalized, data intensive training strategy capitalizing on transfer learning. Commercial success in AI is maximizing solutions per dollar. Tokens per dollar. 3/3 Energy transition. Storage is key to decarbonization. Reducing peak loads and grid stabilization. Full stack solution, not just equipment. Like AWS. Solutions, not equipment. Software.
Book discussion “Crossroads” by Jonathan Franzen. 1/4 Existentialism. What is a person? Character = actions. 2/4 Most people are weak. But not all. 3/4 Weakness is dangerous and typically hurts good people. 4/4 Genius is childhood bottled up and ready to be used. Children are autonomous learning agents until adults interfere.
Demonstration data versus autonomous data for robotics. Foundation models for embodied decision making agents Part 23. Sergey Levine presentation on pretraining robot models. Demonstration data is specific and shows how to do things. Autonomous data explores. The latter requires a reward function. Goal setting layer. What is the goal? Survival.
Thoughts on conversation between Albert Gu and Sam Charrington on the TWiML podcast. Foundation models for decision making agents Part 22. Transformer vs. State Space Models. A transformer is a general representation of all possible states the tokens can be in relation to one another. Knowledge is the ability to draw explanations from this compressed representation. It’s solutions per dollar, not tokens per dollar. What is a token? The unit of analysis. How much of the world does a model need to remember?
Witness. The process of writing a story. Part 1. Developing characters. Characters are defined by actions. Roots matter. Narrative is a character, too. Interaction between characters and narrative. Mutual influence. Market update: Trump has a high chance of winning the presidency. He is laissez faire because he doesn’t have the patience to craft economic policy. Cronyism is different, not important. The less involvement by government the better because we invest in winners. Biden policy has been detrimental for EV and autonomous driving. We expect Trump to be positive even though his narrative might suggest something else.
Transfer learning in robotics. Generalize across tasks, environments and embodiments. First time this was observed around 2018 at CMU. Multitask learning leads to better performance. Why? Navigation? Chicken and egg problem in robotics. Data to train. Data comes from robots. Scalability, low cost robots. Don’t break. Diversity of tasks, lab environments, hardware. Novel approach towards multimodal instruction learning. MINT. Vision, semantic text and low level actuator simultaneously learned. Can Tesla teach Optimus with FSD? In principle everything that moves solves a navigation problem. Translate egocentric view to particular embodiment.
Predicting 3D video sequences and world models. Foundation models for decision making agents part 21. Do we need world models? We argue that any robot has something akin to a world model. Predicting the future environment with the agent’s ego behavior implicit is what robots do. World model can be separate from actuator model reconciled with discriminator. Transfer learning shows that implicit training, generalization, less priors and lots of quality data is path forward. Tesla is on that track.
Tesla delays robotaxi event. Stock correction. 1/4 What is robotaxi event? Business model. Technology, software, engineering, service. 2/4 Delays are features not bugs of high iteration companies. 3/4 Where do we stand with autonomy? Regulatory? China? 4/4 Tesla upside vs stock volatility. Volatility is inherent in the business model because rapid innovation is key to solving difficult problems.
Multi-token prediction in LLMs. Meta. Paper. Extension of CNN idea. Capture symmetries. Physics is about predicting symmetries and understanding when they break. So is AI. How many n-tokens should be predicted. Depends on task. Understanding when symmetries break is strong latent representation of physics. Pareto optimality in terms of how many n-tokens predicted, how many parameters, how much data to train. Faster, lower energy budget inference.
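A minimal sketch of the multi-token idea, assuming toy scalar heads (real heads are full unembedding matrices over the vocabulary): n independent heads read one shared hidden state and each predicts a different future offset t+1 … t+n.

```python
def multi_token_predict(hidden, head_weights):
    # Multi-token prediction: each head is a linear map over the SAME shared
    # hidden state; head k predicts the token at offset t+k+1. The number of
    # heads is the n-token horizon, a task-dependent choice per the episode.
    return [sum(h * w for h, w in zip(hidden, weights))
            for weights in head_weights]
```

Choosing how many heads to attach is exactly the Pareto trade-off named above: more heads give faster, cheaper inference per generated token but add parameters and training cost.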
Foundation models for embodied decision making agents Part 20. Building a simulator means learning reality. No need to understand, but to solve the problems physics is solving. Three P's of end to end driving: perception, prediction and planning. Tokenize. Represent all necessary data (pixels, motion, geometry, low level sensory) in one matrix. Learn planning from there. Token per dollar. Knowledge per dollar. Foundation model for AV, Sherry Yang. Tesla has ground truth from driving data and an evaluation engine with shadow mode. A/B testing. Testing models is an important problem to be solved.
CVPR Part 5.
End to End autonomous driving. Put planning on top. Paper UniAD. All modules (perception, prediction and planning) are subordinated to the planning task. Transformer based representation of reality. Use tokens for planning. Parallelism.
GAIA, simulator for AV.
Foundation model for AV simulation. Sherry Yang. DeepMind. Action to video and video to action. Represent action and images in latent space. Learn end to end and then simulate environments. To build a simulator you have to learn reality.
Reflection on Nvidia. Risk to the upside. Parallelism is all you need. Transformer architecture is driven by tokens and parallelism. Tokens per dollar. Better compression of knowledge. Token per dollar is equal to knowledge per dollar. Solutions per dollar.
CVPR Part 4. Chelsea Finn. OpenVLA. Pre-train VLM on internet scale data. Fine tune on robot specific tasks. Talk to robots. Influence their learning through instruction, imitation. Data curation leads to better performance of smaller models.
Trevor Darrell, Berkeley. Transformer architecture for robots. Next move prediction. Motor-sensory input is the sentence. Tokens are sensors, actuators. Predict whole sentences, i.e. token sequences with sensory and motor tokens.
Ilya R., Berkeley, presents a paper about transformer based robot learning. Input is a trajectory of a sequence of multi-view RGB camera images, proprioceptive robot states, and actions.
Discussing essay “Trolley Economics”. Economics can be seen as finding solutions to trolley problems. Most interesting are self referential trolley problems where you make decisions that affect your own future self. Trolley problems are like Schrödinger’s cat, a superposition of outcomes. Property rights and markets offer approximate solutions. Economists who disregard trolley problems are “Trolley Economists”.
Second quarter. Where do we stand? Tesla. 1/4 What are the constants? Quality up and cost down leads to volume growth. 2/4 FSD. Data advantage and model experimentation. 3/4 Optimus transfer learning from FSD. 4/4 Energy storage strong. Nvidia. 1/3 Risk to the upside. More tokens trained in multimodal. 2/3 Token per dollar not compute per hour. 3/3 System level iteration, growth and lowering cost per token.
Discussing essay “The unreasonable effectiveness of mathematics in the natural sciences” by Eugene Wigner. 1/3 Math works well in physics because it captures the relationships between things in nature. 2/3 Are we fooling ourselves? Maybe math only works in the field where we’re looking and we’re looking because it works. 3/3 Areas where it doesn’t work: entanglement, infinity, DNA and biology.
Long sequence modeling. Eric Nguyen at TwiML podcast talks about HyenaDNA and EVO. Biotech 3.0.
Problem = long sequence modeling. Use signal processing methods, FFT, flash attention and CNNs. Discussed paper. Problem = apply LLM methods to DNA. The HyenaDNA paper presents a chromatin prediction method based on CNNs. Key here is to use DNA nucleotides as tokens. In EVO they used microbial DNA to train on a larger dataset. Use the foundation model as a base to predict downstream effects of DNA changes. Trained on next token prediction. Token = nucleotide. Goal is to be able to design organisms in silico with downstream functionalities. Can long context windows be applied to robotics?
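The nucleotide-as-token choice is simple to sketch; the vocabulary IDs below are illustrative assumptions, not the paper's actual mapping:

```python
DNA_VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

def tokenize_dna(sequence):
    # Single-nucleotide tokenization: every base is its own token, so context
    # length is measured directly in base pairs rather than BPE subwords --
    # the design choice that makes long-sequence methods matter for DNA.
    return [DNA_VOCAB[base] for base in sequence.upper()]
```

With a four-symbol vocabulary there is nothing for a learned tokenizer to compress, which is why the modeling burden shifts entirely to context length.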
Hedge Fund Myths Part 1. Predicting the future. Myth = Hedge fund managers must be good at predicting the future. Wrong. Actually, managers must do the exact opposite. Focus on things that don’t change over time. Examples: Optimize for cash flow. Make customer continuously happy. Offer lower price at higher quality. Iterate faster with lower cost of error correction.
Book discussion “Sleeping Giants” by Rene Denfeld. 1/3 Scientism is bad and leads to lunacy. Hurts people. 2/3 Never give up. 3/3 Justice is a concept, not a cure. Law is a concept, not protection.
Foundation models for embodied decision making agents Part 19.
Ted Xiao at CVPR 24. Why foundation models? Data scaling. How? Tokenize. What? Spatial, action tokenization. Robotics is a navigation problem. More research into how to tokenize, what data to train on and how to evaluate. Generalization means: pick up bottle, then pick up pen, then pick up anything. Real world data, internet data, data augmentation.
Reaction to Joseph Stiglitz on Capitalisn’t. Economics is about approximating solutions to the trolley problem. Stiglitz either doesn’t know or doesn’t care. His demagoguery shows contempt for science. I call this “Trolley Economics”.
CVPR24 Part 3. Paper on grounding visual language models. Use LLM to discriminate VLM and improve image segmentation. Paper on data augmentation for autonomous vehicle edge case training. Real driving dataset augmented with 3D obstacles. Multiple parallelism. Reward function design can be seen as a data driven learning problem. Inverse RL is one way, good for bootstrapping. Massively parallel scene simulation is another.
CVPR24 Part 2. Paper discussion. Text to Drive. Use LLMs to generate driving scenarios and use LLMs as reward functions. Interesting approach to mapping high level semantics (car drives slow, car tailgates truck…) to low level information (speed, lane, distance). LLMs as reward function implies knowledge. Knowledge is a tricky concept in AI. Data driven simulators and reward functions through inverse RL (Tesla) are better approaches.
CVPR24 Part 1. Two key observations at CVPR24. 1/2 Transfer learning in robotics. Navigation is key. Driving or hand manipulation of a robot are similar problems because they rely on navigation. 2/2 Self driving has a reward function design problem. Tesla solved it by inverse RL from imitation learning of human driving.
Discussing the essay “Peak Academia” and why knowledge creation must be liberated from the shackles of gowns, tuition, and ideology. The history of universities shows that liberated thinking has enormous benefits to society. Today’s academia resembles the church in medieval Italy, with ideology, indoctrination, and intolerance rising. This situation led to the birth of the modern university. What is the alternative today? Disruption in education must happen in conjunction with disruption in politics. Texas.
Book discussion “In Ascension” by Martin MacInnes. 1/3 Human aspect of space travel. What does it mean to be human in an environment of deprivation? 2/3 Science fiction typically suffers from scientism, because writers are advised by scientists who think highly of their status. 3/3 Narrative boring. Pedantry about science details because of pedantry of science advisors. However, the book offers a good description of the psychology of human space travel.
Tesla opens new fork. Decentralized transport, compute and energy company. Robotaxi is decentralized compute for transport. Inference chip is compute for AI inference and solar/storage is photons. All provided through a decentralized network. By offering idle capacity, pricing will be very competitive.
CVPR 2024 Part 2.
Inspired by the book “In Ascension”, we observe that the boundaries of life are blurred. Also, what is intelligence and what is life? Is life the definition of intelligence and vice versa? Dynamic adjustment to environment. What is optimal life/intelligence for a given environment? What if the intelligence changes the environment? Ideas for training robots. Build an AI that trains robots, like an AI personal trainer.
CVPR Part 1. Qualcomm papers:
- More accurate depth estimation from 2D and 3D. Use language models to infer 3D from 2D and estimate depth.
- Video stereo data. Parallel method to estimate depth from stereo camera pixels.
- See video. ❤️ Learn from video and correct actions. AI driven personal trainer. Could this be used for training robots?
- Speculative decoding. A small language model predicts tokens for a multimodal model to verify. More tokens per second.
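Speculative decoding can be sketched as a draft-then-verify loop; the callables `draft`, `accepts` and `target_next` are hypothetical toy interfaces, not a real library's API:

```python
def speculative_decode(draft, accepts, target_next, prompt, k=4, max_len=8):
    # `draft(seq)` cheaply proposes one next token; `accepts(seq, tok)` is the
    # large model's verification; `target_next(seq)` is the large model's own
    # token when a proposal is rejected. All three are toy stand-ins.
    out = list(prompt)
    while len(out) < max_len:
        # Draft model proposes k tokens autoregressively (cheap).
        block = []
        for _ in range(k):
            block.append(draft(out + block))
        # Target model verifies the block; keep the accepted prefix,
        # and on the first rejection substitute the target's token.
        kept = []
        for tok in block:
            if accepts(out + kept, tok):
                kept.append(tok)
            else:
                kept.append(target_next(out + kept))
                break
        out += kept
    return out[:max_len]
```

The speedup comes from the target model checking k proposals per pass instead of emitting one token per pass; a bad draft model degrades throughput but never correctness.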
- Omnidirectional computer vision workshop. Infer physics from 2D images? End to end learning physics from 2D.
Tesla shareholder meeting. We vote Yes for Elon Musk's pay package and Yes for the move to Texas. Here is why.
Pay package: Elon deserves to be paid because he is a leader with the right mix of risk taking, technological expertise and ability to galvanize highly skilled humans to do great things. Texas Move: Delaware judge is overreaching by telling us owners what’s good for us. Institutional shareholders don’t have skin in the game. Retail investors care because it’s their wealth on the line. Texas can fix this overreach.
Reaction to what is wrong with capitalism. How the left gave up the working class and how to fix it.
Elitism penetrates the American Democratic party. Cherry picking issues to further their wallet and influence instead of representing the working class. This void can only be filled by entrepreneurs, because they care most about wellbeing of workers. Tesla transcends hierarchies and offers upward mobility. That’s why leftish elites hate Musk. Separate knowledge creation and career pipeline from legacy academia.
3D rendering from 2D images using transformers and diffusion
Take a 2D image. Tokenize in patches. Apply transformer architecture with self-attention and cross-attention. Reconstruct in triplane for 3D rendering. Use diffusion for rendering.
Is it possible to train images from 2D to 3D and then generalize across many more categories without having to fine tune or put priors in by category? Is there a latent structure in 2D to 3D rendering that is general and independent of particular types of images?
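The tokenize-in-patches step is the standard ViT-style cut; a minimal sketch assuming a plain nested-list grayscale image and non-overlapping patches (the transformer and triplane stages are omitted):

```python
def patchify(image, patch):
    # Split an H x W image (list of rows) into non-overlapping patch x patch
    # tokens, flattened row-major. Each token is what the transformer's
    # self-attention then relates to every other token.
    h, w = len(image), len(image[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append([image[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

Because patchification is category-agnostic, it is at least plausible that the latent 2D-to-3D structure the episode asks about could be learned without per-category priors.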
Essay discussion “Beyond the Limitation Game”
What happens to humans when machines do all the work?
- Work will become more humane and less menial. Menial work is work that doesn’t involve the human advantage; RFC = Risk, Fight, Creativity.
- Is there a limit to how much wealth creation a society can or wants to achieve? Status is a limiting factor.
- Technological progress could bring abundance and thus help humanity free itself from the shackles of scarcity.
You can only see what you know. What is knowledge? How is it created? Iteration, error correction. Pareto optimality. Exploration vs. exploitation is a muscle worth training. The more knowledge the more you see. Wealth is the set of all achievable transformations. The more you see the more you know the more wealth.
Aaron Ames on “learning needs control”.
Safety is a learning problem. Formulating it as control is misleading and provokes an illusion of control. Two advantages if regarded as a learning problem: 1/2 Accept variance. 2/2 Focus on data and algorithms to approximate safety as a search for the Pareto frontier.
Book discussion “The Extraordinary Life of Sam Hell” by Robert Dugoni. 1/3 Belief and faith are human fundamentals. Evolutionary advantage. 2/3 Inclusion isn’t. Cost of inclusion to individual and society. 3/3 Author writes from the heart. Good. Narrative doesn’t flow as well as, for example, Demon Copperhead (similar novel). Good story.
Zhenyi Liu (Stanford)
- Scene generation and camera and sensor calibration. See paper. The key here is to generate synthetic scenes so cameras can be calibrated and ML systems tested.
- Fog synthesizer. Generate fog. Create a fog dataset with different lighting conditions so the model can learn to detect objects in foggy conditions and the model can also be tested against this dataset.
John Preskill presentation on applying ML for quantum systems prediction.
The algorithm learns how a quantum system evolves. Paradox: how can seeing a few pages predict the book in an entangled system? What is a page? Preskill says he doesn’t see an application of quantum information processing to classical ML.
Nvidia Earnings Call. 1/3 New compute paradigm, generative not instructions. 2/3 Sovereign AI, arms race. 3/3 Cost leader per joule and dollar.
Creating, Using and Detecting Deepfakes
Hany Farid (UC Berkeley). Detect Deepfakes with geometry, vantage point. Byzantine Generals problem. Hardware-software solution to authenticity.
Scaling laws in LLMs, Code generating models
Loubna Ben Allal, Hugging Face. Chinchilla scaling laws. Optimize for tokens, data and parameter size. Smaller models better for inference. StarCoder. Code LLMs generally require long context. No big difference between code and English LLMs with regard to the tokenizer.
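The Chinchilla heuristic mentioned above is easy to state in code, assuming the commonly cited 20-tokens-per-parameter ratio and the C ≈ 6·N·D compute approximation (both rules of thumb, used here as a sketch):

```python
def chinchilla_optimal_tokens(n_params):
    # Chinchilla rule of thumb: compute-optimal pretraining uses roughly
    # 20 training tokens per model parameter.
    return 20 * n_params

def chinchilla_optimal_params(flops):
    # With C ~ 6 * N * D and D ~ 20 * N, solving gives N ~ sqrt(C / 120).
    return (flops / 120) ** 0.5
```

This is why "smaller models better for inference" is a real trade-off: a compute-optimal model is not inference-optimal, and overtraining a smaller model past the Chinchilla point can be the cheaper deployment choice.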
Leslie Schoop, Princeton
How can AI help in material discovery? What material should be studied in the first place? Data, model, algorithm. Band theory. Quantum materials: superconductors, topological materials (electronics), spintronics (data storage and processing), quantum sensors and quantum computing components. Roald Hoffmann: learn the language of physics and AI. DeepMind paper on material discovery.
James Analytis, Berkeley. When materials are in transition they exhibit quantum behavior. Study fluctuations due to Heisenberg for applications in spintronics with anti-ferromagnets. Applying current.
Book discussion “Back to Blood” by Tom Wolfe. 1/2 Conclusion of “The Painted Word”. The end of an era of theory-laden observation. What’s real cannot be faked and vice versa. Fake news is an oxymoron. People form an opinion and then find the information to back it. 2/2 Real world journalism with narratives, characters and plots.
Extreme energy workshop. Fusion. 1/4 Large government projects don’t work. 2/4 More iteration, error correction. 3/4 Incentivize actors to produce more results that benefit society. 4/4 Focus on education, teaching and research. Refrain from money, status and vanity. Essay.