"Not Even Wrong" Podcast
Investing in fundamentally new concepts and engineering practices with large impact.
❤️Hani Goodarzi. UCSF.
DNA-to-RNA transcription can be seen as a convolutional neural net solution to an inference problem. Proteins act as filters scanning the DNA sequence to trigger transcription into RNA and protein. RNA is structure and sequence; identifying structural information is challenging. DNA = sequence. Protein = structure. RNA = both.
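A minimal sketch of that analogy, assuming a made-up sequence and motif: a transcription-factor "filter" scans a one-hot-encoded DNA string as a 1D convolution, and high scores mark putative binding sites.

```python
import numpy as np

# A protein "filter" scanning DNA, expressed as a 1D convolution over a one-hot sequence.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len, 4) one-hot matrix."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq):
        out[pos, idx[base]] = 1.0
    return out

def scan(seq, motif):
    """Slide a motif filter along the sequence; high scores mark matches."""
    x, w = one_hot(seq), one_hot(motif)
    k = len(motif)
    return np.array([np.sum(x[i:i + k] * w) for i in range(len(seq) - k + 1)])

dna = "GGTATAAAGGCCTATAAAGG"        # toy sequence (hypothetical)
scores = scan(dna, "TATAAA")        # toy TATA-like motif acting as the "filter"
print(np.argmax(scores), scores.max())  # position and score of the strongest match
```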
Scientific discovery with ML. Tools. Software. GFlowNets. JAX. The key to scientific discovery is to solve the intractable inference problem with optimization techniques.
Edward Hu, Yoshua Bengio. GFlowNets. Scientific discovery. Find optimal paths through a large design space. The reward model acts like a filter.
Hongwan Liu, Chicago. BBN, Big Bang Nucleosynthesis. JAX. How did the elements arise after the Big Bang? Parameters, model, use Monte Carlo to sample.
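A minimal sketch of the sampling idea, in plain NumPy rather than JAX and with a toy Gaussian likelihood standing in for a real nucleosynthesis model: a random-walk Metropolis sampler draws posterior samples of a single parameter.

```python
import numpy as np

# Minimal Metropolis sampler: sample a model parameter given synthetic "observations".
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.3, size=50)   # toy data, not real measurements

def log_post(theta):
    # flat prior + Gaussian likelihood with known width 0.3
    return -0.5 * np.sum((data - theta) ** 2) / 0.3**2

theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + rng.normal(scale=0.1)          # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                              # accept
    chain.append(theta)

print(np.mean(chain[1000:]), np.std(chain[1000:]))  # posterior mean and spread
```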
MLSys. Part 2
Ying Sheng, Databricks
Fairness in LLM serving. Center for Automated Reasoning.
Tri Dao, Princeton
Memory I/O is the bottleneck. GPU memory. High-bandwidth memory (HBM). Attention compares every token with every other token. Can you restructure that computation to use SRAM? Tiling and re-computation.
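A minimal NumPy sketch of the tiling idea behind this (FlashAttention-style): stream keys and values in blocks and maintain an online softmax, so the full N x N score matrix is never materialized. It does not model the actual HBM/SRAM hierarchy.

```python
import numpy as np

def tiled_attention(q, K, V, block=64):
    """Attention for one query, streaming K/V in blocks with an online softmax."""
    m = -np.inf                  # running max of scores (numerical stability)
    denom = 0.0                  # running softmax denominator
    acc = np.zeros(V.shape[1])   # running weighted sum of values
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Kb @ q / np.sqrt(q.shape[0])      # scores for this block only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)             # rescale previous partial results
        p = np.exp(s - m_new)
        denom = denom * scale + p.sum()
        acc = acc * scale + p @ Vb
        m = m_new
    return acc / denom

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=64), rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
w = np.exp(K @ q / 8 - (K @ q / 8).max()); w /= w.sum()
assert np.allclose(tiled_attention(q, K, V), w @ V)   # matches naive attention
```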
Dan Fu, Stanford
Architectures with better selection mechanisms for which token to process at what time get better performance. SSMs vs. transformers.
Kurt Keutzer, Berkeley
Sold DeepScale to Tesla. Focus on efficient CNN inference. Karpathy said that data splits and oversampling are more important, basically saying that data janitoring matters more than academic musings over models and architectures. Agent training: where is the data going to come from?
MLSys. Part 1
Azalia Mirhoseini, Stanford
Data. Model. Software. Hardware
-
Constitutional AI.
-
Mixture of Experts. Gate data to specific experts.
-
Hydragen paper. Prefix fine tuning.
-
CATS paper. Sparsity. Can we extract more sparsity from LLMs?
Xupeng Miao, Caltech
GaLore is an optimizer technique that reduces memory requirements by projecting gradients into a low-rank subspace.
Optimizer states require more memory than the weights themselves. Reminds me of Thierry Tambe and eDRAM.
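A minimal sketch of the low-rank idea as I understand it, not the actual GaLore implementation: project each gradient onto a small subspace found by SVD, keep optimizer state (here, momentum) only in that subspace, and project the update back. The toy loss and all hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))        # weight matrix being trained
rank, lr, beta = 8, 1e-2, 0.9
m = np.zeros((rank, 256))              # momentum lives in the small subspace only

def grad(W):
    # toy objective (0.5 * ||W||^2): stands in for a real loss gradient
    return W

for step in range(100):
    g = grad(W)
    if step % 50 == 0:                 # refresh the projection occasionally
        U, _, _ = np.linalg.svd(g, full_matrices=False)
        P = U[:, :rank]                # (256, rank) projection basis
    g_low = P.T @ g                    # projected gradient: (rank, 256)
    m = beta * m + (1 - beta) * g_low  # optimizer state is rank x 256, not 256 x 256
    W -= lr * (P @ m)                  # project the update back to full size

print(np.linalg.norm(W))               # components in the projected subspace decay
```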
Edge Machine Learning efficiency, memory and compute constraints
Atila Orhan at Argmax talks about efficiency engineering for ML on edge devices. Reduce large models to the smallest necessary set of weights and biases that maintains performance.
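A minimal sketch of one such reduction, global magnitude pruning to a target sparsity; real edge pipelines typically also quantize and fine-tune after pruning.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` of them are removed."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    threshold = np.quantile(flat, sparsity)          # one global cutoff for all layers
    masks = [(np.abs(w) > threshold).astype(w.dtype) for w in weights]
    return [w * m for w, m in zip(weights, masks)], masks

rng = np.random.default_rng(0)
layers = [rng.normal(size=(128, 128)), rng.normal(size=(128, 10))]   # toy weight matrices
pruned, masks = magnitude_prune(layers, sparsity=0.9)
kept = sum(int(m.sum()) for m in masks) / sum(m.size for m in masks)
print(f"fraction of weights kept: {kept:.2f}")        # roughly 0.10
```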
Ming Ding, Zhipu AI. Discussed a paper on video generation at lower inference cost. Also discussed CogVLM, an open-source multimodal model. Goal is to align video with text. Both domains work well independently, but what if they're supposed to work together? Long context windows. Use compression techniques to achieve more efficient inference on long context windows. Relay Diffusion paper.
Bootstrapping theory. Where is the line between knowledge and learning?
Bootstrapping theory in particle physics about how the strong force interacts with subatomic particles.
Dichotomy between knowledge and Sutton’s bitter lesson, which states that intelligence is best built with learning and compute because those scale.
Where is the line between knowledge and learning? Is there an optimal amount of knowledge a system should/can have. What about knowledge of learning, learning how to learn?
The bitter lesson by Richard Sutton
Inspired by a presentation at GTC
-
Reality is way too complex. Let's not try to define it. Search for approximations.
-
Strive to develop methods that find those approximations.
-
Search and learning, not knowledge.
-
Machine that approximates reality.
-
It’s tempting to teach the machine to think the way we think we think.
-
Compute is part of the knowledge stack. Can that be introduced in the end to end learning stack? Loop.
-
Wealth creation is driven by compute and AI is driven by compute.
Tesla IR call 5/7/24
Elon says only invest in Tesla if you believe in the FSD and AI potential. IR must improve their communication about FSD and AI.
-
Transfer learning from FSD to Optimus. PaLM-E has shown that robots can transfer-learn. Does Tesla have the same experience?
-
FSD rollout in China. Potential for higher adoption if priced/bundled appropriately.
-
FSD bundling in the US and Europe. Algorithmic driving should in principle lower the accident rate and thus lower insurance cost. Can this be passed on to the customer? An Amazon Prime-type model?
-
Why is Dojo not more competitive? Why pay Nvidia such high margins when you have a proprietary workload to train on?
Discussing Essay:
Theory 2.0
-
Humanity sees itself through the lens of science. We are not just changing the Zeitgeist with innovations in science. We are changing science through innovations in science. Loop.
-
The Painted Word. Modernity. Picasso, Heisenberg, Einstein. Hofstadter, neural nets. From "observation is theory-laden" to "observation is AI-laden."
-
Old: Scientist develops theory. New: Scientist designs pipeline for AI system to develop theory.
-
Similar ideas by Jonathan Hurst from Agility Robotics. Engineers design AI systems to solve technology problems.
-
The two key innovations of AI today are simulation and reinforcement learning.
-
Ultimate frontier of AI is when the AI system develops an “I” like in D. Hofstadter’s book. Imagine a robot dying for an idea.
How can foundation models add to autonomous vehicle stack?
Marco Pavone, Stanford
Foundation models for embodied decision making agents part 18.
Definition of foundation model: Trained on a wide range of data in self supervised way and can be adapted to various downstream tasks.
-
Simulation. Video generation from real videos.
-
Semantics. Training. Imbue driving videos with internet scale reasoning. Example, train vision model with temporal dynamics.
-
Architecture: modular vs. end-to-end. End-to-end is possible because of the ability to differentiate across modules.
-
OOD detection. Use a foundation model for OOD detection and, once detected, use the foundation model to suggest an action.
Learning-enabled Adaptation to Evolving Conditions for Robotics
Somrita Banerjee
Robots must be able to adapt to evolving conditions that are different from those seen during training or expected during deployment. Adaptive learning based on out-of-distribution (OOD) data and/or function. OOD is in the eye of the beholder: the distribution depends on what we know. The best way to avoid OOD is to train well; a robot that encounters lots of OOD is not well trained. Transformers solved the OOD problem for text by parallelizing training over all kinds of token distributions. Robotics must go to simulation: represent reality in sim and then train on a multitude of trajectories to reduce OOD risk. An algorithm minimizing OOD.
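A minimal sketch of one common OOD heuristic: fit a Gaussian to feature embeddings of the training data and flag inputs whose Mahalanobis distance exceeds a threshold. The embeddings here are random stand-ins, not real robot features.

```python
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(2000, 16))             # in-distribution embeddings (stand-ins)

mu = train_feats.mean(axis=0)
cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(16)
cov_inv = np.linalg.inv(cov)

def ood_score(x):
    """Squared Mahalanobis distance to the training distribution."""
    d = x - mu
    return float(d @ cov_inv @ d)

# Calibrate a threshold on the training set, then test a near and a far sample.
threshold = np.quantile([ood_score(f) for f in train_feats], 0.99)
in_dist = rng.normal(size=16)
far_out = rng.normal(loc=6.0, size=16)
print(ood_score(in_dist) > threshold, ood_score(far_out) > threshold)  # typically False, True
```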
Dieter Fox: Where's RobotGPT?
Foundation models for embodied decision making agents part 17.
Sim based robot training pipeline.
-
Generate scenes for specific task
-
Generate tasks based on scene given, set up reward function design
-
Solve tasks, then do behavioral cloning with real robot on simulated task solutions
How to integrate transformers with robot models?
Two fundamental approaches to robot training through simulation:
-
Use a vision-language model to represent state and then train the robot on generated scenes to execute the task.
-
Use actual robot data: tokenize actions and then train on them. Bootstrap robot transformers from real robot data, generate more action-state tuples, and learn from them.
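A minimal sketch of the action-tokenization step, with arbitrary bin counts and ranges: continuous action values are discretized into a small vocabulary so that (state, action) sequences can be fed to a standard next-token-prediction model.

```python
import numpy as np

N_BINS = 256                                   # action vocabulary size (arbitrary)
A_MIN, A_MAX = -1.0, 1.0                       # assumed action range

def tokenize_actions(actions):
    """Map continuous action values in [A_MIN, A_MAX] to integer tokens."""
    clipped = np.clip(actions, A_MIN, A_MAX)
    return np.round((clipped - A_MIN) / (A_MAX - A_MIN) * (N_BINS - 1)).astype(int)

def detokenize(tokens):
    """Map tokens back to representative action values."""
    return A_MIN + tokens / (N_BINS - 1) * (A_MAX - A_MIN)

# A toy trajectory: each step has a 3-dim state and a 2-dim action.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 3))
actions = rng.uniform(-1, 1, size=(5, 2))

action_tokens = tokenize_actions(actions)      # (5, 2) integer tokens
print(action_tokens[0], detokenize(action_tokens[0]))
# Interleaving state tokens and action tokens per timestep yields the sequence
# a decoder-only transformer would be trained on with next-token prediction.
```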
Paper discussion:
Multimodal transformers. Input multimodal. Output multimodal, like diffusion for example.
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning.
Discussing the book "Girl in a Band" by Kim Gordon. Sonic Youth. 1/3 Life accomplished. Self-respect driven by zero-to-one type energy. Good story because it's authentic. 2/3 New York art scene 1970-90. Convolution of forces: rock, pop, classical, visual, fashion. What makes a place vibrate with potential? 3/3 Stylistically impressive biography. Generous with her thoughts and life story without provoking voyeurism. Kim is a poet with noise.
Sputnik FSD. Tesla China FSD engagement. 1/3 Sputnik moment for autonomous driving in the US. Remove resistance of regulators by creating competition. 2/3 Inverse Manhattan Project for FSD in China. Large-scale deployment of FSD and experimenting with business models. 3/3 Monetize FSD. A/B testing.
Robert Dyro, Stanford. Modeling autonomous driving. Moravec's paradox: the hard things are easy and the easy things are hard. 1/2 Plan agent actions under an assumption of rationality, like the efficient market hypothesis. Good baseline but not realistic. Add uncertainty through Monte Carlo simulation and optimization across trajectories. 2/2 Counterfactual sampling from real-world examples. Generate edge samples to train agents in simulation. Model predictive control vs. Q-learning.
Vertical Integration - three case studies
Integrate if value chain is not productive enough.
-
Recover rare earth metals from coal ash. Tesla should be investing in such upstream activities. When you have a car that drives 5-10x more, you want the cost of energy down and the powertrain/battery to last longer.
-
AI training. Two major problems. Parallelism and scarcity. Nvidia solved this problem for FSD training. But multimodal training for robots could be an opportunity for vertical integration.
-
Semantic maps. Mapping has been solved for location and distance but not for semantic understanding of environment.
Sara Aronowitz, Uni Toronto
In formal theories of decision-making, preferences are given. Learning and preference formation are memory dependent. Memory is path dependent. That's why it can appear inconstant. It's actually not.
“I argue that the core function of any memory system is to support accurate and relevant retrieval."
Peter Caradonna, Caltech
Measure preference intensity consistently across models and use cases. Is there a canonical way to measure how much more you like a versus b? Introduce a numeraire: money. Use arbitrage to separate logic from a-logic. Using humans to teach AI with RLHF: make sure humans have diverse experiences, which leads to more diverse preference pairs. More information.
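A minimal sketch of how preference intensity can be recovered from pairwise choices under a Bradley-Terry model (probability that a beats b is sigmoid(u_a - u_b)); the options, utilities, and comparison counts are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
true_u = np.array([0.0, 1.0, 2.5])                 # hidden utilities of options a, b, c

# Simulate pairwise choices: option i beats j with probability sigmoid(u_i - u_j).
pairs = [(i, j) for i in range(3) for j in range(3) if i != j]
wins = np.zeros((3, 3))
for i, j in pairs:
    p = 1 / (1 + np.exp(-(true_u[i] - true_u[j])))
    wins[i, j] = rng.binomial(200, p)              # 200 comparisons per ordered pair

# Fit utilities by gradient ascent on the Bradley-Terry log-likelihood.
u = np.zeros(3)
for _ in range(2000):
    grad = np.zeros(3)
    for i, j in pairs:
        p = 1 / (1 + np.exp(-(u[i] - u[j])))
        grad[i] += wins[i, j] - (wins[i, j] + wins[j, i]) * p
    u += 0.001 * grad

print(u - u[0])    # recovered intensities, anchored at option a: roughly [0, 1, 2.5]
```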
Mixture of Experts (MoE) paradigm and the Switch Transformer
Scaling matters. Power law. Compute, parameters: scaling either works. How can models be scaled and kept efficient? MoE models are more sample efficient because experts can divide the work and take more advantage of the data. Switch Transformer: turn off weights you don't need. MoE: activate only the experts you need. One big model with trillions of parameters, but don't use all of them at once at inference time. When you surf, you don't use programming skills. Similar to neuromorphic compute.
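A minimal sketch of top-1 (Switch-style) routing, with toy dimensions and a random gate: each token is sent to a single expert, so only a fraction of the parameters is touched per token.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 32, 8, 16

W_gate = rng.normal(size=(d_model, n_experts))                 # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
tokens = rng.normal(size=(n_tokens, d_model))

logits = tokens @ W_gate
logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
choice = probs.argmax(axis=1)                                  # top-1 expert per token

out = np.zeros_like(tokens)
for e in range(n_experts):
    idx = np.where(choice == e)[0]
    if idx.size:                                               # only this expert's weights run
        out[idx] = (tokens[idx] @ experts[e]) * probs[idx, e:e + 1]

print(np.bincount(choice, minlength=n_experts))                # tokens routed to each expert
```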
Douglas Hofstadter, "I Am a Strange Loop".
DNA is self-referential. The I is self-referential. "I need my eyes to see and I need my I to be." How can DNA create itself? How can the I define itself? Gödel. The I-loop is created by self-referential systems that turn an agent into an I. Collect information and act upon it, then add a self-referential loop. Agents interact with the environment through preferences. Some of those are deterministic; some fall into a self-referential loop (cyclicality) and can lead to inconsistent behavior. Is this the source of human creativity? What is the purpose of the I? According to Hofstadter, the "I" is not just a static concept but rather a dynamic process that emerges from complex interactions within the brain. A meta reward function that optimizes for energy. This could be the path towards self-referential robots that have Robinson Crusoe abilities.
Tesla Q1 2024
Definition of a good earnings call: it makes you want to work. Shift in direction from EV manufacturing to Autonomous EV (AEV) manufacturing. Companies change direction when technology changes. Innovator's Dilemma. EV manufacturing means battery, powertrain, scaling rapidly. AEV manufacturing means building a car that can drive without a human and lasts 5x longer. "If you want to invest in Tesla, test drive FSD".
Arc of technology. Vision in 2012 and LLMs in 2023 enable semantic understanding of the environment. Huge for robotics. Savings per mile, per minute. Do more with less. Value creation. Next step: native AEV products. Netflix keeper test for employees and investors: is Tesla still the right investment for the job? Hire people that are willing to switch and subordinate themselves to technology. New tech requires new workflows.
ML data attribution. Appeal to the AI community: stop tinkering with the truth. Safety does not mean misrepresenting the truth. Tinkering with the truth is the path to an Orwellian nightmare. LLMs are a compact representation of knowledge on the internet. Tinkering with that representation is changing it to the liking of a few privileged censors. This is the antithesis of safety.
Ferenc Krausz, Max Planck. Attosecond physics. Design laser flashes that are pulsed in extremely short time intervals. Shed light on electrons. The finer we see, the more we learn.
Kathy Galloway, MIT. Micro/nanoscale reactive transport toward decarbonization
Integrating synthetic circuitry into larger transcriptional networks to mediate predictable cellular behaviors. Stochastic nature of transcription is challenge.
Dr. Wen Song (UT Austin). She studies the microstructure of coal ash and then designs methods to extract REEs from there. Fluid-Fluid and Fluid-solid mechanics are discussed. The key to this work is studying the microstructure of coal ash with physical models and then derive fluid mechanics from those models to design extraction techniques.
Sergey Levine, Data-Driven RL in Robotics,
Represent the real world. Unsupervised pre-training. LLMs. VLMs. Do these methods encode knowledge? Where does the knowledge come from? It represents the people who put it on the internet. Find optimal trajectories with offline RL.
Alex Havrilla on TWIML podcast
Fine-tuning LLMs with RL. Paper on how to teach LLMs with RL. Surprisingly, most methods (PPO, DPO, RLHF) deliver similar performance. Why? They are mostly deterministic, not taking advantage of exploration. How can exploration be improved? Is there an AlphaGo Zero approach to LLM reasoning? Self-play. RL.
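A minimal NumPy sketch of the DPO objective on one preference pair: it rewards a larger policy margin on the chosen response relative to the rejected one, measured against a frozen reference model. The log-probabilities below are placeholder numbers, not real model outputs.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.
    logp_* are the policy's summed log-probs of each response; ref_* come from
    the frozen reference model. Lower loss = larger margin for the chosen response."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))    # -log sigmoid(beta * margin)

# Placeholder log-probabilities (would come from real model forward passes).
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))   # policy already prefers chosen -> smaller loss
print(dpo_loss(-15.0, -12.0, -13.0, -14.0))   # policy prefers rejected -> larger loss
```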
Foundation models for embodied decision making agents part 16.
Ken Goldberg talk at Stanford. Discussing end-to-end and engineering in AI.
Discussing the MOKA paper leveraging VLMs and LLMs. How many priors do we need?
MOKA paper references Hu paper, where they use VLMs to execute robot actions.
Does knowledge require priors? Reduce search space. Exploration? RL.
Does an FSD car know physics even though it’s not trained on physics priors?
Christopher Peacocke, Columbia University
A priori truth, which is independent of experience and axioms, is possible. I want to invoke Gödel: a priori truths within a given system exist, but they sometimes cannot be proven (incompleteness). New truths can be discovered. How can new knowledge emerge from existing axioms? How is Move 37 possible? Mathematics is not a priori. It's a formal system of symbols and rules.
Robots are proof that there is an a priori. We just don’t know it.
Reaction to Dave Lee Podcast: FSD v12: Tesla's Autonomous Driving Game-Changer w/ James Douma
-
Tesla has advantage in FSD training because they built a database of human driving data.
-
Implicit versus explicit heuristics.
-
FSD native products and services will propel Tesla to a real world AI company on Wall Street.
-
Optimus. Advantage for Tesla: transfer learning from FSD for navigation. Speed. Iteration for low-cost hardware. No data advantage. Iterate on engineering so that when the software is ready, the hardware is ready. ALOHA.
-
Pipeline for real world robotics.
Foundation models for embodied decision making agents part 14.
Nikolay Atanasov, Elements of Generalizable Mobile Robot Autonomy
UCSD
-
Robot Model. Using Hamiltonian dynamics for modeling speed, torque, force etc. reduces search space.
-
Environment model. Represent the environment. Semantics landmarks.
-
Task model. An LLM uses a language description of the task, translates it into an automaton, and executes the task. In this paper the authors suggest an LLM-based approach to scene graphs.
Discussing Stanford Robotics workshop. 1/2 Prof. Wu talks about representing physics in virtual space. Transformer with RL and zero-shot learning. 2/2 Grace Gao on neural HD maps. Maps 1.0 is location. Maps 2.0 is semantics. Multimodal transformer with LLM and visual representation.
Tesla transitioning from mass producer of electric cars to transport on demand. AWS business model. Robotaxi. Large distributed infrastructure based on real-world AI, compute and connectivity. What is transport going to look like? Tesla is in the process of defining it. Turmoil in the C-suite because the problem set the company is facing is changing.
-
Drew Baglino leaving Tesla. Keeper test (Netflix). Tesla is morphing into a software-defined hardware company with a focus on AI and distributed computing. Battery and powertrain still important but not key to the future of Tesla. More software- and AI-driven executive branch because the key problems are in this area.
-
Interest rates up. Bad for car loans. Pressure on car market.
-
Skepticism amongst scientists towards Elon Musk. Must be grounded in science, not in status preservation.
Events 4/8-4/12 Part 4
1. Multi-Sensory Neural Objects: Modeling, Inference, and Applications in Robotics. ❤️ Jiajun Wu of Stanford University
What makes an object an object. Unsupervised segmentation and 3D representation of objects for simulated robot learning.
This paper summarizes Wu’s research goal:
My research goal is to build machines that see, interact with, and reason about the physical world just like humans. Inverse real-to-sim problem. The Galileo model, proposed by Jiajun Wu and colleagues, is a generative model for solving problems of physical scene understanding from real-world videos and images. Can foundation models solve some of the physics problems in robotics? Wu writes about challenges: 1/5 Data Scarcity. 2/5 High Variability. 3/5 Uncertainty Quantification. 4/5 Safety Evaluation. 5/5 Real-Time Performance.
2. Spatially-Selective Lensing for VR Displays. Summary, Aswin Sankaranarayanan (CMU).
Aswin developed a lens that can display multifocal images. You take a 2D picture and you have 3D vision because you can focus on different objects simultaneously. They achieve this by building a software and/or electronically defined Lohmann Lens, which is a system that creates a focus-tunable lens by translating two cubic phase plates relative to each other.
Events 4/8-4/12 Part 3
1. Open AI presentation Jason Wei
Multitask. LLM models learn several tasks one by one. Scaling. Why does scaling work? More compute generates more learning.
Hyung Won Chung. Recipe for AI. Develop progressively more general methods with weaker model assumptions. Decoder only.
2. Aaron Dollar - “Mechanical Intelligence” in Robotic Manipulation
Mechanical solutions to robotics problem. Example grasping. Use mechanical feedback.
3. Kyujin Cho Seoul National University. Title: Nature-inspired designs for innovating robots: grippers, wearable robots, and mobile robots
Events 4/8-4/12 Part 2
Sasha Newton
UC Riverside
Kant on truth. There is higher, transcendental truth and there is empirical truth. Empirical truth is when our cognition matches nature. Knowledge is when lots of people agree with that truth. Higher truth is not directly accessible.
Observer duality. We can only see what we observe. There is more truth behind what we can see. We might be able to conjecture without observation (Einstein, Deutsch). Observations are theory laden. What we see is what we think.
Popper on Kant. There is no a priori truth. But he agrees that we must conjecture.
AI: hyper-conjecture. Machines conjecture faster. Are these AI systems uncovering new truths or just more truth? Kant would say more truth; Popper would say new truths.
Heisenberg. Uncertainty. If there is a minimal amount of certainty about a relative observation, then there must be a minimal amount of truth. But what happens beyond that? Is that still truth?
Events 4/8-4/12 Part 1
Karen Leung
University of Washington
Moravec's paradox in self-driving. Hard things are easy and easy things are hard. Why? Traffic rules. More agents constrain themselves. Reduce degrees of freedom.
How to develop a data driven, flexible and robust analytical framework for safety in robot interaction.
-
Quantify safety with Hamilton-Jacobi reachability. Augment translation matrix with HJ variables.
-
Parametrize HJ so that it becomes learnable.
-
Train. What is good data.
The key to Karen's work is how to develop a priori techniques to gauge the safety of a robot.
Compound effect of technology. Innovation stack: when innovations build on each other. Scaling is the main driver of wealth creation.
-
Robot technology will get a boost when 3D-fication of 2D images gets solved. Video and images can then be used to train robots in simulation. A self-driving car doesn't touch things, so it doesn't need 3D-fication of images. Robots do.
-
Self-driving cars got a boost and became tractable when vision was solved in the early 2010s.
-
Compound value of technological innovation is like biological evolution. How do big changes in evolution happen? Species conquer new territory with techniques that have been dormant or not so relevant in previous biotope. See Neil Shubin and the transition from water to land. Same with technology. Some groundwork must be done before big leaps can happen, like solving 2D-3D in image to enable better simulation for robotics.
Foundation models for embodied decision making agents Part 13. Jitendra Malik, "When will we have intelligent robots?" 1/4 The bang for the buck in RL is adaptation, not just searching the action space. 2/4 The key concept for robotics is 3D-fying 2D images so robots can learn from video data. Compound value of technology: solve 3D-fying 2D, train robots on video data, train robots on robot data. 3/4 The transformer architecture is key to robotics. Learn from action-state tuples and model the robotics problem as next-token prediction. Simulation: once 3D-fying is solved, simulation for robots will get a boost, similar to when convolutional neural nets enabled vision and self-driving cars. 4/4 Compound value of technology: sometimes you have to wait for other components to kick in. Reminds me of the water-to-land argument by Neil Shubin. He argues that some ingredients were in place and then used later when they became useful. Same here: 3D-fying must first be solved to deliver robots at scale.
State of play. 1/2 Tesla FSD 12.3 is launching a new vector of compound growth. FSD native products and services. Foundry for multimodal transformer stack. Achieve the NotNot - Can't afford not to have it. 2/2 US inflation elevated due to high deficits and debt. New government and fiscal discipline can revert this.
Events 4/1 - 4/5
-
❤️Kaiming He. Deep learning is about data representation. Compression, abstraction, conceptualization. Loop in forward and backward propagation. LeNet: convolution, pooling, fully connected. AlexNet: GPU, data and model parallelism. ResNet: much deeper neural nets are good. Control for overfitting and gradient collapse with normalization and regularization (see the residual-block sketch after this list).
-
Robot Learning in the Era of Large Pre-trained Models. Dorsa Sadigh, Foundation models. Pre-train representation from that data and then adapt to different tasks. Meta-learning at scale. What is the pre-training objective? Vision. Masked auto-encoding. This work builds on Kaiming He’s paper on Masked auto-encoders. What is good data? Novel and high success. Kick out bad data. What does Tesla do with bad drivers? Reward design. Use LLMs to enhance RL.
-
Learning to See the World in 3D. Ayush Tewari (MIT). Looking at a 2D image and reasoning about 3D. The inverse graphics problem deals with inferring the 3D structure of a scene or object from its 2D image.
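A minimal sketch of the residual-block idea from Kaiming He's talk above: each block adds a learned residual to an identity shortcut, which keeps very deep stacks trainable; the shapes, weight scales, and activation choices are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)): the block only has to learn the residual F,
    and the identity shortcut lets signal pass through deep stacks."""
    h = relu(x @ W1)
    return relu(x + h @ W2)

rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=(8, d))                        # batch of 8 feature vectors
# Stack 50 blocks; with small residual weights the signal neither explodes nor dies.
for _ in range(50):
    W1 = rng.normal(scale=0.05, size=(d, d))
    W2 = rng.normal(scale=0.05, size=(d, d))
    x = residual_block(x, W1, W2)
print(np.linalg.norm(x, axis=1).mean())            # activations stay in a reasonable range
```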
Tesla news on M2 and Robotaxi, Kaiming He, Representation and Douglas Hofstadter.
-
Tesla apparently scaling back lower cost Model 2 to focus on robotaxi. Dynamic companies often change course because markets change or because their technology enables new avenues. The latter. FSD v12 has reached level 4 autonomy. Positive surprise. Now the risk shifts from technology to monetization and regulation. Low cost M2 still important for Wright’s Law.
-
Kaiming He talk on representation as the key to deep learning. Take sensory input like pixels and find the optimal representation. Representation depends on the task; the goal is what a system eventually ends up doing. Contrastive learning requires clear features; for self-driving, occupancy is enough. The fundamental problem of deep learning from data is to manage the looping (forward and backward prop). One input can cause havoc in the network. That's why AI researchers developed ResNets, regularization, normalization and methods for efficient initialization. It's about patterns and relationships, not absolute values.
-
Douglas Hofstadter in his book "I Am a Strange Loop" formulates a theory of intelligence based on epiphenomena that are extracted from raw data. Intelligence is efficient representation.
Discussing "Picasso and the Painting That Shocked the World" by Miles Unger. Revolution in consciousness around the turn of the 20th century. Picasso, Einstein and Heisenberg. Seeing art through the brain. Reality is in the eye of the beholder, like in quantum physics. The painting "Les Demoiselles d'Avignon" defines a new path in consciousness. Process of creation. Today we are facing another paradigm shift in consciousness driven by AI. Hyper-conjecture. Truth is probabilistic discovery. AI expands the search space. The closest we have to the multiverse.
Tesla Q12024 Production and Delivery Numbers
1/6 Below capacity and expectations. Build-up in inventory. 2/6 Model 3 Osborne effect in the US. 3/6 Production and shipping disruptions in Europe and the Middle East. 4/6 Competition. The most important factors are cost and value. Tesla is the world leader in those categories and will withstand the competitive onslaught. 5/6 FSD adoption expected to increase because it's very good. Value creation through AI is shifting from a technological problem to a regulatory and political one. FSD is the nail in the coffin for legacy auto and they will fight it. 6/6 Musk's political polarization not a factor because polarization goes both ways.
Structure and AI
Animesh Garg of Georgia Tech/NVIDIA. How much structure is there? How much do we need? Explicit vs. implicit structure. This talk argues that there is lots of structure, or low hanging fruit in model design. How do machines see the world? Break things up and let machines figure out how to assemble them.
The key in this work is to take real images and model them with implicit geometries (i.e. end to end learned). Use those models to simulate different scenarios in a digital twin fashion and thus enable a self driving car or robot to learn end to end in sim.
The truck that shocked the world
Art is not an aesthetic endeavor. It's channeling the underlying forces of nature. From "The Painting that Shocked the World" by Miles Unger. Cybertruck is radical design. A revolution in consciousness, like "Les Demoiselles d'Avignon" by Picasso. Radical means it's embedded in breakthroughs in science and technology. Demoiselles is rooted in Einstein and Heisenberg. Cybertruck reveals a new scientific paradigm driven by AI. The line between human and machine is blurred. Human intelligence and AI merging. Cybertruck is like "Les Demoiselles", a new signpost for humanity. Galileo (observe and speak math). Newton (make math useful for physics). Einstein (spacetime). Heisenberg (quantum uncertainty). AI: machine is human and human is machine. Lines are blurred.
Jimmy Ba. X.ai
Age of Image Net vs. Age of LLMs after ChatGPT. What is different this time?
Models with more parameters don't necessarily perform better. Models with more compute do perform better. Related to Richard Sutton's essay, "The Bitter Lesson": compute and search are the pathway to intelligence.
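A minimal sketch of what "scaling works, power law" looks like quantitatively: fitting loss = a * C^(-b) to synthetic (compute, loss) points via a log-log linear fit. All numbers are made up.

```python
import numpy as np

# Synthetic (compute, loss) points following loss = a * C^(-b) plus noise.
rng = np.random.default_rng(0)
compute = np.logspace(18, 24, 10)                  # FLOPs (made-up scale)
true_a, true_b = 1e3, 0.08
loss = true_a * compute ** (-true_b) * np.exp(rng.normal(scale=0.01, size=10))

# A power law is a straight line in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent b = {-slope:.3f}, prefactor a = {np.exp(intercept):.1f}")
```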
APE. Automatic Prompt Engineer. Define a task and have the LLM improve itself through self-prompting. Why think step by step? Why is chain of thought a useful thing?
Explainability through chain of thought.
Foundation Models for embodied decision making agents. Episode 11
Sergey Levine talk on Reinforcement Learning with Large Datasets
Why does RL help improve outcomes from data that is generated by humans? Because machines have their own psychology and come up with Move 37-type solutions if trained in a hermetic, self-referential learning environment like RL. Use offline RL to scale.
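A minimal sketch of the offline setting: tabular Q-learning run purely on a fixed batch of logged (s, a, r, s') transitions from an invented toy chain environment, with no further environment interaction.

```python
import numpy as np

# Toy 5-state chain: action 1 moves right, action 0 moves left; reward 1 at the last state.
N_S, N_A, GAMMA = 5, 2, 0.9
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_S - 1)

# Logged dataset from a random behavior policy; no further interaction afterwards.
dataset, s = [], 0
for _ in range(5000):
    a = rng.integers(N_A)
    s2, r = step(s, a)
    dataset.append((s, a, r, s2))
    s = 0 if s2 == N_S - 1 else s2

# Offline Q-learning: sweep the fixed batch repeatedly with Bellman backups.
Q = np.zeros((N_S, N_A))
for _ in range(50):
    for s, a, r, s2 in dataset:
        target = r + GAMMA * Q[s2].max()
        Q[s, a] += 0.1 * (target - Q[s, a])

print(Q.argmax(axis=1)[:4])   # greedy policy moves right toward the reward: [1 1 1 1]
```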
-
Offline RL fundamentals
-
Applications to robotic foundation models. Large models that train robots for a variety of tasks. Examples: ViNT, Q-Transformer.
-
RL with generative models. How can RL be used to improve diffusion models? Dolphin riding a bike.
-
Offline RL with LLMs. Better than RLHF? Is it? Can we rely on machine psychology?
-
Richard Sutton essay: "The Bitter Lesson". Scale and search beat explicit human knowledge. But where is design still relevant? Bootstrap a machine that can self-learn hermetically in a self-referential and scalable way. What about alignment?