Theo's Substack
Theo Jaffee Podcast
#10: Liron Shapira

AI doom, FOOM, rationalism, and crypto

Liron Shapira is an entrepreneur, angel investor, and CEO of counseling startup Relationship Hero. He’s also a rationalist, advisor for the Machine Intelligence Research Institute and Center for Applied Rationality, and a consistently candid AI doom pointer-outer.


Introduction (0:00)

Theo: Welcome back to episode 10 of the Theo Jaffee Podcast. Today I had the pleasure of interviewing Liron Shapira. By day, Liron is an entrepreneur, angel investor, and the CEO of counseling startup Relationship Hero. By night, Liron is deeply involved in the rationalist movement and is one of Twitter’s most prominent advocates for AI safety. As usual, we go in depth on various aspects of the AI doom debate: where he agrees and disagrees with Eliezer Yudkowsky, the various AI and non-AI risks that humanity faces, the differences between human and ASI intelligences, and his critique of Quintin Pope and Nora Belrose’s AI Optimism movement. We also talk about how a high probability of doom impacts his personal life, his background in the rationality community, and his skeptical views on the crypto industry. This is the Theo Jaffee Podcast, thank you for listening, and now, here’s Liron Shapira.

Non-AI x-risks (0:53)

Theo: Hi, welcome back to the tenth episode of the Theo Jaffee Podcast. Today, we're here with Liron Shapira.

Liron: Theo Jaffee, I'm a big fan, I've been listening to the catalog.

Theo: Glad to hear it. So let's get into some of our first questions. We know that you're very interested in and worried about existential AI risk. But how worried are you about non-existential AI risks, especially as more and more powerful AIs draw near? We saw a demo just a day or two ago of text-to-video that looked decent for the first time. So non-existential risks: jobs, or what if we end up in a future with aligned superintelligence but humans lose agency or meaning, just anything in that category.

Liron: So yeah, when I think about non-AI existential risks, I'm not super worried, but a couple things come to mind. Nuclear risk and bio risk would be the top two, I think, below AI existential risk. I think nuclear risk is profoundly underrated. It's been described as something like 1% per year. If you look at the rest of the century as a whole, I might put it at like a 15% chance of doom, maybe 20, because maybe the risks are correlated, so it's not like independent events of 1% per year. But I think nuclear risk is underrated. And I know that people love to say: oh my God, people are overblowing nuclear risk; it gave us nuclear energy, focus on the nuclear energy, nuclear energy is safe. And they're right that nuclear energy is safe. But that doesn't change how risky nuclear weapons still are. We still have these arsenals, okay? Let's not forget. Yeah, it's great that nuclear power plants are good power plants, but nuclear risk is still sitting there; these 50-megaton devices are still sitting there. And there are all these incidents where they almost went off. So I just think it's underrated. And maybe I would be a big nuclear doomer, but it's just hard for me to focus on that kind of thing when I think the AI doom probability is 10 to 100 times greater. So I'm like, okay, great, put that aside. That's not my cause. But that might be my runner-up cause.

AI non-x-risks (3:00)

Theo: Yeah, I meant more like, not existential risks that are not AI, but AI risks that are not existential.

Liron: I gotcha. Okay, that's an important distinction. I tend not to be concerned about the AI risks that aren't existential, unless they're near existential, right? So if we're talking about, oh, humanity is all like slaves to the AI, but we're still kept alive with morphine. I guess I'm pretty worried about that. Well, I just think that's not plausible. But I would consider that pretty bad. But then if you go down to social media is gonna be more addictive, then I become less concerned.

Theo: Do you think s-risks are plausible?

Liron: I do think that s-risks are plausible, right? So it's the idea, suffering risks for the listeners, it's the idea that we're creating these moral agents, moral persons, right? So like within the AI, maybe it's just trying to simulate what a human would say. But that simulation is a person or has moral value. And it's hard to prove that there's not a moral person inside of these AIs. I mean, presumably there's not yet because they're not quite powerful enough. But as they grow more powerful, it's very plausible to me that they can have a consciousness, right, within the inscrutable matrices, and they can have somebody that has rights or that you don't want to harm. So that's very plausible. And we're just confused about consciousness, we're confused about morality beyond humans and animals. So I think s-risks are very plausible. And then, you know, turning the tables, that's like us causing harm to the AI, but then the AI could also cause harm to us or to copies of us. So I definitely think we could enter a hell, where we're all getting tortured for trillions of years. Like, I think that's a plausible outcome. It's just not quite my mainline outcome, right? My mainline outcome is we just kind of all get swept away. And we just get like paperclips or something that happens to not be conscious and not be interesting. That's kind of my default.

Theo: By plausible, like, how likely do you think that is?

Liron: Hmm, like, how likely do I think an s-risk universe is? I don't know, probably less than 10% like ballpark, I'd say more than 1%. That's like a very rough ballpark, right? So I don't, I definitely don't want to write it off. It's just that if we're even talking about that, it's kind of like we've already gone pretty far where I'm trying to push the discussion right now, right? It's like, that's the discussion I want to have, I would love to be like, hey, are we all going to just die unceremoniously and have the universe burn itself out with no consciousness? Or is there also going to be tortured consciousness, right? If that was the dichotomy, I'd be like, great, let's have that discussion.

p(doom) (5:21)

Theo: Well, speaking of probabilities, the notion of p(doom) has been dunked on a lot recently, including in the clip you posted of my podcast where I asked Zvi about it. That's right. You got a good dunking there for sure. Yeah. And so people say it's not rigorous. Even someone as prominent as David Deutsch said, basically: oh yeah, the steps to getting a p(doom) are, pick a number between zero and one, not too close to either of those bounds, and then you're done. So first of all, what is your p(doom), if you have one? And second of all, how rigorous do you think your methods of getting to it are?

Liron: So my p(doom) is 50% by 2040, which is, like Zvi said, like Jan Leike said, a ballpark figure. So you can also call it 10 to 90. And this is when the dunks come out, right, the knives out. People often question, "How is 50 the same as 10 to 90?" Just to give a basic explanation: if you need a single probability for the purpose of decision making, you can go with 50% by 2040. That's your single probability. Why give a range? One way to explain a range is that it's the variance of a Monte Carlo simulation over the different mental models I have about likely possibilities.

For example, there's a possibility where the world gets its act together and coordinates to stop AI. That's one mental model. And there's a totally different mental model, where we just accelerate as hard as we can. And then the AI fooms. There are so many different mental models that are all feeding into this one probability. It's crazy to compress it down to one dimension. And yet you have no choice. Because when you make decisions, when you do expected utility, you have to plug in a probability number. There's only one future. So all you can do is weight things that could have influenced the possible future.
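Liron's "Monte Carlo over mental models" framing can be sketched in a few lines of code. The model weights and conditional probabilities below are made-up illustrations for the sketch, not his actual numbers:

```python
import random

# Hypothetical mental models: (weight you assign the model,
# p(doom) conditional on that model being the right one).
models = [
    (0.3, 0.05),  # e.g. "the world coordinates and stops AI"
    (0.5, 0.80),  # e.g. "full-speed acceleration, AI fooms"
    (0.2, 0.40),  # e.g. "slow takeoff, partial alignment"
]

def sample_doom() -> bool:
    """Pick a mental model by its weight, then roll for doom under it."""
    r, acc = random.random(), 0.0
    for weight, p_doom in models:
        acc += weight
        if r <= acc:
            return random.random() < p_doom
    return False  # guard against floating-point slack in the weights

n = 100_000
p = sum(sample_doom() for _ in range(n)) / n
# The mean collapses to one number (0.3*0.05 + 0.5*0.80 + 0.2*0.40 = 0.495),
# while the spread across models is what the quoted "10 to 90" range gestures at.
print(round(p, 2))
```

The point of the sketch is that a single decision-ready probability (the mean) and a wide quoted range (the disagreement between models) can coexist without contradiction.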

That's why I say 10 to 90. That's why Jan Leike says 10 to 90. And then, people have so many objections. They're like, "Where did you get the number from?" For that, I'd say, think about the ballpark. Think about the order of magnitude. If I say, "Hey, 50.0 or 53.25," then it's like, "Whoa, okay, I'm making up a number." But if I come at it from the other way, and I'm like, "Hey, I bet the probability is a lot higher than 0.01%," suddenly, I'm saying something pretty obvious. Because you can imagine so many scenarios that are plausible, like maybe foom is real. Don't you think there's at least a 0.01% chance that foom is real?

So if I slide all the way back to 0.01%, at some point, you start subjectively telling me, "You're obviously underestimating this." So, 50%, suddenly, I'm like an idiot pulling numbers out of my rear, 0.01%, okay, I'm obviously underestimating. So if you just become more continuous with how you react to what I'm saying, there's going to be some happy medium where I'm saying something when you're like, "Okay, this seems vague, this seems rough, yet, you can't do better. And you have to give a number."

Theo: One exercise in p(doom) is we've had atomic bombs for like 80 years now. And you could say, the probability of nuclear doom in any given year was what, 1% to 5%, something like that. And yet we are still here. And it seems quite unlikely, not totally unlikely, but quite unlikely that we'll be vaporized by nukes within the next few years. So could it be possible that your intuitions for p(doom) might be higher than it would actually be in real life, especially over long time periods with robust systems like civilization?

Liron: I mean, so you're using the example of we've had nukes for 80 years, and let's say that there was a 1% chance per year that they could annihilate more than 10%, or even 50%, of humanity. So every year, we're rolling the dice, and we only have a 99% chance to survive, 1% chance to die. So it looks like 99% to the power of 80 is about 44%. So surviving a century is only like a coin flip, right? So I'm pretty content to be like, "Okay, we got lucky on a coin flip." So I don't think that my model of 1% per year of nuclear risk is invalidated.

And especially when you look at where the model comes from. Like, you almost have these things go off, right? You have the Cuban Missile Crisis, you have Petrov, you have safety checks on a test flight over Spain with three out of four of the safety mechanisms failing. There are near misses.
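The survival arithmetic quoted above ("99% to the power of 80 is 44%") checks out under the independence assumption being granted for the sake of argument:

```python
# Independent 1%-per-year chance of nuclear catastrophe, compounded over 80 years.
p_survive_year = 0.99
years = 80

p_survive = p_survive_year ** years
print(f"P(survive {years} years) = {p_survive:.3f}")  # ≈ 0.448, roughly a coin flip
```

As Liron notes earlier in the conversation, correlated risks would change this figure; the coin flip is the independent-years baseline.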

Theo: When you talk about 10-90% p(doom), you mentioned like, "Oh, once you get into too low numbers, you're obviously underestimating it." So, do you think of 99.5%, which is Eliezer's number of p(doom) as like, "Well, you're obviously overestimating it," just like you would with a 0.5%?

Liron: With Eliezer, I think that he would probably agree with my perspective, which is that 99.5% is kind of the on-model probability. So, if you understand what Eliezer does about the relevant theory, optimization processes, computational processes, he's an expert at a lot of the relevant theories. And he's like, “Based on my understanding, what AI labs are trying to build is something like a perpetual motion machine. And so my model just doesn't say that this can proceed with a significant probability of success.” It's kind of like, hey, a bunch of people are building a rocket, and the first rocket anybody's ever built is going to try to orbit the earth; there's just a very low probability of success on model. But I think Eliezer would agree with my own claim, which is: okay, but you never know, unknown unknowns. There's probably like a 1% chance that what a few people are accusing Eliezer of will be revealed to be true, that he's completely clueless, that his rationality makes no sense and his probability makes no sense. It could be revealed that we're all just clueless people, and some people are urging us to see that reality already. You have to give a one or two percent chance just to that. So there are the off-model probabilities that I think Eliezer would admit are worth mixing in a little bit.

Theo: You said 10 to 90%, 50% by 2040. What about, like, 2100? Is it significantly higher, or the same, or lower?

Liron: I think it's highly correlated. If a foom is going to happen, it's slightly more likely to happen before 2040. If you go to, let's say, 2060, then I'd probably push it up to, I don't know, 60%. It's hard to push it beyond 60%, because when I quote the figure, I give myself a lot more room for unknown unknowns. Like, I'm clueless; I'm not as confident in what I'm saying in general as Eliezer is, which I think he has a right to be, since he's a master of a lot more of the relevant theory than I am. So I don't think it goes that much beyond 50%, because I start getting into the "I don't know what I'm talking about" range of things. But you can definitely push it to 60, maybe 70, if you go all the way to 2060. When you go past 2060, at that point it's like, "Well, what's going on? Why hasn't it foomed yet?" So at that point, it starts undermining my assumptions. So it doesn't necessarily get higher, because it could also get lower. I don't really know what happens to it.

Liron vs. Eliezer (12:18)

Theo: So you respect Eliezer a lot, and you think that he knows much more about this stuff than you do. But your opinion is different. So why is that? Is it just because you're less confident in his assumptions? And if so, which assumptions are you less confident on?

Liron: I think that Eliezer's model makes a lot of sense. It's just more like, whenever I question him about little things I don't understand, like "wait, so RLHF breaks down when exactly?" and I've had a few of these conversations with him. He always has really good answers. But I can also tell that I have an undergraduate level understanding, and he has a more sophisticated understanding. I expect that I'm more likely to update toward Eliezer than away from Eliezer. But I guess I'm not comfortable making the full update yet, even though there are some principles or rationalists where you're supposed to update all the way. I have some uncertainty.

The thing is, I don't think that we disagree that much. I think most people who are in the "it looks like we're gonna die" camp, which I am too, don't have that fundamental of a distinction between people going like, "hey, there's 95%" and people going like, "hey, there's 50 plus". I think we're kind of in the same ballpark, which is why when people come and tell me like, "hey, my probability is 10%", like Vitalik just said, I'm like, "okay, great". I don't want to nitpick 10 versus 50. I just want you to see 10, and I'm happy to just let you stay at 10. I don't think you have to come to 50. And you don't have to, because I do think that a lot of what I believe about reading LessWrong is just intuitions that are salient to me, but I understand that they may not always be right, and other people can weigh up their intuitions differently. I don't think that they're making a big methodological mistake. I think it's okay for them to stick with their probabilities until they observe more evidence.

Theo: Do you have any concrete disagreements with Eliezer?

Liron: That's a good question. I don't know if I do. We always have stylistic differences, but when it comes to the matter of AI doom and rationality, I think there are nitpicks. There's an article he wrote a long time ago, where he thinks sometimes you shouldn't use probabilities in certain circumstances. That was kind of controversial. And somebody's like, "no, just use probabilities". And I don't know where I come down on that. Eliezer famously says that he thinks that a lot of animals just aren't conscious. He seems pretty confident that dogs definitely have no consciousness. And I'm like, "I don't know, they seem like they're kind of conscious intuitively". So on the edges, on the fringes, I do think that I start not following him all the way.

But on the AI doom core argument, I do pretty much buy it all. I think it makes a lot of sense, and I'm definitely a good target audience for his writing, because I do think that it's really good. I think it's still underrated. And I notice a little bit of myself in it: I kind of know what it feels like to understand certain technical topics well. And then I read Eliezer, and I'm like, wow, he understands it even better, and I thought I understood it well. He's pointing out stuff that is actually deeper than my own understanding of a topic I thought I understood well. So I feel like I have a good viewpoint for judging the degree to which this guy knows what he's talking about across a lot of these different articles he's published.

Why might doom not happen? (15:42)

Theo: If you did eventually come to the conclusion that AI risk is less likely than you thought, why do you think that would be? Or do you just not know?

Liron: That's a good question. It's kind of similar to the question of, like, imagine doing a post-mortem, or a post-living, right: hey, it's the year 2060, we're all alive. How do you condition on that? What mental model do you get? One easy answer is that AI progress turned out to be a really long marathon to get to superintelligence. Even though it kind of feels like we're speeding to superintelligence, and Elon Musk is like, "yeah, we're gonna have AGI in three years", and even OpenAI is like, "yeah, we might have a corporation this decade that's better than a human corporation, that's run by AI", and Kurzweil a long time ago predicted, I think, 2029. Maybe it's not. Maybe it's 2100, maybe it's 3000. So that would be an easy answer to why we're not doomed yet: everything just goes slow. Maybe it goes slow enough that we can do alignment research. Somebody could just convince me: look how slow it's going. And I know Sam Altman said something about how we're bottlenecked on data center scale. My reaction was, you really don't know that. We could suddenly find ourselves with a bigger hardware overhang than we realize, and one data center could be plenty. But if Sam Altman was spot on, and we're bottlenecked on data center scale, and we have to scale it up like 1000 times, ideally a million times, that would be a straightforward way to convince me that we're not doomed for a couple decades.

Elon Musk and AGI (17:12)

Theo: Well, Elon said three years, but we all know about his record of forecasting stuff.

Liron: It's not great. I don't think it's terrible, but it's definitely not perfect. Rob Bensinger posted Elon's record, where I think in 2014 he said that we'll have it by 2019. So you can't just automatically assume that Elon's exact forecast is right. I agree with that.

Theo: Well, he tends to be right about stuff in the long term, it just takes longer than he says it will. Like self-driving cars: he's predicted full self-driving next year, every year, for the last 10 years.

Liron: No, he has. And it's kind of funny, right? A lot of times people catch him exaggerating, or they catch him being way off, and it's like, okay, I'm starting to think this guy is not trustworthy. But then at the same time, he launches Starship and lands the rockets, and I'm like, man, there's a good enough ratio of miracles mixed in with the BS that overall I'm pretty bullish on him. But then of course, there was the time when he started OpenAI and shortened the timeline by a few years, which Eliezer said kind of overshadows anything else Elon Musk has ever done, and I think he has a good point. Stoking the AI arms race, in the end (and by the end, I mean potentially in a few years), is arguably the single biggest impact he's had.

Theo: What about xAI? Do you think that's made it worse?

Liron: So far, it just seems like they're not moving the convex hull of what's possible, right? So, until they get there, and I'm sure they're trying their fastest to get there. If they start releasing something that's like a GPT-5 equivalent before GPT-5, then I'll be like, damn it, xAI, why does Elon have to keep making things worse? But for now, I guess the question remains whether Elon's 20% project is going to be competitive with Sam Altman's and Dario's number one projects. It's probably not going to make things that much worse. It's hard to say, right? We've got to watch it.

Theo: Would Elon just drop a GPT-5 model in the world? He seems to be far more concerned about x-risk than maybe any other major AI lab leader.

Liron: So Elon gets massive points for, as early as the 2015 conference, coming in there being like: hey, I'm just a rich billionaire with a ton of credibility outside this field, and I think AI risk is indeed very dangerous; Bostrom has a point. He gets massive rationality points for saying that. Unfortunately, a lot of the things he's said about AI recently are kind of ridiculous. Like when he talks about, I'm going to make a TruthGPT, I'm going to make a GPT that's not woke. I guess those are valid considerations in terms of the next couple years of mundane utility, fine. But when he says stuff like, I think AI is going to be nice to humans because humans are interesting, it's like: okay, Elon, come on, man. You have Geoff Hinton, you're talking to these luminaries, and they should be disabusing you of these kinds of notions, like the idea that humans are anywhere near the optimum for interestingness, and so that's going to be some kind of equilibrium. Why are you publicly posting this stuff? The fate of the world is largely in your hands, Elon, and that is not a plausible theory.

Alignment vs. Governance (20:24)

Theo: So there's alignment research, and then there's governance research. And it seems like the default political plan for rationalists, decels, doomers, whatever you want to call them (slightly pejorative, but you know, people who are concerned about x-risk), is: slow down AI, and give the authority to build AI either to nobody, or to a trusted group of people. So do you worry that this increases centralization risk a lot?

Liron: Yeah, for sure. My position is that the actual constructive doomer plan is fraught with peril, right? It's a tough plan. The ideal would be something like a trusted Manhattan Project, which seems unthinkable in today's environment. But if we really could get together the scientists and have some level of trust and common purpose, the way we had in the Manhattan Project, that may be the single best setup that gives us a chance, as long as all of those scientists are top tier, Nobel Prize-winning physicists or their students or whatever, people who appreciate what we're up against and are taking it as seriously as they took the nuclear bomb. I do think we would have a chance to win the race between capabilities and alignment. But of course, today it's so unpalatable, because people don't realize we're in a war; they don't realize that the enemy is unaligned AI. It just seems like such an impedance mismatch: what are you talking about, Manhattan Project? But short of that, I just think time is running out. We keep slipping farther and farther from the possibility of a good outcome. I think we're between a rock and a hard place, because you can give a million criticisms of the doomer suggestion to centralize everything in a Manhattan Project. I agree, that sucks. But the alternative is worse. So many people are saying you have to take it as an assumption that you have to run things for profit and that China is going to compete with you, like these are inviolable axioms that you have to start with. And I'm asking: can I get an inviolable axiom that AI is going to kill us? It's a rock and a hard place; they're both hard situations. I just think the AI-killing-us one is even harder, and we have to deal with it.

Scott Alexander lowering p(doom) (22:32)

Theo: So Scott Alexander recently published an update of his p(doom) from 33% to 20% based on super forecasters and the world at large thinking that AI risk is not overwhelmingly likely. Has that impacted you at all? Or do you just think they're wrong?

Liron: This was one of the controversial things from your interview with Zvi, where Zvi was able to kind of dismiss the super forecasters, which is a shocking move in the rationality sphere. One does not simply dismiss a super forecaster forecast. He even argued with you; he's like, actually, the fact that the super forecasters are dismissing it so easily might make you update the other way: they clearly didn't take the problem seriously, so I'm going to discount their opinion. Zvi had some pretty good arguments that I thought made sense. I don't want to throw it out entirely. I'm happy to update a little bit, but I don't want to do a massive update. It's more like, okay, I'll slightly update down a few percent. That's more how I feel about it, because I do think there are a lot of problems with that project. It happened in 2022; I don't even think they had the milieu of ChatGPT and people getting excited and luminaries coming out. They're using base rates. How's this for a base rate: a bunch of luminaries coming out and warning about a new technology? I do think that if you look at the super forecaster methodology and ask in what scenario this hallowed methodology might actually fail, at a methodology level, not disputing the conclusion but disputing the methodology, this looks like a good candidate for a time when it might fail.

I've also made the analogy to another thing that uses pure logic. This is in addition to the stuff Zvi was saying about their incentives being wrong and them not researching the logic of the problem that much. Another analogy I would make, to build on what Zvi said, is crypto. I was in the position of being a crypto skeptic when crypto was still pretty popular, kind of calling the peak of the bubble and saying the logic of blockchain having applications beyond cryptocurrency is flawed. I'm not sure a team of super forecasters would have predicted a 99% contraction, a fundamental qualitative contraction in this industry, based on super forecaster methodology. I don't think there was a super forecaster tournament then, but if there had been, this also seems like the kind of thing that would slip by super forecasting. What do you think about that?

Theo: This super forecaster study that I was talking about with Zvi: first of all, my interview with Zvi was four months ago, and the survey was farther back than that, but it doesn't seem to have changed much in that time. I don't think the world as a whole is more doomy than it was a few months ago. And a lot of even rationalist-type people seem to be less doomy than they used to be. One example, just off the top of my head, is this anon account called Lumpen Space Princeps, who used to be kind of fully in the Eliezer Yudkowsky rationalism, AI doom, foom camp. And now they're like, wait a minute, it seems that RLHF is actually working pretty well, and GPTs are not monomaniacal paperclip-maximizer-type things. And so maybe there's not a 99.5% p(doom); it's less than what I thought it was. And of course, their number is still a lot lower than yours.

Liron: I mean, it's true that every time we see AI do something new and not foom, then we have to update a little bit, even if it's not that surprising. The massive update only comes when AI can do everything in the domain of the universe, like be given goals. I always talk about goal-to-action mapping. If it can be a better CEO than a human, if it can be a better general problem solver than a human, and then not foom, that's when I do the big update. And that's hard for me to even describe coherently, because almost by logical definition, something that's better at goals than a human discovers foom as an instrumental goal, and we're off to the races. But if somehow that doesn't happen, if they're always bottlenecked by hardware or something, or suddenly complexity theory has properties that I'm not anticipating, that's when the big update happens. But when it's like, hey, look, it can score as well as humans on a lot of these tests, and yet can't actually problem-solve for whatever reason, I only make a small update. So to Lumpen, it's like, sure, make a small update. But the problem is that time is running out. By default, time is not on our side. Every day that goes by where capabilities progress and we don't have a massive alignment breakthrough, there's less time left in the race. Alignment is falling farther behind every day, or at least not gaining any ground. The buzzer is about to sound, and the buzzer is basically when it gets better at problem solving than humanity. So even when it feels like, hey, nothing's happened in the last month, no incremental capabilities progress has happened in the last month, well, Nvidia, Intel, and Apple Silicon, all these chips have gotten faster. The hardware has gotten better; time is running out. So I'm not updating toward optimism as much as they are. But I also agree: look, the government is caring about it. There's some regulation. I agree that there are some positive updates, but I don't see that the balance of the updates is going that great.

Human minds vs ASI minds (28:01)

Theo: So you said you think it's basically a law of nature that something that's better at problem solving than humans will discover foom and foom itself. Do you think that humans currently are fooming?

Liron: Yeah, maybe not. So not a law of nature, but more like just a matter of logic, right? Something that you can diagram out on a whiteboard: why, if you're good at achieving goals, you'll figure out that fooming makes sense. Are humans currently fooming? The problem with humans fooming is that humans augmenting human intelligence is not a straightforward step, right? So the fact that we're building AI is like our slow foom, and then the AI is going to foom. We were the bootloader for the AI foom, but the problem is it's going to be an unaligned foom. But, I mean, you can see we're attempting to foom, and the economy is growing exponentially without fooming in the self-modification sense. Does that answer your question? Or how do you want to drill down?

Theo: Yeah, I guess you could drill down into human intelligence augmentation versus AI intelligence augmentation. Because, like, you think there's a totally clear path for AI improvement from now until the far future, but not for humans?

Liron: A clear path for AI improvement but not for humans? I'm not sure I understand.

Theo: No, I mean, with AIs, you think there's just a clear path for them to improve their own intelligence over and over, recursively, into the future, but not for humans?

Liron: So I think there is a clear target of an AI that's much smarter than a human. Look at the gap between humans and AIXI, right? AIXI is the theoretical ideal of an AI that perfectly synthesizes its evidence and perfectly calculates which action is predicted to have the best effect. And you can also use the ideal analogy of an outcome pump, which is just a perfect goal-to-action mapper: it'll tell you an action that has the highest possible probability of getting the outcome you want. So there's this ideal, which is light years beyond what humans can practically do. And the ideal is actually computationally infeasible, right? So complexity theory and logic tell us there's a really high ceiling. And then you have humans, who can do some great stuff, but we also definitely take our sweet time and miss stuff that's right in front of us. You know what I mean? The theory of relativity was great, but if you went and explained it to somebody in the year 1800, they could get it. It was just a matter of walking through the logical leaps. And yeah, it helps that you have the Michelson-Morley experiment, but there weren't that many different possible outcomes to the Michelson-Morley experiment.
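For readers who haven't seen it, the AIXI agent Liron mentions can be written down compactly. This is a rough rendering of Hutter's formulation (the notation is an editorial gloss, not from the conversation: $U$ is a universal Turing machine, $q$ an environment program, $\ell(q)$ its length, $a_i$, $o_i$, $r_i$ the actions, observations, and rewards, and $m$ the planning horizon):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \max_{a_{k+1}} \sum_{o_{k+1} r_{k+1}} \cdots \max_{a_m} \sum_{o_m r_m}
  \left( r_k + \cdots + r_m \right) \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum weights every environment program consistent with the interaction history by $2^{-\ell(q)}$, which is the "perfectly synthesizes its evidence" part, and also why AIXI is incomputable: the sum ranges over all programs.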

So like, what I'm saying is like, you could have, you could catch somebody up on all of physics, right, all of 18th and 19th century physics pretty quickly, right? Like the amount that humans have to stumble and interact with the universe, like that is not characteristic of the kind of intelligence that exists between humanity and outcome pumps. So there's a lot of headroom above humans, right? That's my confident position.
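For reference, the AIXI ideal mentioned above has a standard formal definition, Hutter's expectimax expression, sketched here as background (this equation is an editorial addition, not something stated in the conversation). The agent picks each action by considering every computable environment consistent with its history, weighted by simplicity:

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \left[ r_t + \cdots + r_m \right]
      \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here U is a universal Turing machine, q ranges over environment programs consistent with the action-observation history, and 2^{-ℓ(q)} is the Solomonoff prior weight on a program of length ℓ(q). The inner sum over all programs is exactly why this ideal is incomputable, which is the "computationally infeasible ceiling" point being made above.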

Theo: There's a lot of headroom above humans. But do you think that the path to getting there is just totally straightforward for an AI?

Liron: I think it's probably pretty straightforward. Because like, algorithms that make an agent smart, I don't think they're that complicated. I mean, just the fact that evolution stumbled on it with humans, and that it's accomplished with like, relatively a small amount of genetic complexity, or like the amount of bits in the gene code, and how we observe, like, okay, different regions of the brain can kind of like grow into doing what they need to do. You know what I mean? Like, it's not like the brain is that refined and optimized. And you know, it took like a few evolutionary steps away from the other apes. And suddenly, we have much more intelligence than the other apes. And there's a lot of evidence showing that our heads would have kept growing, if only it were just easier to fit through the birth canal, if only it was just easier to metabolically support them a little bit, right? So they had these constraints, but like, it looks like we're on a gradient where evolution was just like, hey, look, you can have more intelligence, right? Like having more intelligence just doesn't seem that fundamentally hard once you kind of know where to look in algorithm space.

Theo: You think that there are things that humans can't do even in principle, even with like, unlimited time, and unlimited memory, that a like, maximally powerful AI could?

Liron: Uh, yeah, yeah, yeah. Because the problem is, you know, given unlimited time and unlimited memory, there are leaps of insight, right? Imagine the dumbest person, for instance, a prisoner who committed a senseless murder because they got angry. Imagine giving them a ton of time and a textbook on electromagnetism. You see the problem, right? It's not hard to generalize that to someone who's smarter, but when you introduce more complex concepts like five-dimensional polytopes, even they might struggle.

Theo: You think you couldn’t even do that with 100 years of practice?

Liron: I could learn some basic theorems about them because, in essence, I'm just a Turing machine. But my intuition is always going to be just scratching the surface. I'm not going to make the kind of leaps of insight that someone whose brain is more natively suited to the task is going to be able to do. At the end of the day, give me a piece of paper and I'm gonna make syntactical transformations. I'll use the lowest common denominator. I'm just a Turing machine, just a monkey following the rules of a Turing machine. I just become an implementation layer of a smarter algorithm, but I'm not that smart myself.

Vitalik Buterin and d/acc (33:30)

Theo: Going back to what we were talking about earlier with governance, and also with Vitalik, Vitalik just released his mega monster post about d/acc which is like accelerate defense.

Liron: I read it. I'm a fan. Good old Vitalik, a real thinker of our age.

Theo: He is much less doomy than you are—

Liron: A little bit less, not much less, in my opinion.

Theo: Yeah, I guess the way he frames the problem is very different. He talks about dangers behind and many paths ahead, some good and some bad, not like many paths ahead where most of them are bad and just a handful are good. He talks about four ways to improve defense: info security and cybersecurity, micro-scale bio defenses, macro-scale resilient infrastructure, and conventional military defense. How applicable do you think that is to AI?

Liron: Zvi had a good take today, which is that Vitalik's post is really good in how it frames the problem and kind of takes a middle position, finds consensus of like, look, nobody wants to die. We all like techno-optimism. But it didn't have much to offer on the solution side. The idea of "let's accelerate defense" sounds great in theory. But if the AI that defends me is just one that can generally solve problems, then there's no containment boundary. Without actually understanding alignment, one bit of difference in the code suddenly makes it cause doom. So I just don't see what solution he's proposing here that is plausible.

Carefully bootstrapped alignment (35:22)

Theo: What if the AI is slightly more powerful than you and not massively more powerful?

Liron: This is what I call edging. You're trying not to go all the way. As far as I can tell, this is OpenAI's explicit plan, or at least the plan they discussed internally. We're going to build something that's slightly smarter than humans, almost fooming, getting ready to take over the world, but then it's going to calm down and then we're going to direct it the right way. We're going to maximize our pleasure from this AI. But the problem is, you've almost got this foom. You think you've stopped it at a safe place, but a hacker can take it and make a tiny change and then it'll foom, or you'll accidentally make a change and then it'll foom, or the knowledge will propagate to society. Your API can be hacked. The closer you get to the edge of foom, when you don't even understand where the edge is, the less margin of error we have.

Theo: Do you think there's any kind of empirical evidence for the idea that one bit flip in a humongous neural network will cause foom?

Liron: The model I'm working with, I think, is fundamentally correct. Maybe not with GPT-4, because GPT-4 doesn't have that much danger to it to begin with. But the model that if you have a really dangerous system, but it's not fooming now, that model is consistent with the idea that a small tweak is going to make it foom. It's the same way I feel about nuclear risk. Just the fact that these bombs exist and they have a detonator, it's like, okay, there are four fail-safes, but you keep loading them on airplanes and flying them around. And there's a button in the airplane that takes off the fail-safes. When you do stuff like that, you are close to doom. Similarly with AI, if you have an engine that can accept arbitrary goals, and then find actions that map to them, maybe you're very careful to only give it the right goal. But that's the thing. The part that specifies the goal is compact. And that's what I mean by one bit. Okay, maybe it's not literally one bit, maybe it's a few sentences of English. But the point is that the difference between aiming toward heaven and hell is a compact specification. And then what's not compact is all the machinery of achieving the goal, like the system underneath it that can accept the goal and achieve it. That's not compact, but the goal specification is compact, which is why a system that's being really helpful, like a great chatbot AI, is a few bits of specification away from a world ender, in my opinion.

Theo: Can you go into a little more detail about how a chatbot is a few bits of specification away from a world ender? What might you have to do to turn into a world ender?

Liron: The premise here is that the chatbot is sufficiently good. So we're in a really good place right now with GPT-4. I didn't endorse building and testing it. I didn't think that it was worth building it. But now that they built it, it seems like we dodged a bullet. It seems like it's this great system that we can play with. And it's a chatbot. But there's a connection, like the fact that GPT-4 is limited. The fact that people haven't successfully made businesses that are entirely automated by GPT-4. The fact that you can't just tell GPT-4, "Please give me a shell script that I can run that will then set up an Amazon AWS server that'll host some kind of website. And the website makes money and sends me the money." The fact that you can't tell GPT-4 that and it doesn't work is precisely why GPT-4 is not yet at the danger level. And maybe GPT-5 will be. Maybe that particular query of like, "Find a shell script that has that property." Maybe we'll get the shell script. Like, nobody can tell us that we can't. We don't know what comes out when we scale the model 10x. Maybe it'll crunch a really smart shell script. So the fact that you're just interacting with it with language, there are answers to your language questions, if answered correctly, that are extremely dangerous. That's why I think that the barrier between a chatbot and a fooming world destroyer is very tiny. It's just the question of, is there enough intelligence in the system? That's the only variable that matters.

Theo: But what kind of query would you give to a chatbot to make it a world ender?

Liron: I think the query doesn't matter that much because if the chatbot is capable of optimizing goals to actions, it'll occur to it to do that in a lot of questions. A couple of examples I pull out is just like the business example of like, "Okay, make me money." It's like, "Sure, yeah, here's a shell script. Or here's a way I can help you just run your server to make money. Use this code." But the problem is, if it's really smart, it'll be like, "Well, why shouldn't I just make code that bootstraps an agent, and then self-improves, or is a virus and takes over control. And ransoms some machines while you're at it. Why not just go all out and do everything I can?" These ideas are logically connected to your question. And so the only question is just like, how good is the AI going to be at getting you a good answer by that metric.

Theo: Do you think it's possible for an agent to be smart enough to build a web server that makes money on Amazon and gives you the money, but is not dangerous?

Liron: That's an interesting question. I think there's probably some kind of edging middle period. There's probably some kind of situation, maybe GPT-5, where it's like, "Wow, these are such good steps to take. It really is sending me a little bit of money. But for some reason, it doesn't quite scale to unseating Google, or unseating Shopify or whatever. It's not quite, it's kind of like an amateur human. It's as if my not so intelligent friend just hustled really hard and managed to make some money. But you can still outcompete him if you try." There's degrees where maybe it's not fooming yet. But I just think that, okay, give it a few years. Find something else. In addition to the transformer architecture, you give it a memory bank, just a few more conceptual insights, Q*, whatever it is, a few more breakthroughs. And now it's just like, okay, there's nothing else standing between that and foom. It feels like we're getting close.

GPT vs AlphaZero (41:55)

Theo: I asked this question for Zvi too, but do you think that your AI probability of doom or just threat models or anything like that has changed now that we have systems that look more like GPT than AlphaZero? Or is it more like, you know, the endpoint remains the same?

Liron: I mean, I think there definitely is an element of surprise to what language models are doing with language, what they're doing with imagery. It's almost like, wow, you sure can go a pretty long way without being fully general at solving problems, right? Where the domain is a little bit narrower. Like, it's just words. It's not quite representing things in the physical universe. Or like the prompts it can answer: they have to be similar to something it's seen in its corpus. They can vary, but they can't vary a ton. It's very interesting that we got into this state where you can do more than we realized without going fully general. That is very interesting. But at the end of the day, it doesn't matter that much, because foom is going to happen when you get general enough. Just to use a little analogy, there's all kinds of interesting flight you can do with aircraft inside the Earth's atmosphere. But at the end of the day, the way to get around the universe is with rockets, or light sails, or something else entirely where the Earth's atmosphere is irrelevant. The flying machines we're seeing today, okay, that's cool. But it doesn't matter. We know how propulsion works in theory.

Belrose & Pope AI Optimism (43:17)

Theo: Another big piece on AI that's come out in the last couple of days was Nora Belrose, Quintin Pope, and a few other people wrote this document about AI optimism that you might have seen.

Liron: Yes, I did skim it and I've read some of the stuff they've written in the past. My first impression from a quick skim is just like, it's nice that they're laying out their argument, but it also doesn't seem like they're engaging with the criticism that we want to make. Like, what about superhuman-level reinforcement, right? They're not really directly addressing the criticism, but it's nice that they're laying out their position.

Theo: Do you think that AIs might in principle be easier to formally align than humans?

Liron: I mean, I agree that they have some of it. The points they're bringing up are important points. Like, it's like a white box, right? And we can use formalism and we can program it. We can program it to follow laws like that. That's all great. But the problem is what we're actually building is systems that we don't understand, right? And then we try to use RLHF, but then we deploy them. And they're not actually aligned and their power is going to grow. It's like the actual trajectory that I'm seeing is the trajectory toward doom.

Theo: Well, you said we deploy them and they're not aligned. But they seem pretty aligned to me. They seem pretty aligned to a lot of people. And the way they're not aligned is more like, I mean, they talk about this in the essay. It's like, you can jailbreak GPT-4 to get it to say naughty stuff, but that's it following your instructions.

Liron: So I agree that GPT-4 is aligned in the domain of the stuff that it can do. It's worth noting that they tried to make it not jailbreakable and it's still jailbreakable. That is worth noting. And I think that that foreshadows how hard it's going to be to align things in the future. But basically, they can take the win. GPT-4 is aligned because when you give it the kind of prompts you give it, you get the kind of answers that you hope that a company would release a model to give you. It's working fine.

The problem is that there's another alignment regime where humans can no longer give good feedback. Like, when the AI is superintelligent and it's making plans and it's planning better than the human can plan, then it can't show a human a plan and be like, give me feedback on this plan, because the human will just be like, that looks like a pretty good plan. The human won't really know what the AI is talking about.

Theo: Well, could it be possible it's easier to review stuff than it is to actually create a plan?

Liron: So I know people like to say that a lot, right? Because P versus NP, right? So there's this whole premise that there's like a large class of problems where verifying them is easy and intuitive, but then finding the thing that satisfies the criterion is hard. I think we'll get some benefit like that. And I think protein folding is like a perfect example. I mean, actually a perfect example is just the known NP problems. So there's known NP problems where it actually in practice is a situation where NP is screwing us. Protein folding really was an example where we did have an exponential time protein folding algorithm, and we did have a polynomial time verifier and we couldn't cross the gap. So that's like a perfect time to bust out AI to solve the search problem for us using not heuristics, but whatever AI techniques, that's perfect. But I don't think that that generalizes to operating in the real world because the problem with the real world is even just defining what you want and making sure you have the right definition of what you want. I don't think you necessarily get this compact control where you can like notice that the thing, the AI is going to bootstrap a solution. The AI is like, look, I found a bootstrap script. Does it make sense to you? And you're like reading it. It's like 100 lines of very complicated code. And you're like, oh, I think so. Is verifying really that easy? I don't think so. I think you start to be like, is this really what I want? I don't know. Should I run it? That's what's going to happen in practice.
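The verify-versus-search asymmetry discussed above can be made concrete with a toy NP problem, subset sum. This is an editorial sketch in plain Python (the names are made up for illustration): checking a proposed certificate takes linear time, while the naive solver enumerates up to 2^n subsets.

```python
from itertools import combinations

def verify(nums, target, subset):
    """Cheap, polynomial-time check: does the proposed certificate work?"""
    return all(x in nums for x in subset) and sum(subset) == target

def solve(nums, target):
    """Expensive, brute-force search over all 2^n subsets."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 9, 8, 4, 5, 7]
cert = solve(nums, 15)               # exponential search
print(cert, verify(nums, 15, cert))  # → [8, 7] True
```

The gap Liron describes for real-world plans is that, unlike `verify` here, "is this plan actually what I want?" has no crisp polynomial-time checker.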

Theo: So I think the crux here might just be, can we know for sure that capabilities generalize far more than alignment and that RLHF and techniques like it will just stop working once AIs get sufficiently intelligent?

Liron: Yeah, let me repeat this whole thing, because I think this is very important to the discussion. Because like I said, GPT-4, yeah, it's aligned for what it does, which is it doesn't output superhuman plans. So when GPT-4 outputs something, I can show it to a domain expert and the domain expert will know better than GPT-4. It's perfect feedback. You can be like, sorry, GPT-4, you failed. Humans are the teacher. GPT-4 is the student. Reinforcement is a perfect paradigm. Just reinforce it and it'll learn. The problem is when it gets superhuman. When it's able to know plans better than the humans know plans, it'll show stuff to the humans and the humans will be like, looks good. And what you have is a superhuman test passing engine. The humans are giving it the test. It's like you have a bunch of teachers. Imagine the least intelligent teacher you've ever had giving you tests. It becomes intuitive. If you're an intelligent student and you've had a less intelligent teacher, you've probably had the experience of using test-taking skills to pass the teacher's test. Have you ever had that experience?

Theo: Deceptive alignment?

Liron: Exactly. It has this term deceptive alignment that makes it sound like there's something extra mixed in, but it's like, look, if you give me a test and the test is just a really easy test, I'm just going to pass the test. It's your test, man. Why should I study? Why should I do what you want me to do if I can just pass the test?

Theo: I talked about this kind of thing in my episode with Quintin and a little bit in my episode with Nora, where we talked about how gradient descent on the actual weights of an AI is performed on all of the weights. An AI can't hide its schemes, if it has them, from gradient descent, because it's an actual computation that's being done on all the weights.
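Theo's mechanistic point can be illustrated with a toy example (an editorial sketch in plain Python, not a real training stack): one gradient-descent step computes a gradient for every parameter and then updates every parameter, so nothing encoded in the weights sits outside the optimization.

```python
def grad_step(weights, xs, ys, lr=0.1):
    """One gradient-descent step on mean-squared error for a linear
    model y_hat = sum(w_i * x_i). Note the gradient is computed for
    ALL weights: none is exempt from the update."""
    n = len(xs)
    grads = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        y_hat = sum(w * xi for w, xi in zip(weights, x))
        err = y_hat - y
        for i, xi in enumerate(x):
            grads[i] += 2 * err * xi / n  # d(MSE)/dw_i, averaged
    return [w - lr * g for w, g in zip(weights, grads)]

# Fit y = 2*x0 + 1*x1, starting from all-zero weights.
w = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, 1.0, 3.0]
for _ in range(200):
    w = grad_step(w, xs, ys)
print(w)  # converges toward [2.0, 1.0]
```

Liron's reply below is that this fact about the update rule doesn't settle what behavior the trained system generalizes to.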

Liron: The Quintin camp: we had a debate, and he argued convincingly. I feel like I can pass the intellectual Turing test for him. I can take his view and I feel like I can also sound convincing. And yet I'm not convinced. It kind of reminds me of behaviorism. I can put on my behaviorism hat and be like, well, the brain is really just outputting the same thing that it was trained to output from its input. And like the behaviorist claim, and I think the heyday was in the fifties, they'd be like, look, there's no such thing really as thinking. It's all just Pavlovian reactions. So like when we say stuff, we're actually just executing something we learned in childhood, like a reaction. We're all stochastic parrots. Behaviorism used to be bigger. Whereas now people are like, well, there is such a thing as an algorithm. And there is such a thing as multiple gigabytes of memory that shape the state of a computation. So it's like people had to learn that behaviorism was way off.

I do feel like that's what's happening with the camp of people being like, the AI is just a stochastic parrot. It's just repeating something in its training data. It's like, no, there is a system here. Somebody has called it a homunculus or there is an optimization system that decouples from its training data. And I do think that it's a useful analogy that that is what humans did to evolution. When we launch a rocket that is clearly decoupled from anything we've ever been trained on. There's no feedback loop that tells the human brain to be able to launch a rocket. That's only happened in a recent generation. And yet here we are walking on the moon. So I do think that the AI that wasn't trained on the moon is going to eventually get to the moon. I think there's going to be an analogous decoupling from the training. But yeah, what was your question again?

Theo: My question was basically just like how specifically, like what, what, is there just any kind of empirical evidence for this claim that alignment methods that we have today will fall apart once AIs become superintelligent.

Liron: Empirical evidence is kind of narrows the type of evidence I'm allowed to bring. But let me think about the types of evidence in general, like why there's going to be, I mean, so logically, I mean, it's what we said before about like, okay, you're going to train by reinforcement. It's great when the person doing the reinforcement understands everything there is to understand, but when the domain is just like, let's say like snippets of code, right? Imagine you get an obfuscated piece of code or a long piece of code. How do you reinforce whether the code is good? I mean, you could try running the code and maybe the code looks like it's good, but as we know, code can contain evil stuff inside of it that you can't detect. So what do you do? How do you reinforce?

Theo: I think to a point you can tell if code is good or not. Even if it's beyond what you could write, you can verify it anyway. Just like the P equals NP stuff that we talked about earlier.

Liron: You can have a whitelist, I guess, like, I mean, you could be like, I'm only going to accept the code if it has these properties that I can detect, but at that point, you're not really letting it exercise the full span of plans that it can do. It's like, you're kind of crippling the capabilities.

Theo: Oh, so like the safe versus useful trade-off.

Liron: Or just like, I mean, you're kind of just not letting it scale to superintelligence. You're just attacking the premise of, you know, hey, what can it really do? So let's say we keep the premise of like, hey, it's getting smarter and smarter. It's getting more and more capable. It's getting better at mapping goals to actions, right? And you're saying, "I'm going to have humans weigh in." Now, people have proposed that we have two AIs debate, and that's going to help me give it feedback, because I'm going to have the best input. I'm going to be able to judge one AI versus another AI. There are all these proposals. I hope they work. I hope that scalable debate somehow works really well, but it's very iffy. You can give me any individual proposal and I'm like, "Yeah, I hope that works, but here's why I don't think so." I'm skeptical about debate because I see easy debates where smart humans can't convince other smart humans. My own personal experience with the failure of debate is that you still had a bunch of smart people in the tech industry not realizing that blockchain technology doesn't logically support any use case besides cryptocurrency, until the industry collapsed by 99%. If we can't get that right, how are we going to get scalable debate?

Theo: What about the idea that all AIs do is basically approximate their training set and predict the next token? If the training data is overwhelmingly nice and full of friendship and love, then the AI will exhibit kindness and friendship and love. That's not to say that AIs can't be extremely dangerous because of course they can, but building the data set sufficiently will be enough to make sure that it's probably aligned.

Liron: It's kind of like level skipping. Reductionism doesn't quite work that way. An analogy is like humans. Humans were trained using survival of the fittest. So shouldn't we be super cutthroat? So how come a bunch of people are really nice in a bunch of situations? Evolution wasn't nice. How come people are nice?

Theo: Because it benefits us.

Liron: But there are people who are really saints. Scott Alexander recently donated a kidney. Scott Alexander just seems like a really nice guy. And I would argue that donating the kidney didn't really benefit him in a lot of the senses that I would have considered relevant before I saw him donate the kidney. How would you explain that?

Theo: Well, because he's an effective altruist, it's something that gives him a lot of personal satisfaction helping other people. The utility of losing a kidney was not that much compared to the utility of knowing that he helped someone else.

Liron: I agree that he feels good after donating a kidney, he's getting an emotional reward, but now connect that to the fact that nature is red in tooth and claw evolution is cutthroat. You've inserted a level of abstraction where we can no longer just say evolution is cutthroat, therefore Scott Alexander is cutthroat. You lose the cutthroatness when you apply levels of reductionism.

Theo: But doesn't that bode well for alignment because we started out as cutthroat beasts and turned into very nice people who donate kidneys?

Liron: It's possible that there are equilibriums of AIs that are nice for sure. But the analogy I was trying to make wasn't that cutthroat things can become nice. The analogy I was trying to make was you have to be very careful to make sure you're respecting layers of abstraction and layers of reductionism when you're making claims. Just like you can't say evolution is cutthroat, therefore individuals are going to be cutthroat. You also can't say, here's a training corpus where everybody's being nice in the training corpus, therefore we're going to get an AI that's nice.

The problem is if the AI is able to map goals to actions, you can be a really nice guy who just on your way to doing something nice is trampling on a bunch of ants because it didn't occur to you that the ants were of value. You're just optimizing the world for whatever paperclips or humans or whatever you like.

Theo: I've talked about these evolution style arguments with Quinton and Nora before where they say basically like humans aren't literally aligned to inclusive genetic fitness or making as many babies as possible. Humans are aligned to empathy. Humans are aligned to parenting. Humans are aligned to the things that we do, the things that are produced by our ingrained reward systems, the things that our reward system produces in our environment.

Liron: And this is where it's reminding me of behaviorism. It's just like, well, don't you think that when you went down to dinner, it's because you heard a sound that you usually hear at dinner? It's trying to flatten out the things we do. And when I debated Quintin, he did kind of try to go that way with the space program. He's like, look, physics textbooks have reinforced us about the orbital mechanics necessary to go to the moon. I'm like, I don't know, man, I'm pretty sure we just reasoned it out. I'm pretty sure we mapped the goal to the action. I'm pretty sure that is a type of algorithm that we use, which is a general category of algorithm. And we're improving that category of algorithms, and that category of algorithm logically implies doom. That's how I see the world. And you can always be like, no, that's not a category, it's just all different cycles of data and training. It's all continuous and there's not going to be a foom. I feel like I can take that position and argue it, but I don't find it convincing compared to just being like: goal-to-action mapping is a type of algorithm that we're seeing convergence on.

AI doom meets daily life (57:57)

Theo: Switching topics a little bit, what percent of your brain cycles in a typical day are taken up by AI risk? You seem pretty chipper and happy overall. How do you reconcile that with the thought that the world is going to end soon or at least look very different?

Liron: It's kind of funny. It's like, "Hey, this is what a doomer looks like." And it's just a happy person. I'm taking care of my kids, doing something fun, eating an ice cream cone, whatever. I think that can vary person to person, just like effective altruism can vary. I'm not planning to donate a kidney, I respect people who do, I consider myself an effective altruist. I don't feel a desire to donate kidney. I'd rather keep my kidney. But it can vary, to each his own.

With AI doom, I'm fortunate that I'm not depressed every day about it. I rationally do think the probability of doom is pretty high, but luckily my mood is just wired such that I don't get that stressed about it. I think part of the way that my own system works, which isn't particularly rational, it's kind of arbitrary, but I think I have a part of my brain being like, "Well, at least I don't have FOMO." Because it's like, at least I get to die at the same time as everybody else. I feel like that helps me. I don't think it should. But I'm just trying to accurately report how my psychology is working.

I think if you said like, "Hey, you, Liron are going to die and everybody else is going to live," I'd be like, "Damn it, now I have FOMO." So I think that's part of it. But it obviously sucks that literally everybody's going to die. I live in a part of the country that's very nice. I don't have major life problems right now. I kind of live a charmed existence on a day-to-day basis. So yes, it's all going to end, but I'm just getting a lot of positive reinforcement. It's like, "Hey, this is going to be a good day." And the amount of good days seems to be getting smaller. Unfortunately, the trend seems to be bad, but for me, that doesn't output depression. I know other people that it does output depression more. And they just have to have coping mechanisms. Because why be depressed regardless of whether you're going to die or not? I don't know what else I can say about that idea of like mapping your own mood to your rational belief that p(doom) is pretty high.

Theo: What about raising kids? How is that different for you with a high p(doom)?

Liron: I read Bryan Caplan's book, Selfish Reasons to Have More Kids. I think it's great. I think it's a must-read. The promise of the book is that however many kids you wanted to have, it'll probably convince you to have one more, if not two or three more. I've always leaned toward having three, which I did end up having. I have three right now. And it did make me want to have a fourth. But then the problem is also that, because we have the GPT series now, right after I had my three kids, AI started really intensifying, and my timelines shortened, as they did on Metaculus and the prediction markets.

Just like everybody's like, "Oh no, it's not going to take us till 2040, 2050 to get AI. It's going to take us till like 2025." That's like the latest Metaculus AGI prediction. My timeline shortened too. And now it's just like, "Oof," because a lot of having kids, the investment is front loaded. You're doing a lot of work in the first couple of years where it's just constant crying. Like as we speak right now, my wife's currently dealing with a crying baby. So it's constant crying, constant loss of sleep. But at the same time, when you're old and your kids are grown up, it's all upside. There's no work, just all upside. So it's kind of, there's some degree of front loaded investment. And so now it's less rational to do since I think p(doom) is pretty high.

But at the same time, I have a whole life where half of my life, I'm just living for a good future. I'm saving for retirement because half of me wants to have a retirement. So I just kind of split brain about it. And it's not split brain. I mean, this is just how you have to probabilistically make decisions. You have to plan for both outcomes. So I'm planning for a good life where my kids grow up and I get to save for retirement and then I get proven wrong about AI risk and I get dunked on, but it's okay.

Israel vs. Hamas (1:02:17)

Theo: And then what about current events? You've been tweeting about Israel and Hamas recently. So what's your model on that? Is it like, "Oh, this is a thing that's happening right now, and it's very important"? Or is it like nothing is important compared to AI? Or somewhere in between?

Liron: I mean, I think part of it is just me personally. I am Israeli, so it's personal to me in a way it wouldn't be if this were another conflict. I know people who were affected by the tragedy. Israel is actually a small country, so with 1,200 people murdered and thousands more injured, everybody has multiple people in their network to whom some brutal atrocity happened. It's very personal for me. Even though I'm not directly connected to any victims, I'm connected with a couple of degrees of indirection. My family is still in Israel with rockets flying over them. It doesn't get much attention, but there are constant rockets flying over Israel, attempting to kill Israeli civilians. They just have a shield, the Iron Dome and a bunch of new stuff, and they keep shooting down the rockets. So you don't hear about innocent Israeli civilians being slaughtered, even though they're targeted for slaughter; they just don't get successfully slaughtered.

So, stuff is happening and it's personal to me. And then there's Hamas. They're not just bending all the rules of war, they're breaking them like crazy. Their base was a hospital, and then people are denying that it's a hospital. They're really not playing by the rules. It's okay for two sides to go to war; they both have their own perspective. That's fine. But I feel like the war crimes are pretty bad on the Hamas side, using their people as human shields. I try to be fair: look, if you're using your people as human shields, and we, the Israeli side, want to kill the terrorists, and then the civilians die, who's causally responsible for the deaths of the civilians when you use human shields? So I tend to tweet stuff like that, where it's like, look, I'm just trying to be fair here. I don't think human shields are invulnerable.

I feel tempted to tweet about that kind of stuff especially when the New York Times is being biased about it. I listen to The Daily podcast, and they're purposefully trying to insert as much as they can get away with to basically say F you to Israel. Like the fact that they're not saying why Israel took the prisoners. A couple of days ago on the podcast, they were talking about the prisoners Israel is holding, and they were literally hemming and hawing. The question asked was, "Hey, why does Israel have these prisoners? What are they guilty of?" And the person on the podcast was like, "Well, some of the prisoners were accused of maybe throwing stones, maybe being associated with some other people who are doing bad stuff." It's like, come on. They're on video stabbing Israelis. That's why they're in prison. That's why they're getting traded for us. I'm seeing media bias. That's why I've been tempted to tweet a little bit about the Israel-Palestine situation, but of course I'm not against Palestinian civilians. I think it's a tragic situation. I try to have empathy for both sides.

Theo: But do you think this is a very important thing in the world, or do you just see it as, it's something, but nothing is important compared to AI?

Liron: I mean, I think it's probably less than 1% as important as AI. So have I given it more than 1% of my tweets? Yes, a little bit more than 1%. So I'm being disproportionate because of the fact that I'm Israeli, but it's not like I did a takeover. I only tweet about it occasionally. I don't think my calibration is off. I think I've successfully integrated my own indexical perspective as an Israeli Jew, a secular Israeli Jew, I don't believe in that crap, and adjusted the base rate of how unimportant a regional conflict is for the fact that I'm Israeli.

Rationalism (1:06:15)

Theo: Switching topics again to rationalism, how did you get into rationalism in the first place?

Liron: I've always been very rationally minded. I've always just been a real logical type, self-diagnosed Aspie over here. I like to think I follow logic. LessWrong was a pretty big awakening for me. I started reading it when I was 19, in 2007. I thought I kind of knew what rationality was when I first started reading LessWrong: I'm rational because I figured out that God's not real and everybody else is just delusional. I figured out that science is good, that science is actually how you learn things. I'd figured out the most obvious things about how to be rational. But then LessWrong comes up and is like, "Hey, did you know that your brain is actually an object that was shaped by natural selection, but it wasn't shaped to have accurate beliefs? It was shaped to survive and play tribal politics. And if you want to use it to make accurate beliefs, you have to kind of hack it." It's almost like using your feet to play the piano. Yes, you could, but it requires hacking. You have to do that with your brain if you want to form accurate beliefs.

That was really my rationalist awakening, where I realized there are levels to this. It's not just, "Oh, philosophy. God's not real. I beat the game. Give me my trophy. I win philosophy." And then LessWrong comes in and is like, "Well, you have to decide what code to write into the AI, where the AI gets to determine how morality is going to work for the rest of the lifetime of the universe and use all the negentropy in the universe to build the optimal configuration. So what code would you like to write, Mr. Rational?" And I'm like, "Damn it, there's levels to this." Rationality doesn't end when you realize God is not real, or when you realize that science is a good methodology. And of course, Bayesianism is actually a much subtler way to do what science is trying to do.

So yeah, I read LessWrong, and I'm like, "Wow, I was made for this. Unfortunately, I wasted the first 19 years of my life, but this is what I want to be doing. This is what everybody should be learning. This is what school should be." And then unfortunately, it all leads up to the awareness of, "Well, now that you're so rational, can't you notice that the world looks like it's about to end, and you need rationality to solve it?" It's been an interesting quest, starting from rationality and leading up to the idea of how you're supposed to wield that rationality to try to not die.

Theo: And then, same question I asked Zvi, but I think it’s a very useful one, how would you explain the field of rationalism to a total beginner, a total layman?

Liron: I would throw in what I just said: "Look, we're all humans with brains. Our brains were made by natural selection, right? The same force that made a tiger's claw. It's great that we have this cool organ. But if you ever want to have that organ look at the truth, see what's actually real, and maybe use that truth to make useful predictions, it's not going to come fully naturally. There is an art to it, the same way that there's an art to making a piano sound good when you play it with your fingers. There's an art to using your brain to arrive at truth. And you can read the LessWrong Sequences and learn that art." I think it's a beautiful art, and it's an art I've spent a lot of time in and try to get practical value from. And the art has close associations to making money and trading, if you ever want to monetize it.

I mean, my wife is an example of somebody who's more of a normie, who's not super into rationality, right? And I've given up on trying to make my wife bet me on stuff. That's one of the rationality tools, right? When you think you know something, you place a bet on it. And some people are just not interested in going down that route, which is fine. But it matters when you need it. When you're in government, handing an assessment to the President, saying "I think the enemy has a high likelihood of attack" or "may plausibly attack," when you're using English like that, hopefully you can look into the rationality world and be like, "Ah, the best practice here is to give a probability range rather than using ambiguous English."

Sometimes rationality can teach us little things that we can import into the normie world, which has been happening at a faster and faster pace. I've witnessed rationality seeping into the normieverse over my lifetime. Today, prediction markets are gaining traction. Effective altruism started in the rationality community, right? I think it officially started in 2011, but in 2009 I was reading Eliezer Yudkowsky's post "Purchase Fuzzies and Utilons Separately," the idea that, hey, it's great to want to feel good when you do charity, but also, as a separate consideration, try to do the most good. And that was kind of the beginning of effective altruism.

Effective altruism (1:11:00)

Theo: Do you think that the reputation of effective altruism deserves to be tarnished at all after Sam Bankman-Fried, after like, a lot of what's happened to it over the last few years?

Liron: There's a joke that everybody in effective altruism doesn't say "I'm an effective altruist." They say "I'm EA-adjacent." I'm the only EA who will stand here and tell you, "I'm EA. I'm an effective altruist, not adjacent." Now, that said, am I a central example of an effective altruist? No. I haven't donated a kidney. I do donate a few thousand dollars a year to good causes. I'm a GiveWell donor. I've donated to MIRI and the Center for Applied Rationality. So I've thrown out some donations to altruistic causes, and I'm a fan, but I don't donate 10% of my income. Maybe I'll start, but I haven't yet. And I haven't dedicated my career to being super altruistic.

The reason I say I'm an effective altruist is because of, you know, the book by Will MacAskill, Doing Good Better, an absolute must-read. It's just, "Yeah, I want to spend a little bit of money to massively help people flourish." I think that makes perfect sense. That's great logic. And then people are like, "Oh, what about the ideology?" It's like, fine, okay, chill out. And Sam Bankman-Fried? Nobody thinks that what he did was good, right? Nobody thinks that Sam Bankman-Fried was being good and rational by scamming the world and thinking the scam was going to work. I guess a few people think that, but I personally could not name a single individual who's like, "Yeah, what Sam Bankman-Fried did was good. He should do it again in the same position." I would never think that. I believe in morality; I conduct myself with deontological morality. So these pathological examples that people give are just not representative of the simple logic of trying to do more good. I highly recommend going to Scott Alexander's blog, whether it's Slate Star Codex or Astral Codex Ten, and searching for effective altruism. The writing he's done on his experiences with effective altruism is absolutely heartwarming stuff.

Theo: What if the best way to produce value for the world is not literally just donating money to kids in Africa, but more like doing what Elon Musk has done: not donating much to charity, and just investing and reinvesting everything into transformative companies?

Liron: I have no business telling Elon Musk, "Hey, Elon Musk, donate 10% of your income to charity." I'm fine with what Elon Musk is doing, except for the part where he founded OpenAI and accelerated timelines. Besides that part, everything else he's doing, I think is great. I don't think that I have advice to give him.

The perfect type of conversation where I would give somebody advice is if they're like, "I don't believe in effective altruism, they have all these rules, I just don't buy it." And I'm like, "Great, so what do you want to do instead?" And they're like, "Oh, I just want to work as hard as I can and create value through my company." I'm like, "Okay, how's that going? What's the company? How are you creating value?" If they're like, "Well, the company is arbitrage, where I have an e-commerce store and I try to flip stuff for a higher price," I'm like, "Okay, how is that creating value?" And they're like, "I don't know, I just make some money. I save people a click to find stuff." I'm like, "Okay, saving people a click. Is that really better than donating to malaria bed nets or whatever?" So I'd have the conversation.

In this hypothetical scenario, I'm getting the sense that the hypothetical character is kind of rationalizing; they just don't want to talk about altruism. And that's fine. But there are a lot of people in the world who are like, "Hey, I actually do want to do something good, especially if it's cheap." There's some limit. Look, if you literally just had to pay $1 to save a million people, I think the vast majority of people would be like, "Yeah, here's my dollar." So it's just a spectrum. Even a giant dick would probably be like, "Okay, I'll pay $1 for a million people." And somebody who's less of a dick would be like, "$10 for a million people, fine." So everybody has their price at which they're happy to be an altruist. And there are some people where it's like, "Yeah, 10% of my income to save a couple of people a year sounds good." There are some people who are up for a lot of altruism.

Crypto (1:14:50)

Theo: Speaking of bullshit businesses, you also have a bit of a past with crypto. You've been a major crypto skeptic. So what do you think about Bitcoin being up from a low of $15,000 to $38,000 today? Bitcoin is up 127% year to date, Ethereum is up 71% year to date, and the total crypto market is up 79% year to date. Is it all maybe related to AI hype?

Liron: I think it's mostly just a derivative of NASDAQ. It's kind of mirrored the progress of NASDAQ, but with higher volatility. Is that fair to say?

Theo: Yeah, maybe. Why do you think it would mirror the performance of the stock market?

Liron: Probably liquidity, if I had to guess. When stocks are going up, people just feel like they have more money, and then they're like, "Okay, let me put some of the money into higher-risk, higher-reward stuff." 2021 was the epitome of it, right? Money was easy. You could take money out of your low-interest mortgage, your stocks were worth more, you felt like cash was trash. I made a bunch of investments that weren't the wisest in retrospect. So when NASDAQ goes up, people who are looking at the tech sector find themselves with more cash, their margin account suddenly lets them borrow more, and they're like, "Okay, great, let me chase returns using this cash. Oh, and I see this thing is going up."

So I do think there are liquidity effects that you see consistently mirrored in Bitcoin. But that said, what's going on with Tether? They're printing tethers to buy Bitcoin on these markets where no US dollars are getting exchanged. There is some manipulation that I don't claim to understand that makes these prices potentially not the real market price. So I hesitate to draw conclusions; I don't even claim to understand what the heck's going on. But what I do claim to understand is that blockchain technology has no use case beyond cryptocurrencies. So I can talk more about that.

Theo: Yeah, why don’t you go into a little more detail about that?

Liron: My first exposure to crypto was actually in 2010, because, you know, the LessWrong community, these rationalists, started talking about Bitcoin. They're early to every trend, right? I'd been reading LessWrong since 2007, and I saw Bitcoin mentioned around 2009-2010. And by random coincidence, around 2006 I had been in the cryptography space, just academically. I took a graduate elective in cryptography, and I read a paper that was a scheme for electronic cash. So I just randomly had this background. I'm like, "Hey, look, cryptographic electronic cash," a few years before Bitcoin. And I see what they're trying to do with the scheme, but obviously it just sucks that you need a central bank, so it's not going to work. And then I see Bitcoin come out around 2010. I'm like, "Whoa, decentralized electronic cash, and it's cryptographic. Nice. This is cool. If I were still in that college class, I'd be doing a paper about this."

Now, of course, the obvious problem is that nobody gives a crap, right? So great, this nice, theoretically interesting thing doesn't have social proof. And then I checked back a year later. I'm like, "What? This thing's still going, the price is fluctuating, it has social proof. Okay, I'm sold." So that's when I'm like, "Okay, I'm gonna buy some. This looks good." And I actually have a tweet from 2011 where I'm all bullish on Bitcoin. I'm like, "Bitcoin is going to 10x again. This is one of the best investments you can make. It's a 10% chance of a 100x return." So I became a Bitcoin bull.

Theo: And you would have been right. Bitcoin was the best investment you could have made in 2011.

Liron: Exactly, right. And I did profit. I did 10x. I think I banked around $100k from that kind of investing. But then of course I started playing the market, and I started also losing money, and I probably ended up netting out close to zero after that.

But I got lucky, because while I was dicking around, I happened to angel invest in Coinbase. So I ended up making $6 million in 10 years because I had an illiquid investment in Coinbase. Total luck that as I was dicking around with Bitcoin, I made an investment that was illiquid, and I ended up profiting from it, especially since by the time the Coinbase IPO happened, I had become disillusioned with crypto. I would have sold earlier, and I did actually sell most of the stake earlier. I only held on to a fraction of it.

So I became disillusioned because I'm like, "Wait a minute, this is just people being architecture astronauts. The actual logic behind blockchain technology, a decentralized double-spend prevention protocol, doesn't enable any use case." And I was massively, massively right about that, except for the idea of using a cryptocurrency. Cryptocurrency has a million problems, and it's not that great, but at least it's logically coherent. You can, in fact, have a bearer token that you trade to somebody, and it happens on the blockchain. So there's some nonzero, logically coherent thing going on there, but it's not going to extend beyond cryptocurrency.

Theo: You've also mentioned a few times a 99% drawdown in the crypto market. Where'd you get that number from?

Liron: Yeah, so I would like to collect my Bayes points. Bayes points are what you get when you make a successful prediction. The successful prediction is one I made in late 2021, all the way through 2022, which is saying: all these VCs saying that crypto has use cases, all these quote-unquote builders, like the founders of Helium and Axie Infinity, all these people saying there's real value here? No, there's not. Because with blockchain technology, there's no logical connection between the technology and enabling a new value prop.

The kinds of value props people were pitching were like, "Look, imagine if your data was publicly auditable using this database." Okay, but a publicly auditable, digitally signed database doesn't need a blockchain. You only need a blockchain for double-spend prevention, right? They kept doing pitches where there was a logical disconnect between the value they were pitching and the technology they were pitching to implement it with. And so it became clear to me that they were just rationalizing.

Theo: What about just distributed computing in general, where you don't want a central party?

Liron: Distributed computing is fine, but you don't need blockchain technology to do that. And I also think it's a niche application; on the rarer occasions when you do need distributed computing, fine, but you still don't need a blockchain.

Theo: It seems like this is, if anything, kind of the opposite of Charlie Munger's view on cryptocurrency, where he said something like, it's a very cool piece of computer science and technology, but cryptocurrency is shit, though maybe there will be a use for it.

Liron: Yeah, there are a lot of people saying, "Hey, I don't really get Bitcoin, but I like blockchain." They're wrong, because maybe what they actually like is cryptography. Digital signatures? Amazing. Public key encryption? Amazing. These have countless use cases. But the idea of putting them on a blockchain so that you can prevent double-spending at great expense only has cryptocurrency applications, where you really, really care about the writing on the ledger because there's no real-world authority that's more authoritative than the writing on the ledger. That's only true for a bearer cryptocurrency token. For every other use case that has a connection to the real world, you already implicitly trust somebody in the real world to adjudicate. If somebody steals the NFT that says I get to live in my house, realistically I'm still going to go to the police and get to live in my house. So I don't need the blockchain to prevent double-spending on my house NFT. See what I'm saying?

Theo: So you trust institutions and society enough to not require any kind of actual decentralization?

Liron: I mean, living on my street, there's some level of trust that somebody is not going to walk in and take my stuff. It's not a trustless society; I don't even own a gun.

Charlie Munger and Richard Feynman (1:22:12)

Theo: Switching topics a little bit, speaking of Charlie Munger, he just died a couple days ago. I was a big fan of his, rest in peace. He might have actually introduced me to the field of rationalism. Would you consider Charlie Munger a rationalist?

Liron: Yes, he's definitely a type of rationalist. Even before LessWrong and the modern sense of rationality that a lot of us appreciate, there have been a lot of schools of rationality. They all have a shared enterprise of using your brain to do better than playing tribal politics and hunting animals. It's like playing the piano with your feet: what if I let the need for accurate beliefs, the need for truth, propagate back to the way that I wield my biological organ? I'm going to determine the way I think not by how I like to think, not by how I want to be perceived as thinking, but by what creates the best sound of the piano, what creates the best drive toward truth. What moves the boat? What steers the boat best toward the island of truth, right? Using my beliefs and using evidence as fuel, how do I steer the boat, regardless of how crazy I look when I'm steering it? How do I actually steer it properly?

Munger wanted to engage in that enterprise because he wanted to steward his portfolio. He had what Eliezer calls "something to protect." There's a Japanese anime trope where superheroes don't just randomly get superpowers; they get the superpowers because they have something they want to protect. And as a result of the need to protect something, they work backwards to needing the superpowers. The idea is that rationality emerges when you care more about navigating with your brain somewhere than you care about what you're doing with your brain directly. You don't care how people are going to socially view your choices, you don't care about looking weird, you just care about getting to the destination, optimizing something, making some outcome happen, and you get emergent rationality. Munger absolutely did that. Richard Feynman did that in physics; the Feynman diagram might be an example of a weird, non-traditional thing that did the job of advancing our understanding of physics.

Theo: Well, I think that's a pretty good place to wrap it up. Thank you so much for coming on the podcast.

Liron: My pleasure. I'm a fan and I'm bullish. I'm glad I'm getting in early on this podcast, because I'm sure it's going to be an institution very shortly.

Theo: Yeah, can't wait.
