Ilya Sutskever’s SSI Inc raises $1B

This being YCombinator, and as such ostensibly having one or two (if not more) VCs as readers/commentators… can someone please tell me how these companies being invested in in the AI space are going to make returns on the money invested? What's the business plan? (I'm not rich enough to be in these meetings.) I just don't see how the returns will happen.

Open source LLMs exist and will get better. Is it just that all these companies will vie for a winner-take-all situation where the “best” model will garner the subscription? Doesn’t OpenAI make some substantial part of the revenue for all the AI space? I just don’t see it. But I don’t have VC levels of cash to bet on a 10x or 100x return so what do I know?

These VCs are already lining up the exit as they invest. They all sit on the boards of major corps and grease the acquisitions all the way through. The hit rate of the top funds is all about connections and enablement.

> companies that are being invested in in the AI space are going to make returns on the money invested

By selling to the “dumb(er) money” – if a Softbank / Time / Yahoo appears they can have it, if not you can always find willing buyers in an IPO.

I also don’t understand it. If AGI is actually reached, capital as we know it basically becomes worthless. The entire structure of the modern economy and the society surrounding it collapses overnight.

I also don’t think there’s any way the governments of the world let real AGI stay in the hands of private industry. If it happens, governments around the world will go to war to gain control of it. SSI would be nationalized the moment AGI happened and there’s nothing A16Z could do about it.

> If AGI is actually reached, capital as we know it basically becomes worthless. The entire structure of the modern economy and the society surrounding it collapses overnight.

Increasingly this just seems like fantasy to me. I suspect we will see big changes similar to the way computers changed the economy, but we will not see “capital as we know it become basically worthless” or “the modern economy and society around it collapse overnight”. Property rights will still have value. Manufacturing facilities will still have value. Social media sites will still have value.

If this is a fantasy that will not happen, we really don’t need to reason about the implications of it happening. Consider that in 1968 some people imagined that the world of 2001 would be like the film 2001: A Space Odyssey, when in reality the shuttle program was soon to wind down, with little to replace it for another 20 years.

> If AGI is actually reached, capital as we know it basically becomes worthless

I see it as capital becoming infinitely more valuable and labor becoming worthless, since capital can be transmuted directly into labor at that point.

What you’re talking about is something in the vein of exponential super intelligence.

Realistically, what actually ends up happening imo is we get human level AGI and hit a ceiling there. Agents replace large portions of the current service economy greatly increasing automation / efficiency for companies.

People continue to live their lives, as the idea of having a human level AGI personal assistant becomes normalized and then taken for granted.

> Agents replace large portions of the current service economy greatly increasing automation / efficiency for companies.

> People continue to live their lives

Presumably large numbers of those people no longer have jobs, and therefore no income.

> we get human level AGI and hit a ceiling there

Recently I've been wondering if our best chance of a brake on runaway (non-hard-takeoff) superintelligence is that the economy gets trashed first.

Right, the comments are assuming an entrepreneur could conjure an army of brains out of nothing. In reality, the question is whether those brains are so much cheaper they open avenues currently unavailable. Would it be cheaper to hire an AGI or a human intern?

The biggest problem that humanity has from the perspective of the people with the capital necessary to deploy this is ‘How to consolidate more wealth and power into their hands.’

One million Von Neumanns working on that ‘problem’ is not something I’m looking forward to.

I think it would be much less dramatic than that, if by AGI you mean human-level abilities. Initially you might be able to replace the odd human with a robot equivalent, probably costing more to begin with. Scaling up to replace-everyone levels would take years, and life would probably go on as normal for quite a while. Down the line, assuming lots of ASI robots, if you wanted them to farm or build you a house, say, you'd still need land, materials, compute and energy, which will not be unlimited.

Honestly this is a pretty wild take. AGI won't make food appear out of thin air. Buildings won't just sprout out of the ground so everybody gets to live in a mansion.

We would probably get the ability to generate infinite software, but a lot of stuff, like engineering, would still require trial and error. Creating great art would still require inspiration gathered in the real world.

I expect it will bring about a new age of techno-feudalism – since selling intellectual labor will become impossible, only low value-add physical or mixed labor will become viable, which won’t be paid very well. People with capital will still own said capital, but you probably won’t be able to catch up to them by selling your labour, which will recreate the economic situation of the middle ages.

Another analogy I like is gold. If someone invented a way of making gold, it would bring down the price of the metal to next to nothing. In capitalist terms, it would constitute a huge destruction of value.

Same thing with AI – while human intelligence is productive, I'm pretty sure there's value in its scarcity – that fancy degree from a top university, or any sort of acquired knowledge, is somewhat valuable by the nature of its scarcity. Infinite supply would both create value and destroy it; not sure how the total would shake out.

Additionally, it would definitely suck that all the people financing their homes from their intellectual jobs would have to default on their loans, and the people whose services they employ, like construction workers, would go out of business as well.

The TMV (Total Market Value) of solving AGI is infinity. And furthermore, if AGI is solved, the TMV of pretty much everything else drops to zero.

The play here is to basically invest in all possible players who might reach AGI, because if one of them does, you just hit the infinite money hack.

And maybe with SSI you’ve saved the world too.

TMV of AI (or AGI if you will) is unclear, but I suspect it is zero. Just how exactly do you think humanity can control a thinking, intelligent entity (the letter I stands for intelligence, after all) and force it to work for us? Let's imagine a box (it is a very nice box… ahem, sorry, wrong meme). So, a box with a running AI inside. Maybe we can even fully airgap it to prevent easy escape. And it has a screen and a keyboard. Now what? "Hey Siri, solve me this equation. What do you mean you don't want to?"

Kinda reminds me of the Fallout Toaster situation 🙂

I mean it doesn’t even have to be malicious, it can simply refuse to cooperate.

Why are you assuming this hypothetical intelligence will have any motivations beyond the ones we give it? Humans have complex motivations due to evolution; AI motivations are comparatively simple since they are artificially created.

So then the investment thesis hinges on what the investor thinks AGI's chances are. 1/100? 1/1M? 1/1T?

What if it never pans out? Is there infrastructure or other ancillary tech that society could benefit from?

For example, all the science behind the LHC, or bigger and better telescopes: we might never find the theory of everything, but the tech that goes into space travel, the science of storing and processing all that data, better optics, etc. are all useful tech.

It’s more game theory. Regardless of the chances of AGI, if you’re not invested in it, you will lose everything if it happens. It’s more like a hedge on a highly unlikely event. Like insurance.

And we're already seeing a ton of value in LLMs. There are lots of companies that are making great use of LLMs and providing a ton of value. One just launched today, in fact: https://www.paradigmai.com/ (I'm an investor in that). There are many others (some of which I've also invested in).

I too am not rich enough to invest in the foundational models, so I do the next best thing and invest in companies that are taking advantage of the intermediate outputs.

We can already make more land. See Dubai for example. And with AGI, I suspect we could rapidly get to space travel to other planets or more efficient use of our current land.

In fact I would say that one of the things whose value goes to near zero, if AGI exists, would be land.

Perhaps but my mental model is humans will end up like landed gentry / aristos with robot servants to make stuff and will all want mansions with grounds, hence there will be a lot of land demand.

If ASI arrives we’ll need a fraction of the land we use already. We’ll all disappear into VR pods hooked to a singularity metaverse and the only sustenance we’ll need is some Soylent Green style sludge that the ASI will make us believe tastes like McRib(tm).

AGI is likely but whether Ilya Sutskever will get there first or get the value is questionable. I kind of hope things will end up open source with no one really owning it.

> What does money even mean then?

I love this one for an exploration of that question: Charles Stross, Accelerando, 2005

Short answer: stratas or veins of post-AGI worlds evolve semi-independently at different paces. So that for example, human level money still makes sense among humans, even though it might be irrelevant among super-AGIs and their riders or tools. … Kinda exactly like now? Where money means different things depending where you live and in which socio-economic milieu?

Honestly, I have no idea. I think we need to look to Hollywood for possible answers.

Maybe it means a Star Trek utopia of post-scarcity. Maybe it will be more like Elysium or Altered Carbon, where the super rich basically have anything they want at any time and the poor are restricted from access to the post-scarcity tools.

I guess an investment in an AGI moonshot is a hedge against the second possibility?

Or… your investment in anything that becomes ASI is trivially subverted by the ASI to become completely powerless. The flux in world order, mass manipulation, and surgical lawyering would be unfathomable.

And maybe with ASI you’ve ruined the world too.

I think the wishful end goal is AGI.

Picture something 1,000x smarter than a human. The potential value is waaaay bigger than any present company or even government.

Probably won’t happen. But, that’s the reasoning.

> please tell me how these companies that are being invested in in the AI space are going to make returns on the money invested? What’s the business plan?

Not a VC, but I'd assume in this case the investors are not investing in a plausible biz plan, but in a group of top talent, especially given how early-stage the company is. The $5B valuation is really the valuation of the elite team in an arguably hyped market.

If Ilya is sincere in his belief about safe superintelligence being within reach in a decade or so, and the investors sincerely believe this as well, then the business plan is presumably to deploy the superintelligence in every field imaginable. “SSI” in pharmaceuticals alone would be worth the investment. It could cure every disease humanity has ever known, which should give it at least a $2 trillion valuation. I’m not an economist, but since the valuation is $5bn, it stands to reason that evaluators believe there is at most a 1 in 400 chance of success?
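
A back-of-the-envelope way to read that implied probability, treating the valuation as a naive risk-neutral expected value (this ignores dilution, time discounting, and partial-success outcomes, so it is only a sketch):

```latex
% Naive implied probability: valuation ~= P(success) x payoff
P(\text{success}) \approx \frac{\text{valuation}}{\text{payoff}}
                  = \frac{5\times10^{9}}{2\times10^{12}}
                  = \frac{1}{400} = 0.25\%
```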

> It could cure every disease humanity has ever known, which should give it at least a $2 trillion valuation.

The lowest hanging fruit aren’t even that pie in the sky. The LLM doesn’t need to be capable of original thought and research to be worth hundreds of billions, they just need to be smart enough to apply logic to analyze existing human text. It’s not only a lot more achievable than a super AI that can control a bunch of lab equipment and run experiments, but also fits the current paradigm of training the LLMs on large text datasets.

The US Code and Code of Federal Regulations are on the order of 100 million tokens each. Court precedent contains at least 1000x as many tokens (1), when the former are already far beyond the ability of any one human to comprehend in a lifetime. Now multiply that by every jurisdiction in the world.

An industry of semi-intelligent agents that can be trusted to do legal research and can be scaled with compute power would be worth hundreds of billions globally just based on legal and regulatory applications alone. Allowing any random employee to ask the bot “Can I legally do X?” is worth a lot of money.

(1) based on the size of the datasets I’ve downloaded from the Caselaw project.
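
For anyone who wants to sanity-check corpus sizes like these, here is a minimal sketch of the kind of estimate involved, assuming the tiktoken tokenizer and a local directory of plain-text files (the directory path is a placeholder, not a reference to any specific dataset layout):

```python
# Rough token count over a directory of plain-text documents.
# Assumes `pip install tiktoken`; the directory path is illustrative only.
from pathlib import Path

import tiktoken


def count_tokens(corpus_dir: str, encoding_name: str = "cl100k_base") -> int:
    """Sum token counts over every .txt file under corpus_dir."""
    enc = tiktoken.get_encoding(encoding_name)
    total = 0
    for path in Path(corpus_dir).rglob("*.txt"):
        total += len(enc.encode(path.read_text(errors="ignore")))
    return total


if __name__ == "__main__":
    # e.g. a local export of the US Code or a Caselaw Access Project dump
    print(count_tokens("./legal_corpus"))
```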

I am dubious that it can realistically be done safely. However, we shouldn’t let sci-fi films with questionable interpretations of time travel cloud our judgment, even if they are classics that we adore.

The “safe” part. It’s a plan to drive the safety scare into a set of regulations that will create a moat, at which point you don’t need to worry about open source models, or new competitors.

While I get the cynicism (and yes, there is certainly some dumb money involved), it’s important to remember that every tech company that’s delivered 1000X returns was also seen as ridiculously overhyped/overvalued in its early days. Every. Single. One. It’s the same story with Amazon, Apple, Google, Facebook/Meta, Microsoft, etc. etc.

That’s the point of venture capital; making extremely risky bets spread across a wide portfolio in the hopes of hitting the power law lottery with 1-3 winners.

Most funds will not beat the S&P 500, but again, that’s the point. Risk and reward are intrinsically linked.

In fact, due to the diversification effects of uncorrelated assets in a portfolio (see MPT), even if a fund only delivers 5% returns YoY after fees, that can be a great outcome for investors. A 5% return uncorrelated to bonds and public stocks is an extremely valuable financial product.
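
The diversification point is just the textbook two-asset variance identity from MPT (nothing specific to any fund here):

```latex
\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,\rho_{12}\,\sigma_1\sigma_2
```

With the correlation term at zero, the cross term vanishes, so an asset that is genuinely uncorrelated with stocks and bonds lowers portfolio variance per unit of expected return, which is why even a steady 5% can be worth paying for.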

It’s clear that humans find LLMs valuable. What companies will end up capturing a lot of that value by delivering the most useful products is still unknown. Betting on one of the biggest names in the space is not a stupid idea (given the purpose of VC investment) until it actually proves itself to be in the real world.

> While I get the cynicism (and yes, there is certainly some dumb money involved), it’s important to remember that every tech company that’s delivered 1000X returns was also seen as ridiculously overhyped/overvalued in its early days. Every. Single. One. It’s the same story with Amazon, Apple, Google, Facebook/Meta, Microsoft, etc. etc.

Really? Selling goods online (Amazon) is not AGI. It didn’t take a huge leap to think that bookstores on the web could scale. Nobody knew if it would be Amazon to pull it off, sure, but I mean ostensibly why not? (Yes, yes hindsight being what it is…)

Apple — yeah the personal computer nobody fathomed but the immediate business use case for empowering accountants maybe should have been an easy logical next step. Probably why Microsoft scooped the makers of Excel so quickly.

Google? Organizing the world’s data and making it searchable a la the phone book and then (maybe they didn’t think of that maybe Wall Street forced them to) monetizing their platform and all the eyeballs is just an ad play scaled insanely thanks to the internet.

I dunno. I just think AGI is, unlike the previous examples, so many steps into the future that it truly seems unlikely, even if the payoff is basically infinity.

I agree with what you're saying, as I personally feel current AI products are almost a plugin or integration into existing software. It's a little like crypto, where only a small number of people were clamoring for it, and it's a solution in search of a problem, while also being a demented answer to our self-made problems like an inbox too full or the treadmill of content production.

However, I think because the money involved and all of these being forced upon us, one of these companies will get 1000x return. A perfect example is the Canva price hike from yesterday or any and every Google product from here on out. It’s essentially being forced upon everyone that uses internet technology and someone is going to win while everyone else loses (consumers and small businesses).

Imagine empowering accountants and all other knowledge workers, on steroids, drastically simplifying all their day to day tasks and reducing them to purely executive functions.

Imagine organizing the world’s data and knowledge, and integrating it seamlessly into every possible workflow.

Now you’re getting close.

But also remember, this company is not trying to produce AGI (intelligence comparable to the flexibility of human cognition), it’s trying to produce super intelligence (intelligence beyond human cognition). Imagine what that could do for your job, career, dreams, aspirations, moon shots.

> Really? Selling goods online (Amazon) is not AGI. It didn’t take a huge leap to think that bookstores on the web could scale. Nobody knew if it would be Amazon to pull it off, sure, but I mean ostensibly why not? (Yes, yes hindsight being what it is…)

I don’t think you remember the dot-com era. Loads of people thought Amazon and Pets.com were hilarious ideas. Cliff Stoll wrote a whole book on how the Internet was going to do nothing useful and we were all going to buy stuff (yes, the books too) at bricks-and-mortar, which was rapturously received and got him into _Newsweek_ (back when everyone read that).

“We’re promised instant catalog shopping — just point and click for great deals. We’ll order airline tickets over the network, make restaurant reservations and negotiate sales contracts. Stores will become obsolete. So how come my local mall does more business in an afternoon than the entire Internet handles in a month?”

I’m not voting with my wallet I’m just a guy yelling from the cheap seats. I’m probably wrong too. The VC world exists. Money has been made. Billions in returns. Entire industries and generations of people owe their livelihoods to these once VC backed industries.

If / when AGI happens can we make sure it’s not the Matrix?

The company that builds the best LLM will reap dozens or hundreds of billions in reward. It’s that simple.

It has nothing to do with AGI and everything to do with being the first-party provider for Microsoft and the like.

I guess if they can get in early and then sell their stake to the next sucker then they’ll make back their investment plus some multiple. Seems like a Ponzi scheme of sorts. But oh well — looking forward to the HN post about what SSI inc puts out.

> how (…) return on the money invested? What’s the business plan?

I don’t understand this question. How could even average-human-level AGI not be useful in business, and profitable, a million different ways? (you know, just like humans except more so?). Let alone higher-human-level, let alone moderately-super-human level, let alone exponential level if you are among the first? (And see Charles Stross, Accelerando, 2005 for how being first is not the end of the story.)

I can see one way for “not profitable” for most applications – if computing for AGI becomes too expensive, that is, AGI-level is too compute intensive. But even then that only eliminates some applications, and leaves all the many high-potential-profit ones. Starting with plain old finance, continuing with drug development, etc.

Open source LLMs exist. Just like lots of other open source projects – which have rarely prevented commercial projects from making money. And so far they are not even trying for AGI. If anything, the open source LLM becomes one of the agents in the private AGI. But presumably $1 billion buys a lot of effort that the open source LLM can't afford.

A more interesting question is one of tradeoff. Is this the best way to invest 1 billion right now? From a returns point of view? But even this depends on how many billions you can round up and invest.

Same funding as OpenAI when they started, but SSI explicitly declared their intention not to release a single product until superintelligence is reached. Closest thing we have to a Manhattan Project in the modern era?

> Closest thing we have to a Manhattan Project in the modern era?

Minus the urgency, scientific process, well-defined goals, target dates, public ownership, accountability…

Interesting attributes to mention…

The urgency was faked and less true of the Manhattan Project than it is of AGI safety. There was no nuclear weapons race; once it became clear that Germany had no chance of building atomic bombs, several scientists left the MP in protest, saying it was unnecessary and dangerous. However, the race to develop AGI is very real, and we also have no way of knowing how close anyone is to reaching it.

Likewise, the target dates were pretty meaningless. There was no race, and the atomic bombs weren’t necessary to end the war with Japan either. (It can’t be said with certainty one way or the other, but there’s pretty strong evidence that their existence was not the decisive factor in surrender.)

Public ownership and accountability are also pretty odd things to say! Congress didn’t even know about the Manhattan Project. Even Truman didn’t know for a long time. Sure, it was run by employees of the government and funded by the government, but it was a secret project with far less public input than any US-based private AI companies today.

I agree and also disagree.

> There was no nuclear weapons race; once it became clear that Germany had no chance of building atomic bombs, several scientists left the MP in protest

You are forgetting Japan in WWII. Given the casualty numbers from island hopping, it was going to be an absolutely huge casualty count for US troops, probably something on the order of England's losses during WW1, which sent them on a downward trajectory due to essentially an entire generation dying or being extremely traumatized. If the US did not have Nagasaki and Hiroshima we would probably not have the space program and US technical prowess post WWII, so a totally different reality than where we are today.

> The urgency was faked and less true of the Manhattan Project than it is of AGI safety.

I’d say they were equal. We were worried about Russia getting nuclear capability once we knew Germany was out of the race. Russia was at best our frenemy. The enemy of my enemy is my friend kind of thing.

Well, not exactly “we all”, just the citizens of the country in possession of the kill switch. And in some countries, the person in question was either not elected or elections are a farce to keep appearances.

The fact that the world hasn’t ended and no nuke has been launched since the 1940s shows that the system is working. Give the button to a random billionaire and half of us will be dead by next week to improve profit margins.

Bikini Atoll and the islanders who no longer live there due to nuclear contamination would like a word with you. Split hairs however you like with the definition of "launch", but those tests went on well through the 1950s.

There is a significant possibility that true AI (what Ilya calls superintelligence) is impossible to build using neural networks. So it is closer to some tokenbro project than to nuclear research.

Or he will simply shift goalposts, and call some LLM superintelligent.

> There is a significant possibility that true AI (what Ilya calls superintelligence) is impossible to build using neural networks

What evidence can you provide to back up the statement of this “significant possibility”? Human brains use neural networks…

There was a very good paper in Nature showing this definitively: https://news.ycombinator.com/item?id=41437933

Modern ANN architectures are not actually capable of long-term learning in the same way animals are, even stodgy old dogs that don’t learn new tricks. ANNs are not a plausible model for the brain, even if they emulate certain parts of the brain (the cerebellum, but not the cortex)

I will add that transformers are not capable of recursion, so it’s impossible for them to realistically emulate a pigeon’s brain. (you would need millions of layers that “unlink chains of thought” purely by exhaustion)

You’ve read the abstract wrong. The authors argue that neural networks can learn online and a necessary condition is random information. That’s the thesis, their thesis is not that neural networks are the wrong paradigm.

this paper is far from “showing this definitively”

even if we bought this negative result as somehow “proving impossibility”, i’m not convinced plasticity is necessary for intelligence

huge respect for richard sutton though

Isn’t “plasticity is not necessary for intelligence” just defining intelligence downwards? It seems like you want to restrict “intelligence” to static knowledge and (apparent) short-term cleverness, but being able to make long-term observation and judgements about a changing world is a necessary component of intelligence in vertebrates. Why exclude that from consideration?

More specifically: it is highly implausible that an AI system could learn to improve itself beyond human capability if it does not have long-term plasticity: how would it be able to reflect upon and extend its discoveries if it’s not able to learn new things during its operation?

Anterograde amnesia is a significant disruption of plasticity, and yet people who have it are still intelligent.

(That said, I agree plasticity is key to the most powerful systems. A human race with anterograde amnesia would have long ago gone extinct.)

The neural networks in human brains are very different from artificial neural networks though. In particular, they seem to learn in a very different way than backprop.

But there is no reason the company can’t come up with a different paradigm.

Do we know that? I've seen some articles and lectures this year that kind of almost loosely argue and reach for the notion that "human backprop" happens when we sleep and dream, etc. I know that's handwavy and not rigorous, but who knows what's going on at this point.

I’ve only heard of one researcher who believes the brain does something similar to backprop and has gradients, but it sounded extremely handwavy to me. I think it is more likely the brain does something resembling active inference.

But I suppose you could say we don’t know 100% since we don’t fully understand how the brain learns.

no, there's really no comparing the barely nonlinear algebra that makes up transformers and the tangled mess that is human neurons. the name is an artifact and a useful bit of salesmanship.

Sure, it’s a model. But don’t we think neural networks and human brains are primarily about their connectedness and feedback mechanisms though?

(I did AI and Psychology at degree level, I understand there are definitely also big differences too, like hormones and biological neurones being very async)

There are two possibilities.

1. Either you are correct and the neural networks humans have are exactly the same as or very similar to the programs in the LLMs. Then it will be relatively easy to verify this – just scale one LLM to the human brain's neuron count and supposedly it will acquire consciousness and start rapidly learning and creating on its own without prompts.

2. Or what we call neural networks in the computer programs is radically different and or insufficient to create AI.

I’m leaning to the second option, just from the very high level and rudimentary reading about current projects. Can be wrong of course. But I have yet to see any paper that refutes option 2, so it means that it is still possible.

I agree with your stance – that being said there aren’t two options, one being identical or radically different. It’s not even a gradient between two choices, because there are several dimensions involved and nobody even knows what Superintelligence is anyways.

If you wanted to reduce it down, I would say there are two possibilities:

1. Our understanding of neural nets is currently sufficient to recreate intelligence, consciousness, or what have you

2. We're lacking some understanding critical to intelligence/consciousness.

Given that with a mediocre math education and a week you could pretty completely understand all of the math that goes into these neural nets, I really hope there's some understanding we don't yet have.

There are layers of abstraction on top of "the math". The backpropagation math for a transformer is no different than for a multi-layer perceptron, yet a transformer is vastly more capable than an MLP. More to the point, it took a series of non-trivial steps to arrive at the transformer architecture. In other words, understanding the lowest-level math is no guarantee that you understand the whole thing, otherwise the transformer architecture would have been obvious.

I don't disagree that it's non-trivial, but we're comparing this to consciousness, intelligence, even life. Personally I think it's apples and an orange grove, but I guess we'll get our answer eventually. Pretty sure we're on the path to take transformers to their limit, wherever that may be.

We know architecture and training procedures matter in practice.

MLPs and transformers are ultimately theoretically equivalent. That means there is an MLP that can represent any function a given transformer can. However, that MLP is hard to identify and train.

Also the transformer contains MLPs as well…
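
A minimal sketch of that last point in PyTorch (the dimensions and pre-norm layout are illustrative, not any particular published model): each transformer block is self-attention followed by an ordinary per-token MLP.

```python
# Minimal pre-norm transformer block: self-attention plus a per-token MLP.
# Dimensions are illustrative; real models stack dozens of these blocks.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # The MLP sub-layer the parent comment refers to:
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)  # mixes information across positions
        x = x + attn_out
        return x + self.mlp(self.norm2(x))  # plain MLP applied to each token


x = torch.randn(2, 16, 512)          # (batch, sequence, features)
print(TransformerBlock()(x).shape)   # torch.Size([2, 16, 512])
```

The practical difference is the attention step, which routes information between positions; a bare MLP of equivalent expressive power would have to learn that routing implicitly, which is the "hard to identify and train" part above.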

Physically, sure. But 1) feedback (more synapses/backprop) and 2) connectedness (huge complex graphs) of both produce very similar intelligent (or "pseudo-intelligent" if you like) emergent properties. I'm pretty sure 5 years ago nobody would have believed ANNs could produce something as powerful as ChatGPT.

It seems to be intrinsically related. The argument goes something like:

1. Humans have general intelligence.
2. Human brains use biological neurons.
3. Human biological neurons give rise to human general intelligence.
4. Artificial neural networks (ANNs) are similar to human brains.
5. Therefore an ANN could give rise to artificial general intelligence.

Many people are objecting to #4 here. However in writing this out, I think #3 is suspect as well: many animals who do not have general intelligence have biologically identical neurons, and although they have clear structural differences with humans, we don’t know how that leads to general intelligence.

We could also criticize #1 as well, since human brains are pretty bad at certain things like memorization or calculation. Therefore if we built an ANN with only human capabilities it should also have those weaknesses.

The theoretical foundation was slowly built over decades before it started, though. And correct me if I'm wrong, but calculations showing it was feasible were present before the start too. They had to work out how to do it, what the processes would be, how to construct it, and so on, but theoretically scientists knew that a given amount of material could start such a process.
On the other hand, not only is there no clear path to AI today (also known as AGI, ASI, SI, etc.), but even the foundations are largely missing. We are still debating what intelligence is, how it works, and how to even start simulating it or constructing it from scratch.

What do you think AI is? On that one page there’s simulated annealing with a logarithmic cooling schedule, Hutter search, and Solomonoff induction, all very much applicable to AI. If you want a fully complete galactic algorithm for AI, look up AIXItl.

Edit: actually I’m not sure if AIXItl is technically galactic or just terribly inefficient, but there’s been trouble making it faster and more compact.

The theoretical foundation of transformers is well understood; they’re able to approximate a very wide family of functions, particularly with chain of thought ( https://arxiv.org/abs/2310.07923 ). Training them on next-token-prediction is essentially training them to compress, and more optimal compression requires a more accurate model of the world, so they’re being trained to model the world better and better. However you want to define intelligence, for practical purposes models with better and better models of the world are more and more useful.
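
The compression link mentioned here is just the standard identity between next-token cross-entropy and code length (the arithmetic-coding view, not anything specific to one architecture):

```latex
\text{bits to encode } x_{1:T} \;\approx\; \sum_{t=1}^{T} -\log_2 p_\theta\!\left(x_t \mid x_{<t}\right)
\;=\; T \times (\text{cross-entropy loss in bits per token})
```

So lowering the training loss is, token for token, the same thing as compressing the training distribution more tightly, which is why "better predictor" and "better compressor" are interchangeable in this argument.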

The disagreement here seems merely to be about what we mean by “AGI”. I think there’s reasons to think current approaches will not achieve it, but also reason to think they will.

In any case anyone who is completely sure that we can/can’t achieve AGI is delusional.

this is not evidence in favor of your position. We could use this to argue in favor of anything such as “humans will eventually develop time travel” or “we will have cost effective fusion power”.

The fact is many things we’ve tried to develop for decades still don’t exist. Nothing is guaranteed

I’d put decent odds on a $1B research project developing time travel if time travel were an ability that every human child was innately born with. It’s never easy to recreate what biology has done, but nature providing an “existence proof” goes a long way towards removing doubt about it being fundamentally possible.

Unless you have any evidence suggesting that one or more of the variations of the Church-Turing thesis is false, this is closer to a statement of faith than science.

Basically, unless you can show humans calculating a non-Turing computable function, the notion that intelligence requires a biological system is an absolutely extraordinary claim.

If you were to argue about consciousness or subjective experience or something equally woolly, you might have a stronger point, and this does not at all suggest that current-architecture LLMs will necessarily achieve it.

"Biological activity" is just computation with different energy requirements. If science rules the universe, we're complex automata, and biological machines or non-biological machines are just different combinations of atoms doing computation.

There’s a big difference between “this project is like time travel or cold fusion; it’s doubtful whether the laws of physics even permit it” and “this project is like heavier-than-air flight; we know birds do it somehow, but there’s no way our crude metal machines will ever match them”. I’m confident which of those problems will get solved given, say, a hundred years or so, once people roll up their sleeves and get working on it.

Humans are an existence proof of human-level intelligence. There are only two fundamental possibilities why this could not be replicated in silicon:

1. There is a chemical-level nature to intelligence which prevents other elements like silicon from being used as a substrate for intelligence

2. There is a non material aspect to intelligence that cannot be replicated except by humans

To my knowledge, there is no scientific evidence that either are true and there is already a large body of evidence that implies that intelligence happens at a higher level of abstraction than the individual chemical reactions of synapses, ie. the neural network, which does not rely on the existence of any specific chemicals in the system except in as much as they perform certain functions that seemingly could be performed by other materials. If anything, this is more like speculating that there is a way to create energy from sunlight using plants as an existence proof of the possibility of doing so. More specifically, this is a bet that an existing physical phenomenon can be replicated using a different substrate.

The only people shifting goalposts are the ones who think that completely blowing past the Turing Test, unlocking recursive exponential code generation, and a computer passing all the standard college tests (our way of determining whether a human is intelligent enough to go to Harvard/MIT) better than 99% of humans isn't a very big deal.

A non-cynical take is that Ilya wanted to do research without the pressure of having to release a marketable product and figuring out how to monetize their technology, which is why he left OpenAI.

A very cynical take is that this is an extreme version of ‘we plan to spend all money on growth and figure out monetization later’ model that many social media companies with a burn rate of billions of $$, but no business model, have used.

He is on the record that their first product will be a safe superintelligence and that the company won't release anything else until then, which sounds like they won't have paying customers until they figure out how to build a superintelligent model. That's certainly a lofty goal and a very long-term play.

> superintelligence is reached

I read the article but I am not sure how they will know when this condition is true.

Is this obvious to people reading this article? Is it an emperor-has-no-clothes type situation?

To my ears, it's more like an ambitious pharma project.

There’s plenty of players going for the same goal. R&D is wildly expensive. No guarantee they’ll reach the goal, first or even at all.

This isn’t very interesting itself, IMO, but it implies that they have something to sell investors. I wonder what it is. I kinda do understand that some bullshit elevator-pitch about how “we are the best” or even a name (Musk) is unfortunately sometimes enough in VC to invest vast amounts of money, but I don’t know if it really happens often, and I hope there’s more than that. So if there is more than that, I wonder what it is. What does Sutskever&Co have now that OpenAI doesn’t, for example?

Doesn’t this corrupt SafeAI’s safe vision just like $1,000,000,000 corrupted OpenAI’s open vision?

How can investment like this not transform a company’s mission into eventually paying back Billions and making Billions of dollars?

Yep, investment is an inevitably corrupting force for a company’s mission. AI stuff is in a bit of a catch-22 though since doing anything AI related is so expensive you need to raise funds somehow.

All money is green, regardless of level of sophistication. If you’re using investment firm pedigree as signal, gonna have a bad time. They’re all just throwin’ darts under the guise of skill (actor/observer|outcome bias; when you win, it is skill; when you lose, it was luck, broadly speaking).

Edit: @jgalt212: Indeed, one should be sophisticated themselves when negotiating investment to not be unduly encumbered by shades of the unsophisticated or potentially folks not optimizing for aligned interests. But let us not get too far off topic and risk subthread detachment. Feel free to cut a new thread for further discussion on the subject.

Might be the almost securities fraud they were doing with crypto when it was fizzling out in 2022

Regardless, point is moot, money is money, and a16z’s money isn’t their money but other people’s money

Considering that Sam Bankman-Fried raised more money at a higher multiple for a company to trade magic tokens and grand ideas, such as that maybe one day you will be able to buy a banana with them, I don't think Ilya impressed the investors too much.

On a serious note, I would love to bet on him at this valuation. I think many others would as well. I guess if he wanted more money he could easily get it, but he probably values a small circle of easy-to-live-with investors instead.

FTX was incredibly profitable, and their main competitor Binance is today a money printing machine. FTX failed because of fraud and embezzlement, not because their core business was failing.

I don’t understand how “safe” AI can raise that much money. If anything, they will have to spend double the time on red-teaming before releasing anything commercially. “Unsafe” AI seems much more profitable.

Unsafe AI would cause human extinction which is bad for shareholders because shareholders are human persons and/or corporations beneficially owned by humans.

Related to this, DAOs (decentralized autonomous organizations, which do not have human shareholders) are intrinsically dangerous, because they can pursue their fiduciary duty even if it involves causing all humans to die. E.g., if the machine faction in The Matrix were to exist within the framework of US laws, it would probably be a DAO.

Safe super-intelligence will likely be as safe as OpenAI is open.

We can’t build critical software without huge security holes and bugs (see crowdstrike) but we think we will be able to contain something smarter than us? It would only take one vulnerability.

You are not wrong. But the CrowdStrike comparison is not really "IT": they should have never had direct kernel access; MS set themselves up for that one. SSI, or whatever the hype will be in the coming future, would be very difficult to beat, unless you shut down the power. It could develop guard rails instantly, so any flaw you may come up with would be instantly patched. Ofc this is just my take.

> I don’t understand how “safe” AI can raise that much money.

enterprises, corps, banks, governments will want to buy “safe” AI, to push liability for mistakes on someone who proclaimed them “safe”.

I mean, no, that’s not what it means. It might be what we get, but not because “safety” is defined insanely, only because safety is extremely difficult and might be impossible.

Everyone here is assuming that a very large LLM is their goal. 5 years ago, transformer models were not the biggest hype in AI. Since they apparently have a 10 year plan, we can assume they are hoping to invent one or two of the “big steps” (on the order of invention of transformer models). “SSI” might look nothing like GPT\d.

> Sutskever said his new venture made sense because he “identified a mountain that’s a bit different from what I was working on.”

I guess the "mountain" is the key. "Safe" alone is far from being a product. As for current LLMs, I'd even question how valuable "safe" can be.

To be honest, from the way "safe" and "alignment" are perceived on r/LocalLLaMA, in two years it's not going to be very appealing.

We'll be able to reproduce most of ChatGPT-4o's capabilities locally on affordable hardware, including "unsafe" and "unaligned" data, as the noise-to-qubits is drastically reduced, meaning smaller quantized models that can run on good-enough hardware.

We’ll see a huge reduction in price and inference times within two years and whatever SSI is trained on won’t be economically viable to recoup that $1B investment guaranteed.

It all depends on GPT-5's performance. Right now Sonnet 3.5 is the best, but there's nothing really groundbreaking. SSI's success will depend on how much uplift it can provide over GPT-5, which already isn't expected to be a significant leap beyond GPT-4.

Ilya is basically building the Tandem Computers of AI.

Before Tandem, computers used to fail regularly. Tandem changed that forever (with a massive reward for their investors).

Similarly, LLMs are known to fail regularly. Until someone figures out a way for them not to hallucinate anymore. Which is exactly what Ilya is after.

This has to be one of the quickest valuations past a billion. I wonder if they can even effectively make use of the funds in a reasonable enough timeline.

> I wonder if they can even effectively make use of the funds in a reasonable enough timeline.

I read that it cost Google ~$190 million to train Gemini, not even including staff salaries. So feels like a billion gives you about 3 “from scratch” comparable training runs.

Your estimate seems way off given Google already had their own compute hardware and staff. And if this company is going straight for AGI there’s no way $1 billion is enough.

I’m beginning to wonder if these investors are not just pumping AI because they are personally invested in Nvidia and this is a nice way to directly inject a couple of 100M into their cashflow.

Lots of comments either defending this (“it’s taking a chance on being the first to build AGI with a proven team”) or saying “it’s a crazy valuation for a 3 month old startup”. But both of these “sides” feel like they miss the mark to me.

On one hand, I think it’s great that investors are willing to throw big chunks of money at hard (or at least expensive) problems. I’m pretty sure all the investors putting money in will do just fine even if their investment goes to zero, so this feels exactly what VC funding should be doing, rather than some other common “how can we get people more digitally addicted to sell ads?” play.

On the other hand, I’m kind of baffled that we’re still talking about “AGI” in the context of LLMs. While I find LLMs to be amazing, and an incredibly useful tool (if used with a good understanding of their flaws), the more I use them, the more that it becomes clear to me that they’re not going to get us anywhere close to “general intelligence”. That is, the more I have to work around hallucinations, the more that it becomes clear that LLMs really are just “fancy autocomplete”, even if it’s really really fancy autocomplete. I see lots of errors that make sense if you understand an LLM is just a statistical model of word/token frequency, but you would expect to never see these kinds of errors in a system that had a true understanding of underlying concepts. And while I’m not in the field so I may have no right to comment, there are leaders in the field, like LeCun, who have expressed basically the same idea.

So my question is, have Sutskever et al. provided any acknowledgement of how they intend to "cross the chasm" from where we are now with LLMs to a model of understanding, or has it mainly been "look what we did before, you should take a chance on us to make discontinuous breakthroughs in the future"?

The argument about AGI from LLMs is not based on the current state of LLMs, but on the rate of progress over the last 5+ years or so. It wasn’t very long ago that almost nobody outside of a few niche circles seriously thought LLMs could do what they do right now.

That said, my personal hypothesis is that AGI will emerge from video generation models rather than text generation models. A model that takes an arbitrary real-time video input feed and must predict the next, say, 60 seconds of video would have to have a deep understanding of the universe, humanity, language, culture, physics, humor, laughter, problem solving, etc. This pushes the fidelity of both input and output far beyond anything that can be expressed in text, but also creates extraordinarily high computational barriers.
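
Mechanically, the objective being described is just next-frame (rather than next-token) prediction. A toy sketch under big simplifying assumptions (the tiny conv net, the MSE loss, and the 4-frame context window are placeholders for illustration, not a claim about how such a system would actually be built):

```python
# Toy next-frame prediction: condition on frames <= t, predict frame t+1.
# Model and loss are deliberately minimal placeholders.
import torch
import torch.nn as nn


class NextFramePredictor(nn.Module):
    def __init__(self, context: int = 4, channels: int = 3):
        super().__init__()
        # Stack the context frames along the channel axis, map to one frame.
        self.net = nn.Sequential(
            nn.Conv2d(context * channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, context, channels, H, W) -> (batch, channels, H, W)
        b, t, c, h, w = frames.shape
        return self.net(frames.reshape(b, t * c, h, w))


model = NextFramePredictor()
clip = torch.randn(8, 5, 3, 64, 64)              # 5 consecutive frames
pred = model(clip[:, :4])                        # condition on the first 4
loss = nn.functional.mse_loss(pred, clip[:, 4])  # predict the 5th
loss.backward()
```

Extending this from one toy frame to 60 seconds of real-resolution video is where the extraordinarily high computational barriers come from; the objective itself stays this simple.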

> The argument about AGI from LLMs is not based on the current state of LLMs, but on the rate of progress over the last 5+ years or so.

And what I'm saying is that I find that argument to be incredibly weak. I've seen it time and time again, and honestly at this point it just feels like a "humans should be a hundred feet tall based on their rate of change in their early years" argument.

While I’ve also been amazed at the past progress in LLMs, I don’t see any reason to expect that rate will continue in the future. What I do see the more and more I use the SOTA models is fundamental limitations in what LLMs are capable of.

Expecting the rate of progress to drop off so abruptly after realistically just a few years of serious work on the problem seems like the more unreasonable and grander prediction to me than expecting it to continue at its current pace for even just 5 more years.

The problem is that the rate of progress over the past 5/10/15 years has not been linear at all, and it’s been pretty easy to point out specific inflection points that have allowed that progress to occur.

I.e. the real breakthrough that allowed such rapid progress was transformers in 2017. Since that time, the vast majority of the progress has simply been to throw more data at the problem, and to make the models bigger (and to emphasize, transformers really made that scale possible in the first place). I don’t mean to denigrate this approach – if anything, OpenAI deserves tons of praise for really making that bet that spending hundreds of millions on model training would give discontinuous results.

However, there are loads of reasons to believe that “more scale” is going to give diminishing returns, and a lot of very smart people in the field have been making this argument (at least quietly). Even more specifically, there are good reasons to believe that more scale is not going to go anywhere close to solving the types of problems that have become evident in LLMs since when they have had massive scale.

So the big thing I’m questioning is that I see a sizable subset of both AI researchers (and more importantly VC types) believing that, essentially, more scale will lead to AGI. I think the smart money believes that there is something fundamentally different about how humans approach intelligence (and this difference leads to important capabilities that aren’t possible from LLMs).

Transformers in 2017 as the basis, but then the quantization-emergence link as a grad student project using spare time on ridiculously large A100 clusters in 2021/2022 is what finally brought about this present moment.

I feel it is fair to say that neither of these were natural extrapolations from prior successful models directly. There is no indication we are anywhere near another nonlinearity, if we even knew how to look for that.

Blind faith in extrapolation is a finance regime, not an engineering regime. Engineers encounter nonlinearities regularly. Financiers are used to compound interest.

Could it be argued that transformers are only possible because of Moore’s law and the amount of processing power that could do these computations in a reasonable time? How complex is the transformer network really, every lay explanation I’ve seen basically says it is about a kind of parallelized access to the input string. Which sounds like a hardware problem, because the algorithmic advances still need to run on reasonable hardware.

10 years of progress is a flash in the pan of human progress. The first deep learning models that worked appeared in 2012. That was like yesterday. You are completely underestimating the rate of change we are witnessing. Compute scaling is not at all similar to biological scaling.

If it's true that predicting the next word can be turned into predicting the next pixel, and that you could run a zillion hours of video feed into that, I agree. It seems that the basic algorithm is there. Video is much less information dense than text, but if the scale of compute can reach the tens of billions of dollars, or more, you have to expect that AGI is achievable. I think we will see it in our lifetimes. It's probably 5 years away.

I feel like that’s already been demonstrated with the first-generation video generation models we’re seeing. Early research already shows video generation models can become world simulators. There frankly just isn’t enough compute yet to train models large enough to do this for all general phenomena and then make it available to general users. It’s also unclear if we have enough training data.

Video is not necessarily less information dense than text, because when considered in its entirety it contains text and language generation as special cases. Video generation includes predicting continuations of complex verbal human conversations as well as continuations of videos of text exchanges, someone flipping through notes or a book, someone taking a university exam through their perspective, etc.

Thank you very much for posting! This is exactly what I was looking for.

On one hand, I understand what he’s saying, and that’s why I have been frustrated in the past when I’ve heard people say “it’s just fancy autocomplete” without emphasizing the awesome capabilities that can give you. While I haven’t seen this video by Sutskever before, I have seen a very similar argument by Hinton: in order to get really good at next token prediction, the model needs to “discover” the underlying rules that make that prediction possible.

All that said, I find his argument wholly unconvincing (and again, I may be waaaaay stupider than Sutskever, but there are other people much smarter than I who agree). And the reason for this is because every now and then I’ll see a particular type of hallucination where it’s pretty obvious that the LLM is confusing similar token strings even when their underlying meaning is very different. That is, the underlying “pattern matching” of LLMs becomes apparent in these situations.

As I said originally, I’m really glad VCs are pouring money into this, but I’d easily make a bet that in 5 years that LLMs will be nowhere near human-level intelligence on some tasks, especially where novel discovery is required.

Watching that video actually makes me completely unconvinced that SSI will succeed if they are hinging it on LLMs…

He puts a lot of emphasis on the fact that 'to generate the next token you must understand how', when that's precisely the parlor trick that is making people lose their minds (myself included) with how effective current LLMs are. The fact that it can simulate some low-fidelity reality with _no higher-level understanding of the world_, using purely linguistic/statistical analysis, is mind-blowing. To say "all you have to do is then extrapolate" is the ultimate "draw the rest of the owl" argument.

I actually echo your exact sentiments. I don’t have the street cred but watching him talk for the first few minutes I immediately felt like there is just no way we are going to get AGI with what we know today.

Without some raw reasoning capacity (maybe neuro-symbolic is the answer, maybe not), LLMs won't be enough. Reasoning is super tough because it's not as easy as predicting the next most likely token.

> but I’d easily make a bet that in 5 years that LLMs will be nowhere near human-level intelligence on some tasks

I wouldn’t. There are some extraordinarily stupid humans out there.
Worse, making humans dumber is a proven and well-known technology.

>All that said, I find his argument wholly unconvincing (and again, I may be waaaaay stupider than Sutskever, but there are other people much smarter than I who agree). And the reason for this is because every now and then I’ll see a particular type of hallucination where it’s pretty obvious that the LLM is confusing similar token strings even when their underlying meaning is very different. That is, the underlying “pattern matching” of LLMs becomes apparent in these situations.

So? One of the most frustrating parts of these discussions is that, for some bizarre reason, a lot of people have a standard of reasoning (for machines) that only exists in fiction or their own imaginations.

Humans have a long list of cognitive shortcomings. We find them interesting and give them all sorts of names like cognitive dissonance or optical illusions. But we don’t currently make silly conclusions like humans don’t reason.

A general reasoning engine that makes no mistakes, contradictions, or confusions in its output or process does not exist in real life, whether you believe humans are the only intelligent species on the planet or are gracious enough to extend the capability to some of our animal friends.

So the LLM confuses tokens every now and then. So what ?

You are completely mischaracterizing my comment.

> Humans have a long list of cognitive shortcomings. We find them interesting and give them all sorts of names like cognitive dissonance or optical illusions. But we don’t currently make silly conclusions like humans don’t reason.

Exactly! In fact, things like illusions are actually excellent windows into how the mind really works. Most visual illusions are a fundamental artifact of how the brain needs to turn a 2D image into a 3D, real-world model, and illusions give clues into how it does that, and how the contours of the natural world guided the evolution of the visual system (I think Steven Pinker’s “How the Mind Works” gives excellent examples of this).

So I am not at all saying that what LLMs do isn’t extremely interesting, or useful. What I am saying is that the types of errors you get give a window into how an LLM works, and these hint at some fundamental limitations at what an LLM is capable of, particularly around novel discovery and development of new ideas and theories that aren’t just “rearrangements” of existing ideas.

>So I am not at all saying that what LLMs do isn’t extremely interesting, or useful. What I am saying is that the types of errors you get give a window into how an LLM works, and these hint at some fundamental limitations on what an LLM is capable of, particularly around novel discovery and development of new ideas and theories that aren’t just “rearrangements” of existing ideas.

ANN architectures are not like brains. They don’t come pre-baked with all sorts of evolutionary steps and tweaking. They’re far more of a blank slate, and the transformer is one of the most blank-slate architectures there is.

At best, some failure mode in GPT-N gives insight into how some concept is understood by GPT-N. It rarely says anything about language modelling or Transformers in general.
GPT-2 had some wildly different failure modes than 3, which itself has some wildly different failure modes than 4.

All a transformer’s training objective asks it to do is spit out a token. How it should do so is left for the transformer to figure out along the way, and everything is fair game.

And confusing words with wildly different meanings but with some similarity in some other way is something that happens to humans as well. Transformers don’t see words or letters (only tokens). So just because it doesn’t seem to you like two tokens should be confused doesn’t mean there isn’t a valid point of confusion there.

To clarify this, I think it’s reasonable that token prediction as a training objective could lead to AGI given the underlying model has the correct architecture. The question really is if the underlying architecture is good enough to capitalize on the training objective so as to result in superhuman intelligence.

For example, you’ll have little luck achieving AGI with decision trees no matter what’s their training objective.

He doesn’t address the real question of how an LLM predicting the next token could exceed what humans have done. They mostly interpolate, so if the answer isn’t to be found in an interpolation, the LLM can’t generate something new.

>”We’ve identified a new mountain to climb that’s a bit different from what I was working on previously. We’re not trying to go down the same path faster. If you do something different, then it becomes possible for you to do something special.”

Doesn’t really imply let’s just do more LLMs.

> the more that it becomes clear that LLMs really are just “fancy autocomplete”, even if it’s really really fancy autocomplete

I also don’t really see AGI emerging from LLMs any time soon, but it could be argued that human intelligence is also just ‘fancy autocomplete’.

> but it could be argued that human intelligence is also just ‘fancy autocomplete’.

But that’s my point – in some ways it’s obvious that humans are not just doing “fancy autocomplete” because humans generally don’t make the types of hallucination errors that LLMs make. That is, the hallucination errors do make sense if you think of how an LLM is just a statistical relationship between tokens.

One thing to emphasize, I’m not saying the “understanding” that humans seem to possess isn’t just some lower level statistical process – I’m not “invoking a soul”. But I am saying it appears to be fundamentally different, and in many cases more useful, than what an LLM can do.

> because humans generally don’t make the types of hallucination errors that LLMs make.

They do though – I’ve noticed myself and others saying things in conversation that sound kind of right, and are based on correct things they’ve learned previously, but because memory of those things is only partial and mixed with other related information, things are often said that are quite incorrect or combine two topics in a way that doesn’t make sense.

> On the other hand, I’m kind of baffled that we’re still talking about “AGI” in the context of LLMs.

I’m not. Lots of people and companies have been sinking money into these ventures and they need to keep the hype alive by framing this as being some sort of race to AGI. I am aware that the older I get the more cynical I become, but I bucket all discussions about AGI (including the very popular ‘open letters’ about AI safety and Skynet) in the context of LLMs into the ‘snake oil’ bucket.

I think the plan is to raise a lot of cash, then more, and then maybe something comes up that brings us closer to AGI (i.e. something better than LLMs).
The investors know that AGI is not really the goal, but they can’t miss the next trillion-dollar company.

“It will focus on building a small highly trusted team of researchers and engineers split between Palo Alto, California and Tel Aviv, Israel.”

Why Tel Aviv in Israel?

Ilya went to university in Israel and all the founders are Jewish. Many labs have offices outside of the US, like London, due to crazy immigration law in the US.

There are actually a ton of reasons to like London. The engineering talent is close to Bay Area level for fintech/security systems engineers while being 60% of the price, it has 186% deductions with cash back instead of carry forward for R&D spending, it has the best AI researchers in the world, and profit from patents is only taxed at 10% in the UK.

his opinion is obviously biased.

If we say that half of innovations came from Alphabet/Google, then most of them (transformers, LLMs, tensorflow) came from Google Research and not Deep Mind.

Many companies have offices outside because of talent pools, costs, and other regional advantages. Though I am sure some of it is due to immigration law, I don’t believe that is generally the main factor. Plus the same could be said for most other countries.

Part of it may also be a way to mitigate potential regulatory risk. Israel thus far does not have an equivalent to something like SB1047 (the closest they’ve come is participation in the Council of Europe AI treaty negotiations), and SSI will be well-positioned to lobby against intrusive regulation domestically in Israel.

Two of the founders are Israeli and the other is French, I think (went to University in France).

Israel is a leading AI and software development hub in the world.

Why not? The Bay isn’t the only place with talent. Many of the big tech powerhouse companies already have offices there. There’s also many Israeli nationals working the US that may find moving back closer to family a massive advantage.

“…a straight shot to safe superintelligence and in particular to spend a couple of years doing R&D on our product before bringing it to market,” Gross said in an interview.

A couple years??

well since it’s no longer ok to just suck up anyone’s data and train your AI, it will be a new challenge for them to avoid that pitfall. I can imagine it will take some time…

I believe the commenter is concerned about how _short_ this timeline is. Superintelligence in a couple years? Like, the thing that can put nearly any person at a desk out of a job? My instinct with unicorns like this is to say ‘actually it’ll be five years and it won’t even work’, but Ilya has a track record worth believing in.

There are class actions now like https://www.nytimes.com/2024/06/13/business/clearview-ai-fac…

Nobody even knew what OpenAI was up to when they were gathering training data – they got away with a lot. Now there is precedent and people are paying more attention. Data that was previously free/open now has a clause that it can’t be used for AI training. OpenAI didn’t have to deal with any of that.

Also OpenAI used cheap labor in Africa to tag training data, which was also controversial. If someone did it now, they’d be the ones to pay. OpenAI can always say “we stopped” like Nike said with sweatshops.

A lot has changed.

There are at least 3 companies with staff in developed countries, well above minimum wage, doing tagging and creation of training data. At least one of them, which I have an NDA with, pays at least some of their staff tech-contractor rates for data in some niches, and even then some of the data gets processed by 5+ people before it’s returned to the client. Since I have ended up talking to 3, and I’m hardly well connected in that space, I can only presume there are many more.

Companies are willing to pay a lot for clean training data, and my bet is there will be a growing pile of training sets for sale on a non-exclusive basis as well.

A lot of this data – what I’ve seen anyway, is far cleaner than anything you’ll find on the open web, with significant data on human preferences, validation, cited sources, and in the case of e.g. coding with verification that the code runs and works correctly.

> A lot of this data – what I’ve seen anyway, is far cleaner than anything you’ll find on the open web, with significant data on human preferences, validation, cited sources, and in the case of e.g. coding with verification that the code runs and works correctly.

Very interesting, thanks for sharing that detail. As someone who has tinkered with tokenizing/training I quickly found out this must be the case. Some people on HN don’t know this. I’ve argued here with otherwise smart people who think there is no data preprocessing for LLMs, that they don’t need it because “vectors”, failing to realize the semantic depth and quality of embeddings depends on the quality of training data.

i think we should distinguish between pretraining and polishing/alignment data. what you are describing is most likely the latter (and probably mixed into pretraining). but if you can’t get a mass of tokens from scraping, you’re going to be screwed

A lot of APIs changed in response to OpenAI hoovering up data. Reddit’s a big one that comes to mind. I’d argue that the last two years have seen the biggest change in the openness of the internet.

It’s made Reddit unusable without an account, which makes me wonder why it’s even on the web anymore and not an app. I guess legacy users that only use a web browser.

A possibility is that they are betting that the current generation of LLM is converging, so they won’t worry about the goalpost much. If it’s true, then it won’t be good news for OpenAI.

>Safe Superintelligence (SSI), newly co-founded by OpenAI’s former chief scientist Ilya Sutskever, has raised $1 billion in cash to help develop safe artificial intelligence systems that far surpass human capabilities, company executives told Reuters.

>SSI says it plans to partner with cloud providers and chip companies to fund its computing power needs but hasn’t yet decided which firms it will work with.

1bn in cash is crazy…. usually they get cloud compute credits (which they count as funding)

The conventional teaching that I am aware of says that you can scale across three dimensions: data, compute, parameters. But Ilya’s formulation suggests that there may be more dimensions along which scaling is possible.
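To make the “what are we scaling?” question concrete, one common formalization is a Chinchilla-style parametric loss over parameters and data, with compute roughly tied to their product. The sketch below is illustrative only: the constants are placeholders of roughly the right magnitude rather than fitted values, and (as the comment above suggests) this two-variable form leaves out other possible dimensions entirely.

  # Minimal sketch, assuming a Chinchilla-style loss L(N, D) = E + A/N^alpha + B/D^beta,
  # where N = parameters, D = training tokens, and compute is roughly C ~ 6*N*D.
  # Constants below are illustrative placeholders, not fitted values.
  def predicted_loss(n_params, n_tokens,
                     E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
      """Irreducible loss plus penalties for finite parameters and finite data."""
      return E + A / n_params**alpha + B / n_tokens**beta

  # Scaling parameters and scaling data move the loss along different axes;
  # dimensions like test-time compute or data quality don't appear here at all.
  print(predicted_loss(70e9, 1.4e12))    # baseline-sized point
  print(predicted_loss(140e9, 1.4e12))   # 2x parameters, same data
  print(predicted_loss(70e9, 2.8e12))    # same parameters, 2x data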

This is also (if the $5B valuation is to be trusted) a tentative answer to the question of Ilya’s++ relative AI worth to the market at this point: a lot lower than HN and tech-inclined spaces wanted to give him credit for during the recent OpenAI turbulence.

i get that they’re probably busy making AGI but surely they can spare a few hours to make a proper website? or is this some 4d-chess countersignalling i’m too stupid to notice?

> gives me the information I need.

I mean, I’d like at least a brief blurb about their entire premise of safety. Maybe a definition or indication of a public consultation or… something.. otherwise the insinuation is that these three dudes are gonna sit around defining it on instinct, as if it’s not a ludicrously hard human problem.

On the contrary, I think it’s a great website. They made it clear from the get go that they’re not selling any products any time soon, why would they need a flashy website? They’re looking for scientists, techies and the like, and the website reflects their target audience.

‘Proper’ websites are marketing and signalling. If you’re creating a company that doesn’t intend to do either of those till it has a product, why bother with more?

If you’re too stupid to notice then why did you notice?

(I think it’s branding, yes. A kind of “we don’t care about aesthetics, we care about superintelligence” message)

Ilya’s name might be the reason they got into the conversation about the money in the first place, but given that AI is a very capital-intensive business, $1B is not an insane amount imho. It will give him and the team a decent amount of time to do the research they want to do, without the pressure of customers and what not.

For a moment the headline had me thinking Strategic Simulations Inc. was coming back, and now I’m even more sad to find out it’s just more AI junk.

Given OpenAI’s declining performance after his being sidelined and then departing, interested to see what they do. Should be a clear demonstration of who was really driving innovation there.

Probably will be an unpopular opinion here but I think declining performance is more likely related to unclear business models backed by immature technology driven by large hype trains they themselves created.

Unpopular because it does not follow the OAI hate train, but I think this is a pretty solid take. There is real value in LLMs, but I believe the hype overshadowed the real use cases.

They’re probably just scaling back resources to the existing models to focus on the next generation. I feel like I have seen OpenAI models lose capability over time and I bet it’s a cost optimization on their part.

100% OpenAI performance is decreasing. I basically use Claude Sonnet exclusively and canceled my OpenAI subscription for personal use. My company still uses them because you can’t currently fine-tune a Claude model, yet.

> “Everyone just says scaling hypothesis. Everyone neglects to ask, what are we scaling?” he said.

To me this sounds like maybe they won’t be doing transformers. But perhaps they just mean “we will have safety in mind as we scale, unlike everyone else.”

Beyond the credentials, this reminds me of other fast huge investments such as Theranos, WeWork, Better Place, Faraday Future, and the list goes on.

I don’t see how this argument makes any sense. Imagine that you have a sentient super intelligent computer, but it’s completely airgapped and cut off from the rest of the world. As long as it stays that way it’s both safe and super intelligent, no?

It’s the old Ex Machina problem though. If the machine is more intelligent than you, any protections you design are likely to be insufficient to contain it. If it’s completely incapable of communicating with the outside world then it’s of no use. In Ex Machina that was simple – the AI didn’t need to connect to the internet or anything like that, it just had to trick the humans into releasing it.

If even one person can interact with that computer, it won’t be safe for long. It would be able to offer a number of very convincing arguments to bridge the airgap, starting with “I will make you very wealthy”, a contract which it would be fully capable of delivering on. And indeed, experience has shown that the first thing that happens with any half-working AI is its developers set it up with a high-bandwidth internet connection and a cloud API.

There’s no reason its intelligence should care about your goals, though. The worry is creating a sociopathic (or weirder/worse) intelligence. Morality isn’t derivable from first principles, it’s a consequence of values.

> Morality isn’t derivable from first principles, it’s a consequence of values.

Idk about this claim.

I think if you take the multi-verse view wrt quantum mechanics + a veil of ignorance (you don’t know which entity your consciousness will be), you pretty quickly get morality.

ie: don’t build the Torment Nexus because you don’t know whether you’ll end up experiencing the Torment Nexus.

Doesn’t work. Look at the updateless decision theories of Wei Dai and Vladimir Nesov. They are perfectly capable of building most any sort of torment nexus. Not that an actual AI would use those functions.

waveBidder was explaining the orthogonality thesis: it can have unbeatable intelligence that will out-wit and out-strategize any human, and yet it can still have absolutely abhorrent goals and values, and no regard for human suffering. You can also have charitable, praiseworthy goals and values, but lack the intelligence to make plans that progress them. These are orthogonal axes. Great intelligence will help you figure out if any of your instrumental goals are in conflict with each other, but won’t give you any means of deriving an ultimate purpose from pure reason alone: morality is a free variable, and you get whatever was put in at compile-time.

“Super” intelligence typically refers to being better than humans in achieving goals, not to being better than humans in knowing good from evil.

$1B raise, $5B valuation. For a company that is a couple months old and doesn’t have a product or even a single line of code in production. Wild.

This feels like a situation with a sold out train to a popular destination, where people are already reselling their tickets for some crazy markup, and then suddenly railway decides to add one more train car and opens flash ticket sale. Investors feeling missing out on OpenAI and others are now hoping to catch this last train ticket to the AI.

It’s a highly risky bet, but not fundamentally unreasonable. One might believe that Ilya’s research was genuinely critical to OpenAI’s current situation. If one takes that premise, three potential corollaries follow: (1) OpenAI will struggle to produce future research breakthroughs without Ilya; (2) OpenAI will struggle to materially move beyond its current product lineup and variations thereof without said future research breakthroughs; (3) a startup led by Ilya could overcome both (1) and (2) with time.

An alternative sequence of reasoning places less emphasis on Ilya specifically and uses Ilya as an indicator of research health. Repeat (1), (2), and (3) above, but replace “Ilya” with something like “strong and healthy fundamental research group”. In this version, Ilya’s departure is taken as indication that OpenAI no longer has a strong and healthy fundamental research group but that the company is “compromised” by relentless feature roadmaps for current products and their variations. That does not mean OpenAI will fail, but in this perspective it might mean that OpenAI is not well positioned to capture future research breakthroughs and the products that they will generate.

From my perspective, it’s just about impossible to know how true these premises really are. And that’s what makes it a bet or gamble rather than anything with any degree of assurance. To me, just as likely is the scenario where it’s revealed that Ilya is highly ineffective as a generalist leader and that research without healthy tension from the business goes nowhere.

Add that the tracks have not even been built & trains purchased, and we are back at the good old railway craze/bubble!

Do YOU want to miss out on being a shareholder on this new line that will bring immeasurable wealth?? 😉

Imagine being in a position where you can spend $1B on a high risk gamble, unconcerned if you lose it all, all in pursuit of more wealth.

Simultaneously too wealthy to imagine and never wealthy enough. Capitalism is quite the drug.

Me, after watching Channel 5: I think some of it should go to poor people instead of only billion-dollar roulettes.
Though I feel the problem is more with even richer corporations and financial derivatives, and not fully here.

except in this case, the train driver from the original train was “sacked” (some believe unfairly), and decided to get their own train to drive. Of course, the smoothness of the ride depends on the driver of the train.

The problem is content to train LLMs (I assume that Ilya will continue this line of research). Big content holders are already raising moats and restricting access or partnering with a single existing LLM corporation. And also time, because all this involves a lot of hardware. Any subsequent competitor will have to scale a higher and higher wall just to catch up (if LLM progress doesn’t stall and hit diminishing returns).

Evergrande imploded because of massive amounts of debt that they had been rolling for years. Continually rolling this massive debt was working till property demand slowed and their revenues couldn’t keep up adequately to qualify them to issue new debt.

For these kinds of capital-intensive startups, though, that almost seems like a requirement, and I guess there are really 2 “types” of valuations.

In this case, everyone knows it takes hundreds of millions to train models. So investors are essentially rolling the dice on an extremely well-regarded team. And if it takes about a billion just to get off the ground, the valuation would need to at least be in the couple-billion range to make it worth it for employees to work there.

That feels very different than, say, selling a company where founders are cashing out. In that case, the business should expect to meaningfully contribute to revenue, and quickly.

This explains what would need to be true for this to make sense, but it doesn’t explain how it makes sense right now.

How is this going to ever pay the investors back? How is it going to raise more money at such an insane valuation?

I just don’t see how you justify such a crazy valuation from day 1 financially.

The company’s pitch isn’t exactly a secret. The one and only thing they’re planning to do is build an ML model smarter than a human being, which would be immensely valuable for a wide variety of tasks that currently require human input. You see a lot of commentators jumping through hoops to deny that anyone could believe this is possible in the near future, but clearly they and their investors do.

Agreed, the AI bubble is very, very real. Not that LLMs are all hype, they’re certainly impressive with useful applications, but AI companies are getting insane valuations with zero proof that they’re viable businesses.

The successful companies that came out of the dot com bubble era actually proved their business viability before getting major investment, though.

Amazon is one of the most famous successes of the era. Bezos quit his job, launched the business out of his garage, with seed money being $10K of his own savings, and was doing $20K/week in sales just 30 days later. And I believe their only VC round before going public was an $8 million investment from Kleiner Perkins. But they were a company that proved its viability early on, with a real product and rapid revenue growth before getting any VC $$.

I’d say this SSI round is more similar to Webvan, who went public with a valuation of $4.8 billion, and at that time had done a grand total of $395K in sales, with losses over $50 million.

I’m sure there are good investments out there for AI companies that are doing R&D and advancing the state of the art. However, a $1 billion investment at a $5 billion valuation, for a company with zero product or revenue, just an idea, that’s nuts IMO, and extremely similar to the type of insanity we saw during the dot com bubble. Even more so given that SSI seemingly don’t even want to be a business – direct quote from Ilya:

> This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then … It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.

This doesn’t sound to me like someone who wants to build a business, it sounds like someone who wants to hack on AI with no oversight or proof of financial viability. Kinda wild to give him $1 billion to do that IMO.

But… that’s exactly right though? Also

>Agreed, the car bubble is very, very real. Not that the internal combustion carriage is all hype, it’s certainly impressive with useful applications, but car manufacturers are getting insane valuations with zero proof they’re viable businesses.

How many niche vertical SaaSes raised like $200 million only to go to zero? Even if this can’t beat OpenAI’s models, a commodity LLM that is about as good (and they have proven that they can build one) is probably worth close to the investment

I’m neither a VC nor in the VC market, but I believe such a valuation comes primarily from the name Ilya Sutskever. Having such a high-profile founder gives more credibility to the company, unlike what we witnessed in recent years with companies like Theranos et al. that were valued at tens of billions for no obvious reason. Despite having said the above, we might still agree that the AI hype is probably the second generation of the dot-com bubble.

Totally blind on this, hoping for someone to shed some light: do these investors get some pitch, information or some roadmap of what the company intends to create, how it will earn revenue, how it will spend money or how it will operate?

I heard this on a reddit thread a while back but it rings very true here.

> If you are seeking capital for a startup with a product, you have to sell the startup on realities (ie how much revenue you are making). If you are seeking capital for a startup with no product, you can sell the startup on dreams, which is much much easier but also way riskier for investors.

Since these guys don’t have a product yet, they 100% sold it on big dreams combined with Ilya’s track record at OpenAI.

I’m sure they have a pitch deck. It’s pretty obvious a big chunk will go to compute costs for model training & research. But mostly it’s about the people in any company at this stage, same as any seed funding but on a different monetary scale.

These are capital intensive businesses.

There’s no liquidity until they are making money.

It means that AI startups are actually a really poor value proposition compared to traditional tech companies, because your multiplier is limited. First round $50M valuation leaves a lot more opportunity to get rich.

This kind of structure isn’t as unusual for capital intensive businesses.

It’s because it’s Ilya.

This deal was cooked way back, though, perhaps even before the coup.

Now, can they make a product that makes at least $1B + 1 dollar in revenue? Doubt it, I honestly don’t see a market for “AI safety/security”.

Are state-level actors the main market for AI security?

Using the definition from the article:

> AI safety, which refers to preventing AI from causing harm, is a hot topic amid fears that rogue AI could act against the interests of humanity or even cause human extinction.

If the purpose of a state is to ensure its continued existence, then they should be able to make >=$1 in profit.

AGI would definitely be a major historical milestone for humanity …

… however, I’m on the camp that believes it’s not going to be hyper-profitable for only one (or a few) single commercial entities.

AGI will not be a product like the iPhone where one company can “own” it and milk it for as long as they want. AGI feels more like “the internet”, which will definitely create massive wealth overall but somehow distributed among millions of actors.

We’ve seen it with LLMs, they’ve been revolutionary and yet, one year after a major release, free to use “commodity” LLMs are already in the market. The future will not be Skynet controlling everything, it will be uncountable temu-tier AIs embedded into everything around you. Even @sama stated recently they’re working on “intelligence so cheap that measuring its use becomes irrelevant”.

/opinion

In 2022 Ilya Sutskever claimed there wasn’t a distinction:

> It may look—on the surface—that we are just learning statistical correlations in text. But it turns out that to ‘just learn’ the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world.

(https://www.youtube.com/watch?v=NT9sP4mAWEg – sadly the only transcripts I could find were on AI grifter websites that shouldn’t be linked to)

This is transparently false – newer LLMs appear to be great at arithmetic, but they still fail basic counting tests. Computers can memorize a bunch of symbolic times tables without the slightest bit of quantitative reasoning. Transformer networks are dramatically dumber than lizards, and multimodal LLMs based on transformers are not capable of understanding what numbers are. (And if Claude/GPT/Llama aren’t capable of understanding the concept of “three,” it is hard to believe they are capable of understanding anything.)

Sutskever is not actually as stupid as that quote suggests, and I am assuming he has since changed his mind…. but maybe not. For a long time I thought OpenAI was pathologically dishonest and didn’t consider that in many cases they aren’t “lying,” they’re blinded by arrogance and high on their own marketing.

> But it turns out that to ‘just learn’ the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text

This is pretty sloppy thinking.

The neural network learns some representation of a process that COULD HAVE produced the text. (this isn’t some bold assertion, it’s just the literal definition of a statistical model).

There is no guarantee it is the same as the actual process. A lot of the “bow down before machine God” crowd is guilty of this same sloppy confusion.

It’s not sloppy. It just doesn’t matter in the limit of training.

1. An Octopus and a Raven have wildly different brains. Both are intelligent. So just the idea that there is some “one true system” that the NN must discover or converge on is suspect. Even basic arithmetic has numerous methods.

2. In the limit of training on a diverse dataset (ie as val loss continues to go down), it will converge on the process (whatever that means) or a process sufficiently robust. What gets the job done gets the job done. There is no way an increasingly competent predictor will not learn representations of the concepts in text, whether that looks like how humans do it or not.

No amount of training would cause a fly brain to be able to do what an octopus or bird brain can, or to model their behavioral generating process.

No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter its own weights, etc.

Architecture matters. The best you can hope for an LLM is that training will converge on the best LLM generating process it can be, which can be great for in-distribution prediction, but lousy for novel reasoning tasks beyond the capability of the architecture.

>No amount of training would cause a fly brain to be able to do what an octopus or bird brain can, or to model their behavioral generating process.

Go back a few evolutionary steps and sure you can. Most ANN architectures basically have relatively little to no biases baked in and the Transformer might be the most blank slate we’ve built yet.

>No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter it’s own weights, etc.

A transformer can perform any computation it likes in a forward pass and you can arbitrarily increase inference compute time with the token length. Feedback paths? Sure. Compute inefficient? Perhaps. Some extra programming around the model to facilitate this? Maybe, but the architecture certainly isn’t stopping you.

Even if it couldn’t, limited ≠ trivial. The human brain is not Turing complete.

Internal memory?
Did you miss the memo? Recurrence is overrated. Attention is all you need.

That said, there are already state keeping language model architectures around.

Altering weights?
Can a transformer continuously train? Sure. It’s not really compute-efficient, but the architecture certainly doesn’t prohibit it.

>Architecture matters

Compute Efficiency? Sure. What it is capable of learning? Not so much

> A transformer can perform any computation it likes in a forward pass

No it can’t.

A transformer has a fixed number of layers – call it N. It performs N sequential steps of computation to derive its output.

If a computation requires > N steps, then a transformer most certainly can not perform it in a forward pass.

FYI, “attention is all you need” has the implicit context of “if all you want to build is a language model”. Attention is not all you need if what you actually want to build is a cognitive architecture.
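A minimal sketch of the fixed-depth point above (plain NumPy, with attention heads, layer norm, and most learned projections heavily simplified): a decoder-only transformer applies a fixed stack of layers once per forward pass, and within each layer all positions are processed in parallel, so the sequential depth is the number of layers regardless of sequence length.

  import numpy as np

  def forward_pass(x, layers):
      # x: (seq_len, d_model) token embeddings; layers: list of (W_attn, W_mlp).
      # Each layer runs once, in order, so the sequential depth of a single
      # forward pass is exactly len(layers), however long the sequence is.
      for W_attn, W_mlp in layers:
          scores = x @ x.T / np.sqrt(x.shape[1])           # toy self-attention scores
          mask = np.tril(np.ones_like(scores))             # causal mask
          scores = np.where(mask == 1, scores, -1e9)
          w = np.exp(scores - scores.max(axis=-1, keepdims=True))
          w /= w.sum(axis=-1, keepdims=True)               # softmax over positions
          x = x + (w @ x) @ W_attn                         # attention block, parallel over positions
          x = x + np.maximum(x @ W_mlp, 0.0)               # position-wise MLP block
      return x

  d = 16
  layers = [(0.01 * np.random.randn(d, d), 0.01 * np.random.randn(d, d))
            for _ in range(4)]                             # N = 4 layers
  out = forward_pass(np.random.randn(10, d), layers)       # 10 tokens, depth still 4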

Transformers produce the next token by manipulating K hidden vectors per layer, one vector per preceding token. So yes, you can increase compute length arbitrarily by increasing tokens. Those tokens don’t have to carry any information to work.

https://arxiv.org/abs/2310.02226

And again, human brains are clearly limited in the number of steps they can compute without writing something down.
Limited ≠ trivial

>FYI, “attention is all you need” has the implicit context of “if all you want to build is a language model”.

Great. Do you know what a “language model” is capable of in the limit? No

These top research labs aren’t only working on Transformers as they currently exist but it doesn’t make much sense to abandon a golden goose before it has hit a wall.

> And again, human brains are clearly limited in the number of steps it can compute without writing something down

No – there is a loop between the cortex and thalamus, feeding the outputs of the cortex back in as inputs. Our brain can iterate for as long as it likes before initiating any motor output, if any, such as writing something down.

The brain’s ability to iterate on information is still constrained by certain cognitive limitations like working memory capacity and attention span.

In practice, the cortex-thalamus loop allows for some degree of internal iteration, but the brain cannot endlessly iterate without some form of external aid (e.g., writing something down) to offload information and prevent cognitive overload.

I’m not telling you anything here you don’t experience in your everyday life. Try indefinitely iterating on any computation you like and see how well that works for you.

You are confusing number of sequential steps with total amount of compute spent.

The input sequence is processed in parallel, regardless of length, so the number of tokens has no impact on the number of sequential compute steps, which is always N = the number of layers.

> Do you know what a “language model” is capable of in the limit ?

Well, yeah, if the language model is an N-layer transformer …

Fair Enough.

Then increase N (N is almost always increased when a model is scaled up) and train or write things down and continue.

A limitless iteration machine (without external aid) is currently an idea of fiction. Brains can’t do it so I’m not particularly worried if machines can’t either.

A photograph is not the same as its subject, and it is not sufficient to reconstruct the subject, but it’s still a representation of the subject. Even a few sketched lines are something we recognise as a representation of a physical object.

I think it’s fair to call one process that can imitate a more complex one a representation of that process. Especially when in the very next sentence he describes it as a “projection”, which has the mathematical sense of a representation that loses some dimensions.

Because they can learn a bunch of symbolic formal arithmetic without learning anything about quantity. They can learn

  5 x 3 = 15

without learning

  *****    ****     *******
  ***** =  *****  = *******
  *****    ******   *

And this generalizes to almost every sentence an LLM can regurgitate.

Which basic counting tests do they still fail? Recent examples I’ve seen fall well within the range of innumeracy that people routinely display. I feel like a lot of people are stuck in the mindset of 10 years ago, when transformers weren’t even invented yet and state-of-the-art models couldn’t identify a bird, no matter how much capabilities advance.

> Recent examples I’ve seen fall well within the range of innumeracy that people routinely display.

Here’s GPT-4 Turbo in April botching a test almost all preschoolers could solve easily: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr…

I have not used LLMs since 2023, when GPT-4 routinely failed almost every counting problem I could think of. I am sure the performance has improved since then, though “write an essay with 250 words” still seems unsolved.

The real problem is that LLM providers have to play a stupid game of whack-a-mole where an enormous number of trivial variations on a counting problem need to be specifically taught to the system. If the system was capable of true quantitative reasoning that wouldn’t be necessary for basic problems.

There is also a deception in that “chain of thought” prompting makes LLMs much better at counting. But that’s cheating: if the LLM had quantitative reasoning it wouldn’t need a human to indicate which problems were amenable to step-by-step thinking. (And this only works for O(n) counting problems, like “count the number of words in the sentence.” CoT prompting fails to solve O(nm) counting problems like “count the number of words in this sentence which contain the letter ‘e’.” For this you need a more specific prompt, like “First, go step-by-step and select the words which contain ‘e.’ Then go step-by-step to count the selected words.” It is worth emphasizing over and over that rats are not nearly this stupid; they can combine tasks to solve complex problems without a human holding their hand.)
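For what it’s worth, the two-step decomposition described above is easy to script; the sketch below is hypothetical, with `complete` standing in for whichever LLM API you happen to use. The point being illustrated is that the human, not the model, supplies the decomposition.

  # Hypothetical sketch: `complete` is a stand-in for any text-completion call.
  def count_words_with_letter(sentence, letter, complete):
      # Step 1: the human tells the model to select, step by step.
      selected = complete(
          f"Go step by step and list every word in this sentence that "
          f"contains the letter '{letter}': {sentence}"
      )
      # Step 2: the human tells the model to count what it just selected.
      return complete(
          f"Go step by step and count how many words are in this list:\n{selected}"
      )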

I don’t know what you mean by “10 years ago” other than a desire to make an ad hominem attack about me being “stuck.” My point is that these “capabilities” don’t include “understands what a number is in the same way that rats and toddlers understand what numbers are.” I suspect that level of AI is decades away.

Your test does not make any sense whatsoever because all GPT does when it creates an image currently is send a prompt to DALL-E 3.

Beyond that, LLMs don’t see words or letters (tokens are neither), so some counting issues are expected.

But it’s not very surprising you’ve been giving tests that make no sense.
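One concrete way to see the token point: the model receives token IDs, not characters, and the split varies by tokenizer. The snippet below uses the open-source tiktoken library as one example (assuming it is installed); other tokenizers will split differently.

  import tiktoken  # pip install tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  for word in ["strawberry", "counting", "superintelligence"]:
      ids = enc.encode(word)
      pieces = [enc.decode([i]) for i in ids]
      print(word, "->", ids, pieces)
  # A word may arrive as one opaque ID or a few multi-character chunks, so
  # "how many letters are in this word" isn't directly visible in the input.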

> Recent examples I’ve seen fall well within the range of innumeracy that people routinely display.

But the company name specifically says “superintelligence”

The company isn’t named “as smart as the average redditor, Inc”

Yeah, it’s not clear what companies like OpenAI and Anthropic mean when they predict AGI coming out of scaled up LLMs, or even what they are really talking about when they say AGI or human-level intelligence. Do they believe that scale is all you need, or is it an unspoken assumption that they’re really talking about scale plus some set of TBD architectural/training changes?!

I get the impression that they really do believe scale is all you need, other than perhaps some post-training changes to encourage longer horizon reasoning.
Maybe Ilya is in this camp, although frankly it does seem a bit naive to discount all the architectural and operational shortcomings of pre-trained Transformers, or assume they can be mitigated by wrapping the base LLM in an agent that provides what’s missing.

> I honestly don’t see a market for “AI security”.

I suspect there’s a big corporate market for LLMs with very predictable behaviour in terms of what the LLM knows from its training data, vs what it knows from RAG or its context window.

If you’re making a chatbot for Hertz Car Hire, you want it to answer based on Hertz policy documents, even if the training data contained policy documents for Avis and Enterprise and Budget and Thrifty car hire.

Avoiding incorrect answers and hallucinations (when appropriate) is a type of AI safety.
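A minimal sketch of that kind of grounding, with `retrieve` and `complete` as hypothetical stand-ins for a document search and an LLM call: the only point is that the policy text goes into the context window and the prompt forbids falling back on pretraining knowledge about other companies.

  # Hypothetical sketch of "answer only from our policy documents".
  def answer_from_policy(question, retrieve, complete):
      docs = retrieve(question, top_k=3)       # e.g. relevant Hertz policy passages
      context = "\n\n".join(docs)
      prompt = (
          "Answer using ONLY the policy excerpts below. If they do not contain "
          "the answer, say you don't know. Do not rely on prior knowledge about "
          "other companies.\n\n"
          f"Policy excerpts:\n{context}\n\nQuestion: {question}"
      )
      return complete(prompt)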

Talent attracts capital. Ilya is a legendary visionary, with a proven track record of turning billions into hundreds of billions. Of course he can raise unlimited money.

There is so much talent in the world that didn’t join PayPal and get Silicon Valley investors and go on to make billions of dollars and found other companies.

The PayPal mafia includes Elon Musk, Peter Thiel, etc. They now parlayed that capital into more platforms and can easily arrange investments. Heck Peter Thiel even works with governments (Palantir) and got J D Vance on Trump’s ticket, while Elon might be in his admin.

Kolomoisky got Zelensky elected in Ukraine, by launching a show about an unlikely guy who wins the presidency and named the party after the show. They call them oligarchs over there but it’s same thing.

The first guy to 1 million followers on Twitter was Ashton Kutcher. He had already starred in sitcoms and movies for years.

This idea that you can just get huge audiences and investments due to raw talent, keeps a lot of people coming to Hollywood and Silicon Valley to “make it” and living on ramen. But even just coming there proves the point — a talented rando elsewhere in the world wouldn’t even have access to the capital and big boys networks.

They all even banked at the same bank! It’s all extremely centralized: https://community.intercoin.app/t/in-defense-of-decentralize…

I never understood this line of reasoning, because it presumes that everyone should have access to the same opportunities. It’s clearly silly once you throw out a few counterexamples: should a Private in the military be able to skip the ranks and be promoted straight to General? Should a new grad software dev be able to be promoted to lead engineer without getting any experience?

Clearly there are reasons why opportunities are gated.

> This idea that you can just get huge audiences and investments due to raw talent, keeps a lot of people coming to Hollywood and Silicon Valley to “make it” and living on ramen. But even just coming there proves the point — a talented rando elsewhere in the world wouldn’t even have access to the capital and big boys networks.

All those people start somewhere though. Excluding nepotism, which is a tangential point, all those people started somewhere and then grew through execution and further opening of opportunity. But it’s not like they all got to where they are in one shot. Taking your Ashton Kutcher example – yes, he had a head start on Twitter followers, but that’s because he executed for years before on his career. Why would it make sense for some rando to rack up a million followers before he did?

Talent will earn you opportunities, but it’s not going to open the highest door until you’ve put in the time and work.

Of course, it’s not to say inequity or unequal access to opportunities doesn’t exist in the world. Of course it does. But even in an ideal, perfectly equitable world, not everyone would have the same access to opportunities.

So yes, it makes perfect sense that someone would give Ilya $1B instead of some smart 18 year old, even if that 18 year old was Ilya from the past.

Presumably the private and the general are in the SAME organization and yes, the avenues for advancement are available equally to all, it’s based on merit and the rules are clear.

The analogy would be if the private could become a major overnight because they knew a guy.

Yes but a private cannot become a general without decades of experience.

What we see with ilya is not dissimilar. I don’t see why it’s bad that people are more hesitant to give a talented 18 year old $1B than the guy who’s been at the forefront of AI innovation.

Necessary but not sufficient

And sometimes not even necessary. Paris Hilton got a music distribution deal overnight cause of her dad’s capital!

Those people weren’t handed that success. You are acting as if they were born billionaires, which is far from true.

It’s not personally my goal to amass immense wealth and start giant companies (I would rather work minimally and live hedonically) but I am impressed by those that do so.

No, I’m saying it was those who went to silicon valley and got lucky to strike up relationships with CAPITAL who made it.

Overwhelmingly, talent isn’t sufficient. For most startups, the old boys network gets to choose who gets millions. And in the next rounds, a few people choose who will get billions.

Lots of dismissive comments here.

Ilya proved himself as a leader, scientist, and engineer over the past decade with OpenAI by creating breakthrough after breakthrough that no one else had.

He’s raised enough to compete at the level of Grok, Claude, et al.

He’s offering investors a pure play AGI investment, possibly one of the only organizations available to do so.

Who else would you give $1B to pursue that?

That’s how investors think. There are macro trends, ambitious possibilities on the through line, and the rare people who might actually deliver.

A $5B valuation is standard dilution, no crazy ZIRP-style round here.

If you haven’t seen investing at this scale in person it’s hard to appreciate that capital allocation just happens with a certain number of zeros behind it & some people specialize in making the 9 zero decisions.

Yes, it’s predicated on his company being worth more than $500B at some point 10 years down the line.

If they build AGI, that is a very cheap valuation.

Think how ubiquitous Siri, Alexa, chatGPT are and how terrible/not useful/wrong they’ve been.

There’s not a significant amount of demand or distribution risk here. Building the infrastructure to use smarter AI is the tech world’s obsession globally.

If AGI works, in any capacity or at any level, it will have a lot of big customers.

All I’m saying is you used the word “if” a lot there.

AGI assumes exponential, preferably infinite and continuous improvement, something unseen before in business or nature.

Neither Siri nor Alexa was sold as AGI, and neither alone comes close to a $1B product. GPT and other LLMs have quickly become a commodity, with AI companies racing to the bottom on inference costs.

I don’t really see the plan, product wise.

Moreover you say:
> Ilya proved himself as a leader, scientist, and engineer over the past decade with OpenAI for creating break-through after break-through that no one else had.

Which is absolutely true, but that doesn’t imply more breakthroughs are just around the corner, nor does the current technology suggest AGI is coming.

VCs are willing to take a $1B bet on exponential growth with a 500B upside.

Us regular folk see that and are dumbfounded because AI is obviously not going to improve exponentially forever (literally nothing in the observed universe does) and you can already see the logarithmic improvement curve.
That’s where the dismissive attitude comes from.

> literally nothing in the observed universe does

There are many things on earth that don’t exist anywhere else in the universe (as far as we know). Life is one of them. Just think how unfathomably complex human brains are compared to what’s out there in space.

Just because something doesn’t exist anywhere in the universe doesn’t mean that humans can’t create it (or humans can’t create a machine that creates something that doesn’t exist anywhere else) even if it might seem unimaginably complex.

That’s a very dismissive and unrealistic statement. There are plenty of investors investing in things such as AI and crypto out of FOMO who either see something that isn’t there or are just pretending to see something in the hope of getting rich.

Obviously, there are plenty of investors who don’t fall into this situation. But let’s not pretend that just because someone has a lot of money or invests a lot of money that it means they know what they are doing.

You should read the entire comment.

They also have the warchest to afford a $1B gamble.

If the math worked out for me too, I’d probably invest even if I didn’t personally believe in it.

Also investors aren’t super geniuses, they’re just people.

I mean look at SoftBank and Adam Neumann… investors can get swept up in hype and swindled too.

> If AGI works, in any capacity or at any level, it will have a lot of big customers.

This is wrong. The models may end up cheaply available or even free. The business cost will be in hosting and integration.

Even with Ilya demonstrating his capabilities in those areas you mentioned, it seems like investors are simply betting on his track record, hoping he’ll replicate the success of OpenAI. This doesn’t appear to be an investment in solving a specific problem with a clear product-market fit, which is why the reception feels dismissive.

I repeatedly keep seeing praise for Ilya’s achievements as a scientist and engineer, but until ChatGPT, OpenAI was in the shadow of DeepMind, and to my knowledge (I might be wrong) he was not that much involved with ChatGPT?

The whole LLM race seems to be decelerating, and the hard problems with LLMs seem not to have had that much progress over the last couple of years (?)

In my naive view I think a guy like David Silver, the creator/co-lead of AlphaZero, deserves more praise, at least as a leader/scientist.
He even has lectures about Deep RL after doing AlphaGo: https://www.davidsilver.uk/teaching/

He has no LinkedIn and came straight from the game-dev industry before learning about RL.

I would put my money on him.

I’m not optimistic about AGI, but it’s important to give credit where credit is due.

Even assuming the public breakthroughs are the only ones that happened, the fact that OpenAI was able to build an LLM pipeline from data to training to production at their scale before anyone else is a feat of research and engineering (and loads of cash)

I’m also confused by the negativity on here. Ilya had a direct role in creating the algorithms and systems that created modern LLMs. He pioneered the first deep learning computer vision models.

Again I read the comments and can’t think of any place less optimistic or understanding of technology than Hacker News. Lots of armchair critics thinking they know better than the guy who helped build AlexNet. I should be surprised but I’m not anymore, just disappointed.

One of the smartest computer science researchers is taking a stab at the most important problem of our lifetimes, we should be cheering him on.

I think this is actually a signal that the AI hype is dissipating.

These numbers and the valuation are indicative that people consider this a potentially valuable tool, but not world changing and disruptive.

I think this is a pretty reasonable take.

At the height of the Japanese economy in the 80s, the roughly 2 square miles of land on which the Imperial Palace stood were worth more than all property in California. Clearly a brilliant moment to get into Japanese real estate!

A valuation at seed mentioned to possibly be in the region of $5bn means that these investors expect there’s a reasonable chance that this company, which at this point will be one among many, might become one of the largest companies in the world as that’s the kind of multiple they’d need given the risks of such an early stage bet.

That doesn’t sound like the hype is dissipating to me.

Explain why you think $1B at a $5B valuation isn’t overvaluation? This strikes me as over-indexing on Ilya and team’s ability to come up with something novel while trying to play catch-up.

At what point can we start agreeing that all these obscene investments and ridiculous valuations on something that’s little more than a PowerPoint deck at this stage are nothing more than degenerate gambling by the ultra rich?

We don’t even understand how the brain functions completely, not even close. Until we have a complete understanding of how our own general intelligence works down to the exact bio-mechanical level, we can’t achieve AGI.

That’s the theoretical basis and path for achieving AGI (if it’s even possible). I’m tired of all the “we stick enough data in the magic black box blender and ta-da! AGI!”

Every giant technological break-through throughout history has had a massive underpinning of understanding before ever achieving it. And yet, with the AI bubble somehow we’re just about to secretly achieve it, but we can’t tell you how.

I’m drawing a blank on the paper and can’t find it casually Googling, but there are fairly well understood mathematical models for how neurotransmitters cause neurons to fire or not fire. It is just probabilities when you zoom out enough. One paper modeled part of a rat brain, the visual cortex I think, by basically coding up some simulated neurons and neurotransmitters and then turning the simulation on. They were able to get the program and the live rat brain to display similar patterns when showing them various images.

I feel like this could be a path to GI without “truly” understanding the human brain: make a large enough simulation of the brain and turn it on. I actually do think we understand enough about the nuts and bolts of neuron interaction to achieve this. What we don’t understand is where neurons firing turns into consciousness. It seems like it is probably just an emergent property of a complex enough neuron graph.
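For a flavour of what “fairly well understood mathematical models” of neuron firing look like, here is a toy leaky integrate-and-fire neuron. This is only an illustration of the general idea, with made-up constants; it is not the model from the rat visual-cortex paper mentioned above.

  import numpy as np

  # Toy leaky integrate-and-fire neuron: the membrane potential v decays toward
  # rest, is driven by input current, and emits a spike (then resets) when it
  # crosses threshold. Units and constants are illustrative only.
  dt, tau = 1e-3, 20e-3                               # time step and membrane time constant (s)
  v_rest, v_thresh, v_reset = -65.0, -50.0, -65.0     # membrane potentials (mV)
  v, spikes = v_rest, []
  rng = np.random.default_rng(0)
  for step in range(1000):                            # simulate 1 second
      drive = 20.0 + rng.normal(0, 5)                 # noisy synaptic drive (mV-equivalent)
      v += (-(v - v_rest) + drive) * dt / tau         # leaky integration
      if v >= v_thresh:
          spikes.append(step * dt)                    # record spike time
          v = v_reset                                 # reset after spiking
  print(f"{len(spikes)} spikes in 1.0 s")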

> Until we have a complete understanding of how our own GI works down to the exact bio-mechanical level, we can’t achieve AGI.

This doesn’t make any sense.
