As AI tools improve, we keep being encouraged to offload more and more complex tasks to them. LLMs can write our emails for us, make presentations, design apps, generate videos, search the internet and summarize the results, and so much more. One thing they’re still really struggling with, however, is video games.

So far this year, two of the biggest names in AI (Microsoft and Anthropic) have tried to get their models to generate or play games, and the results are probably a lot more limited than many people expect.

This makes them great showcases of where generative AI is really at right now. In short: it can do a lot more than before, but it can’t do everything.

A screenshot of the real Quake II from its Steam page. Credit: Steam

Microsoft generates Quake II

Generating video games has similar problems to generating video: motion is weird and morph-y, and the AI starts to lose touch with “reality” after a set amount of time. Microsoft’s latest attempt, which anyone can try out, is an AI-generated version of Quake II.

I played it quite a few times and it’s a truly trippy experience, with weird, smudgy enemies appearing out of nowhere and the environment changing around you as you move. Multiple times when I entered a new room, the entrance would be gone when I turned back to face it, and when I looked forwards again the walls would have moved.

The experience only lasts a few minutes before it cuts out and prompts you to start a new game, but if you’re unlucky, it can stop properly responding to your inputs even before that.

A screenshot of Copilot’s generated version of Quake II. Credit: Microsoft

It’s a great experiment, however, and one I think would be useful for more people to see. It lets you experience for yourself what gen AI is good at, and what its current limitations are. As impressive as it is that we can generate an interactive video game experience at all, it’s hard to imagine that anyone could play this tech demo and think the next Assassin’s Creed will be made by AI.

That said, people keep hearing exaggerated and conflicting claims like these:

It has the potential to solve some of the world’s biggest problems, such as climate change, poverty and disease. (Bill Gates)

Screenshot of Claude playing Pokémon. Credit: ClaudePlaysPokemon

Probably in 2025, we at Meta, as well as the other companies that are basically working on this, are going to have an AI that can effectively be a sort of midlevel engineer that you have at your company that can write code. (Mark Zuckerberg)

Using AI effectively is now a fundamental expectation of everyone at Shopify. It’s a tool of all trades today, and will only grow in importance. Frankly, I don’t think it’s feasible to opt out of learning the skill of applying AI in your craft. (Tobi Lutke, CEO of Shopify)

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. (Sam Altman, CEO of OpenAI)

A graph of Claude’s progress in Pokémon. Credit: Anthropic

AI is more dangerous than, say, mismanaged aircraft design or production maintenance or bad car production, in the sense that it is, it has the potential, however small one may regard that probability, but it is non-trivial, it has the potential of civilization destruction. (Elon Musk)

This is all pretty extreme, right? It will both save us and destroy us, it’s both a tool of all trades for professionals and a tool that will replace professionals, and apparently, we could get sci-fi-level AGI as soon as this year. When this is all people hear, they start expecting pretty amazing things from these tools and believing all office workers spend their days conversing with their computers like Star Trek characters.

However, that is not what reality looks like. Reality looks like a trippy, smudgy Quake II with incomprehensible shapes for enemies. ChatGPT-level LLMs really were an exciting breakthrough in 2022, and a ton of fun for everyone to play around with, but for the majority of uses big tech is pushing on us right now, AI just isn’t capable enough. Accuracy levels are too low, instruction-following abilities are too low, context windows are too small, and they’re just trained on internet nonsense instead of real-world knowledge.

But generating a video game is a pretty complex goal. It takes whole teams of humans years to make these things, after all. How about playing video games instead?

Claude “plays” Pokémon Red

Well, it turns out people are experimenting with that, too. Anthropic’s new model, Claude 3.7 Sonnet, has been playing Pokémon Red on Twitch for around two months now, and he’s doing the best job an LLM has ever done at playing Pokémon. One slight caveat, however, is that he’s still miles behind the average 10-year-old human.

One of the problems is speed: it takes Claude thousands of actions spanning multiple days to do things like make it through Viridian Forest.

Why does it take so long? It’s not because he can’t figure out how to strategically win Pokémon battles. That’s actually the part he’s best at. Navigating through the environment and avoiding trees and buildings, on the other hand? Not so good. Claude has never been trained to play Pokémon, and it’s not easy for him to understand the pixel art and what it represents.

Making it through maze-type areas like Mt. Moon is particularly difficult for him, as he struggles to form a map of the area and avoid retracing his steps. One time, he got himself so stuck in a corner that he concluded the game was broken and generated a formal request to have the game reset.

These early attempts were not without moments of levity too. On one occasion, Claude got stuck in a corner and, convinced something must be broken, typed out a formal request to reset the game. pic.twitter.com/5RIiCJdxCM

He’s also not great at remembering what his goals are, what things he’s already tried, or which places he’s already been.

There’s a pretty straightforward reason for that one: LLMs have a finite “context window” that acts as their memory. It can only hold so much information, and once Claude hits the limit, he condenses what he’s got to make room for more. So a piece of information like “Visited Viridian City, entered every building, and spoke to every NPC” might get condensed to just “Visited Viridian City”, prompting Claude to go back and check if there was more to do in the city.
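
To make that concrete, here is a toy Python sketch of a memory that condenses old notes once it runs out of room. It is an illustration under loud assumptions, not how Claude or the ClaudePlaysPokemon setup actually works: word counts stand in for tokens, "condensing" just keeps the text before the first comma, and the names `count_tokens`, `condense`, and `add_note` are made up for this example.

```python
# Toy illustration only: real LLM context management is far more sophisticated.
# Assumptions: "tokens" are approximated by whitespace-separated words, and
# "condensing" a note means keeping only its first comma-separated clause.

def count_tokens(notes):
    """Approximate the size of the memory as a total word count."""
    return sum(len(note.split()) for note in notes)


def condense(note):
    """Lossy summary: keep only the text before the first comma."""
    return note.split(",")[0].strip()


def add_note(notes, new_note, limit):
    """Append a note, then condense the oldest notes until the memory fits.

    If everything has been condensed and the memory is still over the
    limit, this toy version simply gives up and returns it as-is.
    """
    notes = notes + [new_note]
    for i in range(len(notes)):
        if count_tokens(notes) <= limit:
            break
        notes[i] = condense(notes[i])  # detail is lost here
    return notes


memory = []
memory = add_note(
    memory,
    "Visited Viridian City, entered every building, and spoke to every NPC",
    limit=15,
)
memory = add_note(memory, "Entered Viridian Forest, heading north", limit=15)
print(memory)
# ['Visited Viridian City', 'Entered Viridian Forest, heading north']
```

After the second note arrives, the first one no longer records that every building was entered and every NPC spoken to, which is exactly the kind of lost detail that could send a player back to double-check the city.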

To sum it up: Claude can’t figure out where he’s going, he walks into walls, mistakes random objects for NPCs, forgets where he’s been and what he’s trying to do, and every decision he makes requires paragraphs and paragraphs of reasoning. This isn’t a criticism. These are both exciting experiments that are pushing LLMs as far as they can go.

But with all the hype around AI, it feels important for people to see demonstrations like these and make their own minds up about AI. Certain figures are trying to push the narrative that we’re about to reach the peak, that within years, AI will be beyond even the smartest humans, but I don’t think they’re being sincere, they’re just being salesmen. We’re nowhere near the peak. This whole thing is only just starting.