LLMs using code to answer questions is nothing new, it's why the "how many Rs in...

adius · 2026-02-24T06:49:40 1771915780

I also think that sandboxing is crucial. That’s why I’m working on a Wolfram Language interpreter that can be run fully sandboxed via WebAssembly: https://github.com/ad-si/Woxi

danpalmer · 2026-02-24T10:10:51 1771927851

Awesome. I'm pretty unfamiliar with the Wolfram Language, but my understanding is that its power comes from being very batteries-included, both in its standard library and in its data connections (like historical weather or stock market data).

What exactly does Woxi implement? Is it an open source implementation of the core language? Do you have to bring your own standard library or can you use the proprietary one? How do data connections fit into the sandboxing?

I realise I may be uninformed enough here that some of these might not make sense though, interested to learn.

adius · 2026-02-24T10:29:59 1771928999

Yes, we agree that a lot of the value comes from the huge standard library. That's why we try to implement as much of it as possible. Right now we support more than 900 functions. The Data functions will be a little more complicated of course, but they could e.g. make requests to online data archives (ourworldindata.org, wikidata.org, …). So I think it's definitely doable.

We also want to provide an option for users to add their own functions to the standard library. So if they e.g. need `FinancialData[]` they could implement it themselves and provide it as a standard library function.

Someone · 2026-02-24T13:17:38 1771939058

> it's why the "how many Rs in strawberry" question doesn't trip them up anymore, because they can write a few lines of Python to answer it, run that, and return the answer.

That still requires the LLM to ‘decide’ that consulting Python to answer that question is a good idea, and for it to generate the correct code to answer it.

Questions similar to "how many Rs in strawberry" are likely in their training set nowadays, so they are unlikely to make mistakes there, but it may still be problematic for other questions.
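
For reference, the "few lines of Python" in question could look something like this. It's a minimal sketch, not what any particular model actually emits, and `count_letter` is just a name made up for illustration:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

The code itself is trivial. The part that can still go wrong is the model choosing to write it and then writing it correctly.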

simianwords · 2026-02-24T08:26:06 1771921566

>LLMs using code to answer questions is nothing new, it's why the "how many Rs in strawberry" question doesn't trip them up anymore, because they can write a few lines of Python to answer it, run that, and return the answer.

False. It has nothing to do with tool use, just reasoning.

FrustratedMonky · 2026-02-24T15:10:30 1771945830

What is reasoning?

I also cannot multiply large numbers without paper and pencil and an algorithm I learned in school.

An LLM running some Python is the same as me following instructions to perform multiplication.
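
To make the comparison concrete, here is the school algorithm written out as a rough sketch. It assumes non-negative integers passed as decimal strings, and `long_multiply` is a hypothetical name. It mirrors the pencil-and-paper steps: multiply digit by digit, carry, and add up the shifted rows.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings with the schoolbook algorithm."""
    # result[i] holds the digit for 10**i; a product has at most len(a) + len(b) digits.
    result = [0] * (len(a) + len(b))
    for i, digit_a in enumerate(reversed(a)):
        carry = 0
        for j, digit_b in enumerate(reversed(b)):
            total = result[i + j] + int(digit_a) * int(digit_b) + carry
            result[i + j] = total % 10
            carry = total // 10
        # The carry left over from this row goes into the next higher position.
        result[i + len(b)] += carry
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


x, y = "12345678901234567890", "98765432109876543210"
# Cross-check against Python's built-in arbitrary-precision integers.
print(long_multiply(x, y) == str(int(x) * int(y)))  # prints True
```

Whether a person, a model's chain of thought, or an interpreter carries out these steps, the procedure is the same.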

danpalmer · 2026-02-24T10:03:53 1771927433

It's so easy to google this and find that they all do exactly this.

Gemini: https://ai.google.dev/gemini-api/docs/code-execution

ChatGPT: https://help.openai.com/en/articles/8437071-data-analysis-wi...

Claude: https://claude.com/blog/analysis-tool

Reasoning only gets you so far, even humans write code or use spreadsheets, calculators, etc, to get their answers to problems.

simianwords · 2026-02-24T11:09:21 1771931361

you have just shown that they have code execution, but not proved that it is needed for the strawberry problem.

there are multiple ways to disprove this

1. GPT o1 was released and it never supported the tools and it easily solved the strawberry problem - it was named strawberry internally

2. you can run GPT 5.2-thinking in the API right now and deny access to any tools, it will still work

3. you can run deepseek locally without tools, it will still work

Overall this idea that LLMs can't reason and need tools to do that is misleading, false, and easily disproven.

danpalmer · 2026-02-24T11:41:11 1771933271

Oh right you're very focused on specifically the strawberry problem. I just gave that as a throwaway example. It's a solution but not necessarily the solution for something that simple.

My point was much more general: code execution is a key part of these models' ability to perform maths, analysis, and provide precise answers. It's not the only way, but it's a key way that's very efficient compared to spending more inference on CoT.

simianwords · 2026-02-24T12:14:02 1771935242

I agree that tool usage dramatically improves the utility of LLMs. But it is absolutely not needed for the strawberry problem.

They can perform complicated arithmetic without tools: multiplying multiple 20-digit numbers, division and so on (to an extent).