Tiny LLMs: The Underdogs Set to Democratize AI and Shake Up Big Tech! (the s1-32B model in layman's terms)

Ankur Shrivastava
3 min read · Feb 10, 2025


You might have heard of the s1-32B model, but in case you have been living under a rock, here is some quick context.

Researchers from Stanford University built a small model, s1-32B, for just $50 in compute. Yes, that's not a typo: $50, really. And on several reasoning benchmarks it performs comparably to OpenAI's o1-preview reasoning model.

The s1-32B reasoning model was built on top of Alibaba's Qwen2.5-32B-Instruct model.

I'll try to explain this in very simple, layman's terms.

They built this model not to expand the breadth of its knowledge, not to learn more, and not to fetch more related content from the internet. Instead, they made it go back into the same knowledge pool and reason with itself again and again, stretching its reasoning muscles. It's exactly like school: the syllabus was finite, so we went back and revised it for our exams.

This is what they call test-time scaling. You can ignore the jargon, but try to grasp the concept.

Going back to the school exam analogy: the syllabus was finite, and that finite syllabus is the budget for the LLM, a cap on how much "thinking" it gets to spend on a question.

And stretching those reasoning muscles within that finite syllabus is budget forcing: the LLM is made to take more time digging into the knowledge it already has, and so it comes back with better reasoning.
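For the curious, here is a minimal sketch of what budget forcing could look like in code. The `generate` helper and the `END_THINKING` marker are placeholders I made up for illustration; the s1 authors' actual implementation and special tokens differ. The reported trick is simple: when the model tries to stop thinking early, you append something like "Wait" so it re-reads its own reasoning and keeps going, and you cut it off once the thinking budget is used up.

```python
# Minimal sketch of budget forcing, assuming a hypothetical generate(prompt,
# stop=..., max_tokens=...) helper that wraps whatever inference API you use.
# The marker below is a made-up placeholder, not the model's real special token.
END_THINKING = "<|end_of_thought|>"

def think_with_budget(question, generate, extra_rounds=2, max_think_tokens=8000):
    """Keep the model thinking instead of letting it answer right away."""
    # First pass of thinking, stopped either by the budget or by the model
    # deciding it is done.
    thinking = generate(question, stop=END_THINKING, max_tokens=max_think_tokens)

    # If the model tries to stop early, append "Wait" so it re-examines its own
    # reasoning and keeps going. This is the "forcing" part.
    for _ in range(extra_rounds):
        remaining = max_think_tokens - len(thinking.split())  # crude token count
        if remaining <= 0:
            break  # upper budget reached: stop stretching
        thinking += "\nWait"
        thinking += generate(question + thinking, stop=END_THINKING, max_tokens=remaining)

    # Close the thinking block and ask the model for its final answer.
    return generate(question + thinking + END_THINKING, stop=None, max_tokens=1024)
```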

So here is the secret: the researchers curated a dataset of just 1,000 questions, selected along three criteria (quality, difficulty and diversity), and in controlled experiments the combination of all three criteria performed much better than any single criterion on its own.
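Roughly, you can picture the curation as three filters applied one after another. The sketch below is only my illustration of that idea; the helpers passed in (`is_clean`, `is_hard`, `topic_of`) are placeholders, not the paper's actual code.

```python
from collections import defaultdict

def curate(questions, is_clean, is_hard, topic_of, target=1000):
    """Filter a large question pool down to a small, well-rounded set."""
    # 1. Quality: drop malformed or low-quality questions.
    pool = [q for q in questions if is_clean(q)]

    # 2. Difficulty: keep questions that genuinely need reasoning,
    #    e.g. ones that baseline models tend to get wrong.
    pool = [q for q in pool if is_hard(q)]

    # 3. Diversity: spread the final picks across topics so no single
    #    domain dominates the tiny training set.
    by_topic = defaultdict(list)
    for q in pool:
        by_topic[topic_of(q)].append(q)

    selected = []
    while len(selected) < target and any(by_topic.values()):
        for qs in by_topic.values():
            if qs and len(selected) < target:
                selected.append(qs.pop())
    return selected
```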

There are multiple ways to scale up a model's (an LLM's) ability to reason. This is called scaling. Two important ones are:

  1. Sequential: the model works towards an answer step by step, and each new step is influenced by and builds upon the previous steps.
  2. Parallel: the model produces multiple independent lines of reasoning and then picks the best one among them (a tiny sketch follows below).
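Parallel scaling is the easier one to picture in code. Here is a toy sketch of the common "majority voting" flavour, where `generate_answer` is a placeholder for your model call; real systems also use fancier ways of picking the winner, such as having a separate model score each attempt.

```python
from collections import Counter

def parallel_answer(question, generate_answer, n_samples=8):
    """Sample several independent answers and keep the most common one."""
    answers = [generate_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```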

The researchers who created the s1 model primarily chose the sequential scaling method, since the model keeps referring back to its previous thoughts, going back again and again to reason with itself (which is basically what budget forcing enforces).

They also found that sequential scaling on its own had limitations: the gains plateaued after a while, and the context window became another bottleneck. (The context window is basically a model's working memory; in ChatGPT, for example, the model can't hold everything you've ever sent it in one go.)

So they combined sequential and parallel scaling to overcome these limitations, and it led to better scaling, meaning it pushed the s1 model's reasoning even further.
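Putting the two ideas together is conceptually simple: run the budget-forced (sequential) thinking several times in parallel and vote over the answers. This tiny sketch reuses the hypothetical `think_with_budget` helper from earlier and only illustrates the combination, not the exact method the s1 authors used.

```python
from collections import Counter

def combined_answer(question, generate, n_samples=4):
    """Run budget-forced (sequential) generation several times in parallel
    and majority-vote over the resulting answers."""
    answers = [think_with_budget(question, generate) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```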

One thing I forgot to mention: this s1 model scored an impressive ~50% accuracy on AIME (the American Invitational Mathematics Examination), with just 1,000 training questions and about $50 of compute. Imagine what it could do with more investment and more questions covering quality, difficulty and diversity.

This is a great achievement, as it opens the door for tiny LLMs to be on par with bigger models. You can't deploy the bigger models on every device because of the resources they need, but you can definitely deploy smaller models locally, even on a modest PC. So these fine-tuned models have the power to democratize the AI industry and put AI in everyone's hands, with added privacy. There are plenty of use cases for these tiny models in education, healthcare, legal work, and remote areas with poor connectivity, of course with some trade-offs, since this is an oversimplification of the possibilities.

