Why the Future of Generative AI is Local and Specialized
As artificial intelligence (AI) becomes increasingly embedded across nearly every facet of life – from agriculture, education, and health systems to everyday digital tools – there is growing interest in how to make these technologies more practical, affordable, and sustainable. This has contributed to a shift toward Frugal AI, an approach to designing and deploying AI systems that prioritizes doing more with less. This industry mindset champions resource efficiency, sustainability, and data sovereignty by enabling AI systems to run on modest, locally available hardware – offering a more predictable, lower-cost approach for resource-constrained communities and institutions.
One of the clearest examples of this shift is the growing interest in Small Language Models (SLMs). Back in 2022, when ChatGPT was first launched, very few people knew that you could run a chatbot – perhaps less capable, but still useful and far more efficient – directly on your laptop with no internet connection needed. Even today, in 2026, many people are still unaware of this possibility. The reality is, most of the core knowledge needed to build and deploy generative AI systems is openly available.
This is where SLMs come in. You can think of these as compact versions of the AI systems many people are now familiar with. Where a Large Language Model (LLM) such as GPT, Google Gemini, or Anthropic’s Claude can have hundreds of billions – or even trillions – of parameters, SLMs typically have billions or tens of billions at the most. Think of parameters as connections between neurons in a brain. More parameters generally mean more connections and, in theory, more information, but also less efficiency. And more information does not always translate to more useful knowledge.
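To make the size difference concrete, here is a rough back-of-the-envelope sketch of how much memory is needed just to hold a model's weights. The model sizes and the bytes-per-parameter figures are illustrative assumptions, not measurements of any specific model, and the estimate ignores activations, context caches, and runtime overhead.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# All sizes and precisions below are illustrative assumptions.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) needed to hold the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical example models: a 4-billion-parameter SLM and a 400-billion-parameter LLM.
for name, params in [("4B SLM", 4e9), ("400B LLM", 400e9)]:
    for precision in BYTES_PER_PARAM:
        print(f"{name} @ {precision}: ~{weight_memory_gb(params, precision):,.0f} GB")
```

Under these assumptions, the smaller model's weights fit in the memory of an ordinary laptop at 16-bit precision or lower, while the larger one needs hundreds of gigabytes spread across several server-class accelerators.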
How Do SLMs Deliver More With Less?
Although some SLMs are trained from scratch on high-quality specialized datasets, they can also be understood as more efficient and compact versions of LLMs, often optimized for very specific domains or tasks. Through different techniques collectively known as Model Compression, larger models can be pruned, quantized, and distilled into smaller equivalents. In simpler terms, pruning removes weights that contribute little to the output, quantization stores the remaining weights at lower numerical precision, and distillation trains a smaller model to reproduce the behaviour of a larger one. While they may lose some of the broader use cases of LLMs, SLMs gain significantly in speed, energy efficiency, and the ability to run on more modest hardware.
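As an illustration of one of these techniques, the following is a minimal sketch of symmetric 8-bit quantization applied to a handful of made-up weights. It uses plain Python and a single shared scale factor; real toolchains typically quantize per channel or per block and use more elaborate schemes, so treat this as a teaching example rather than how any particular library does it.

```python
# Symmetric 8-bit quantization of a small weight list (pure-Python sketch).
# The weight values are invented for illustration.


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in q_weights]


weights = [0.82, -1.27, 0.03, 0.45, -0.6]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each weight now needs 1 byte instead of 4 (fp32), at the cost of a small rounding error.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("max rounding error:", max_error)
```

The rounding error per weight is bounded by half the scale factor, which is why quantization can cut storage substantially while changing the model's outputs only slightly.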
LLMs are generally designed as “generalist” models, with vast cross-domain knowledge, while SLMs are often more specialized and fine-tuned on curated datasets to master specific industry contexts and nuances. Some popular SLM families, such as Google’s Gemma, IBM’s Granite, and Alibaba’s Qwen, are also open-weight: their trained weights can be downloaded and run locally, often under permissive licenses (terms vary by model family), which helps lower the cost of adoption and experimentation.
These characteristics make SLMs especially attractive for organizations and governments looking for practical, lower-cost ways to adopt AI. Some of the key advantages include:
- Speed & Low Latency: Almost instant response times directly on client/edge devices, without internet round-trips
- Cost-Efficiency: A fraction of the compute and Application Programming Interface (API) costs of large, cloud-hosted LLMs
- Privacy & Compliance: Sensitive data stays on the user’s device or within an organization’s own infrastructure
- Offline / Edge Capabilities: Ability to run in regions with poor or no internet connectivity
- Eco-Friendly: Drastically reduced power consumption and carbon footprint
Where Do SLMs Excel?
Because these models are much smaller, they can run on modest hardware and provide some AI services closer to where the end users are and where data is being produced – on edge devices or within organizational clouds. Institutions that benefit the most are those requiring stricter data privacy, more predictable and lower operational costs, and/or the ability to function in offline or low-connectivity environments, for example, when working on internal documents in a remote regional learning centre.
Tasks such as document summarization, retrieval-augmented generation (RAG), tool use for agentic AI workflows, customer support chatbots, quality assurance, or image understanding can be done using these models at a fraction of the cost of using Large Language Models, with faster response times and, more importantly, stronger security and privacy. Because SLMs can be deployed within the organizational infrastructure or, even better, on edge devices, confidential information does not have to travel over the public internet to an external LLM service provider and back. This reduces dependence on third-party processors and some of the security risks that come with them.
This democratization of AI is of particular interest for implementing agentic AI workflows and pipelines, since using multiple small language models with distinct capabilities can help limit the inherent biases that may emerge when a single LLM performs multiple roles. These models are also small enough to run locally on laptops or directly on mobile phones, with no internet access required. What began a few years ago as experimental use cases is likely to become much more common in the years to come, especially in regions where internet access is slow or unreliable, thanks to the growing adoption of Neural Processing Units (NPUs) – specialized AI chips that are increasingly common in modern mobile devices.
Fine-tuning an SLM on customer data is also much cheaper than fine-tuning an LLM, because smaller models need far less hardware and energy to train.
When Do We Still Need LLMs?
So what are the drawbacks, and why would we still use LLMs? Small models do have limitations, and there are use cases where they are less capable. The most obvious examples are tasks that require broad, multi-disciplinary expertise or vast amounts of knowledge. LLMs shine in use cases such as asking for general information about less common places, people, or events, or solving complex multidisciplinary problems – for example, building a new crop prediction model, which calls for a soil scientist’s expertise combined with strong data science and coding skills.
While there are ongoing efforts to make LLMs more efficient and improve how they verify or fact-check information, the fact that they store vast amounts of human knowledge still makes them the best candidates for these kinds of tasks.
The Future is Hybrid and Sovereign
In complex agentic AI workflows, the most practical approach will likely be a hybrid one: using more expensive LLMs only when their broader expertise is required. Smaller language models can then be used for tasks such as tool use (web search or function calling), handling very sensitive data that should not leave the organizational cloud, supporting very specific cases such as RAG, or powering local specialized models fine-tuned on organizational or customer data.
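The hybrid approach described above can be sketched as a simple routing rule. Everything in this example is hypothetical: the `Request` fields, the backend names `local_slm` and `cloud_llm`, and the idea that a request arrives already labelled are assumptions made for illustration. A real system would need its own way of detecting sensitive data and estimating task difficulty.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical, pre-labelled request entering an agentic workflow."""
    text: str
    contains_sensitive_data: bool = False
    needs_broad_knowledge: bool = False


def route(request: Request) -> str:
    """Return the name of the (hypothetical) backend that should handle the request."""
    # Sensitive data stays inside the organization, whatever the task.
    if request.contains_sensitive_data:
        return "local_slm"
    # Escalate to the more expensive model only when broad expertise is required.
    if request.needs_broad_knowledge:
        return "cloud_llm"
    # Default: tool use, RAG, and other narrow tasks go to the small local model.
    return "local_slm"


print(route(Request("Summarize this internal HR document", contains_sensitive_data=True)))
print(route(Request("Design a crop prediction model", needs_broad_knowledge=True)))
print(route(Request("Call the weather API for tomorrow's forecast")))
```

Checking for sensitive data first is a deliberate design choice in this sketch: privacy takes precedence over answer quality, so a confidential request is never sent to an external provider even when a larger model might handle it better.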
Sovereign AI has also emerged as a critical strategic, economic, and cultural imperative, although implementation is often limited by financial and technological barriers. Across its three major pillars – infrastructure, data, and model sovereignty – there is a growing opportunity to put efficiency at the centre of AI deployment. In many ways, this reflects the broader shift toward Frugal AI: designing AI systems that do more with less while remaining practical, sustainable, and locally adaptable. Hybrid AI approaches such as the ones described above can help make this architecture affordable, improve security through the use of local AI systems when possible, and reduce dependence on a single vendor through a more diversified and democratized AI system.