Explore the capabilities and considerations for Small Language Models (SLMs). Whether you’re using out-of-the-box SLMs or customizing/fine-tuning them with your own data, we’ll cover practical considerations and best practices. Enhance your language pr
The release of Mistral-NeMo-Minitron 8B comes a day after Microsoft open-sourced three language models of its own. Like Nvidia's new model, they were developed with hardware efficiency in mind. The most compact model in the lineup is called Phi-3.5-mini-instruct. It features 3.8 billion ...
Microsoft Research released a paper called "Textbooks Are All You Need," where they introduced phi-1, a new large language model for code. phi-1 is a transformer-based model with 1.3B parameters, which was trained for
Despite its multibillion-dollar investment in OpenAI and its coveted ChatGPT generative large language model (LLM), Microsoft is also working internally to create alternatives that now include a cheaper "small language model" (SLM) option. How much does ChatGPT cost? According to a repor...
Microsoft Research is experimenting with the development of tailored AI models that minimize resource usage. 2023 was very much the year of the large language model. OpenAI's GPT models, Meta's Llama, Google's PaLM, and Anthropic's Claude 2 are all ...
Phi-3-mini. Part of the Phi-3 family from Microsoft, the Phi-3-mini has applications in language processing, reasoning, coding and math. Gemma 2. Part of Google's open Gemma family of models, Gemma 2 is a 2 billion-parameter model developed from the same foundation as the Google Gemini LLM. ...
A few months ago, we introduced Orca, a 13-billion parameter language model that demonstrated strong reasoning abilities by imitating the step-by-step reasoning traces of more capable LLMs. Orca 2 is the latest step in our efforts to explore the capabilities of smaller LMs (on the orde...
Access: https://huggingface.co/microsoft/phi-2 Open source: Yes, for research purposes only. 7. StableLM-Zephyr StableLM-Zephyr is a small language model with 3 billion parameters that is a great choice when you want both accuracy and speed. This model provides fast inference and performs incredibly well...
graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks, which often require the model's output to be highly structured, and moreover provides a multidimensional evaluation of the model, with scores for different capabilities su...
How to make a smaller language model work like a large one Microsoft Research found that smaller models can perform as well as larger ones if certain choices are made during training. One way it achieves this is by using "textbook...