mllm is a lightweight, fast, and easy-to-use (multimodal) on-device LLM inference engine for mobile devices (mainly supporting CPU/NPU), initiated by the research groups led by Mengwei Xu (BUPT) and Xuanzhe Liu (PKU). Recent update [2024 November 21]: Support new model: Phi 3 Vision #186 ...
paper: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. Link: https://arxiv.org/pdf/2402.14905. TL;DR: an exploration of language-model architectures suited to mobile devices, culminating in the proposed MobileLLM. Characteristics of edge devices: typical on-device hardware has limited memory and compute, which calls for training language models with comparatively few parameters. Preliminary experiments ...
Building a strong baseline: MobileLLM. Layer sharing: MobileLLM-LS. Experimental results: setup, results, downstream tasks, summary. Original paper: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, arxiv.org/abs/2402.14905. Introduction: this work focuses on designing high-quality LLMs with fewer than one billion parameters for mobile deployment. In contrast to work that emphasizes the role of data and parameter count in determining mo...
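To make the layer-sharing idea concrete, here is a minimal Kotlin sketch of the immediate block-wise weight sharing that MobileLLM-LS describes: each block's weights are applied twice in succession, doubling effective depth at a fixed parameter count. The `Block` interface and `forwardShared` function are hypothetical stand-ins, not Meta's implementation.

```kotlin
// Hypothetical stand-in for a transformer block (attention + FFN elided).
fun interface Block { fun forward(x: FloatArray): FloatArray }

// Immediate block-wise layer sharing: every block is executed twice in a row,
// so the model runs 2N layers deep while storing only N blocks' weights.
fun forwardShared(blocks: List<Block>, input: FloatArray): FloatArray {
    var h = input
    for (b in blocks) {
        h = b.forward(h)  // first pass through the block's weights
        h = b.forward(h)  // immediate reuse: same weights, no extra parameters
    }
    return h
}
```

The appeal on mobile hardware is that the reused block's weights are already resident in cache or memory, so the extra depth costs compute but no additional weight storage.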
mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device la...
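As a rough illustration of the kind of latency metric such a benchmark reports, the Kotlin sketch below times a decode loop and returns tokens per second. `generateToken` is a hypothetical callback standing in for one decoding step; this is not MobileAIBench's actual harness.

```kotlin
// Minimal sketch: time a fixed number of decode steps and report tokens/second.
fun measureDecodeTokensPerSecond(numTokens: Int, generateToken: () -> Unit): Double {
    val start = System.nanoTime()
    repeat(numTokens) { generateToken() }
    val elapsedSeconds = (System.nanoTime() - start) / 1e9
    return numTokens / elapsedSeconds
}
```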
The researchers also trained models at other parameter scales, including MobileLLM-600M/1B/1.5B. Meta's research team has released the MobileLLM resources on GitHub and Hugging Face. With this latest work, Meta joins the ranks of on-device AI model vendors: earlier this year Apple announced OpenELM 270M/450M/1.1B/3B, and Google open-sourced Gemma 2B/7B followed by Gemma 2 9B/27B.
Battery Life Concerns: Running resource-intensive tasks like NLP on mobile devices can drain battery life quickly. Optimizing SLMs for energy efficiency is crucial to ensure that offline usage remains practical without significantly impacting battery perfor...
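One simple way to quantify that impact on Android is to sample the battery level before and after an inference workload. The helper below uses the standard BatteryManager API (API 21+); wiring it into an actual benchmark loop is left as an assumption.

```kotlin
import android.content.Context
import android.os.BatteryManager

// Reads the remaining battery percentage so a benchmark can sample it
// before and after running an inference workload.
fun batteryPercent(context: Context): Int {
    val bm = context.getSystemService(Context.BATTERY_SERVICE) as BatteryManager
    return bm.getIntProperty(BatteryManager.BATTERY_PROPERTY_CAPACITY)
}
```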
OneLLM Pro nearly doubles the number of on-device AI tools, covering a wider range of use cases than ever before. This expanded toolkit builds upon the core features of OneLLM, including: I. Core Features: 1. Unlimited Private & Local LLMs: Access thousands of fully offline, top-tier...
- Measure RAM consumption (see the sketch after this list)
- Add app shortcuts for tasks
- Integrate Android-Doc-QA for on-device RAG-based question answering from documents
- Check if llama.cpp can be compiled to use Vulkan for inference on Android devices (and use the mobile GPU)
- Check if multilingual GGUF models can be supported
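For the RAM-measurement item above, a minimal sketch using Android's standard ActivityManager API might look like the following; it reports the app's proportional set size (PSS), the usual per-process memory figure on Android. This is an illustrative helper, not code from the repository.

```kotlin
import android.app.ActivityManager
import android.content.Context
import android.os.Process

// Returns this process's total PSS in MiB; getProcessMemoryInfo reports KiB.
fun currentPssMib(context: Context): Double {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = am.getProcessMemoryInfo(intArrayOf(Process.myPid()))[0]
    return info.totalPss / 1024.0
}
```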
Experimental MediaPipe LLM Inference API allows developers to run large language models ‘on-device’ across Android, iOS, and web platforms.
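A minimal Kotlin usage sketch for Android, based on the experimental API as documented at its initial release; the model path and maxTokens value are placeholders, and the API surface may have changed since.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Build the task with an on-device model file and run one synchronous query.
fun askOnDevice(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/model.bin")  // placeholder path
        .setMaxTokens(512)                              // example value
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}
```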
- LLM as a system service on mobile devices. arXiv 2024 [Paper]
- LocMoE: A low-overhead MoE for large language model training. arXiv 2024 [Paper]
- EdgeMoE: Fast on-device inference of MoE-based large language models. arXiv 2023 [Paper]

General Efficiency and Performance Improvements
- Any-Precis...