mlx-lm

ai 2025-09-17

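"Literary change is colored by the state of the world; rise and decline are tied to the sequence of the times." (Liu Xie)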

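MLX LM: an all-in-one toolkit for running and fine-tuning large models fast on Apple Silicon

Want to run large language models (LLMs) on your Mac? Want to fine-tune a custom model and upload it to Hugging Face in one step? Or do you simply want fast, efficient generation and inference? MLX LM, introduced here, is an LLM toolkit built for Apple Silicon users.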

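Project overview

Repository: ml-explore/mlx-lm
One-line description: Run LLMs with MLX. Use MLX to run and fine-tune large language models efficiently on a Mac.
Language: Python
License: MIT
Stars: 2313+
Forks: 249+
Topics: llms, mlx
Main platform: Apple Silicon (macOS 15.0 or later recommended)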

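What can MLX LM do?

1. Load and run thousands of Hugging Face LLMs

MLX LM integrates with the Hugging Face Hub, so a single command can load and run thousands of models, including popular LLMs such as Llama, Mistral, Mixtral, Phi-2, Qwen, and Plamo.

2. Fast inference and distributed fine-tuning

Efficient text generation and fine-tuning on top of the MLX framework.
Built-in quantization (for example 4-bit), plus full-model and low-rank fine-tuning (LoRA/QLoRA).
Distributed inference and training across multiple devices.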
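
To give a sense of why low-rank fine-tuning is cheap, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation (a frozen weight plus two small trainable factors of rank r). The 4096 x 4096 shape and rank 8 are illustrative values chosen for this example, not MLX LM defaults.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA keeps the original weight frozen and learns two small factors,
    A (rank x d_in) and B (d_out x rank), whose product is the update.
    """
    return rank * d_in + d_out * rank


# Illustrative values only: one 4096 x 4096 projection, rank 8.
d_in, d_out, rank = 4096, 4096, 8
full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```

Under these assumptions the adapter trains well under one percent of the parameters of the matrix it modifies, which is what makes fine-tuning feasible in laptop memory.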

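3. One-step upload to Hugging Face

Quantized and fine-tuned models can be uploaded to Hugging Face in one step, which makes it easy to manage and share your custom LLMs.

4. Advanced caching and large-model optimization

Key-value (KV) caching and prompt caching considerably speed up inference on long texts.
Memory management is tuned for large models and uses features introduced in macOS 15 to get the most out of Apple Silicon.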
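
A KV cache trades memory for speed: keys and values of earlier tokens are stored so they are not recomputed at every step. The back-of-the-envelope sketch below estimates how much memory that takes. The shape used (32 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumption modeled on a Mistral-7B-style architecture, and the helper `kv_cache_bytes` is hypothetical, not part of MLX LM.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    keys_and_values = 2
    return (keys_and_values * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_value)


# Assumed Mistral-7B-like shape; adjust for the model you actually run.
per_token = kv_cache_bytes(32, 8, 128, seq_len=1)
full_ctx = kv_cache_bytes(32, 8, 128, seq_len=8192)

print(f"{per_token // 1024} KiB per token")
print(f"{full_ctx / 2**30:.1f} GiB for an 8192-token context")
```

The estimate grows linearly with context length, which is why long prompts benefit from both caching and careful memory settings.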

Installation

# Install with pip
pip install mlx-lm

# Or with conda
conda install -c conda-forge mlx-lm

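Quick start

Generate text from the command line

mlx_lm.generate --prompt "How tall is Mt Everest?"

By default this uses mlx-community/Llama-3.2-3B-Instruct-4bit.
Use --model to specify any MLX-compatible model.

Chatbot REPL

mlx_lm.chat

This starts an interactive session for chatting with the LLM and keeps context across turns.

Show help for the options

mlx_lm.generate -h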

Rapid development with the Python API

MLX LM can also be imported as a Python package for custom development.

1. Basic usage

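from mlx_lm import load, generate

# Download (if needed) and load a 4-bit quantized model and its tokenizer
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
prompt = "Write a story about Einstein"

# Wrap the prompt in the model's chat template so it is read as a user turn
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints the text as it is generated, so no separate print is needed
text = generate(model, tokenizer, prompt=prompt, verbose=True)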

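2. Batch inference and streaming output

Batch inference is supported as well (the example is not reproduced here).
Streaming generation: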

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

for response in stream_generate(model, tokenizer, prompt, max_tokens=512):
    print(response.text, end="", flush=True)
print()
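
When streaming, it is often useful to keep the full text as well as print it. The helper below is a minimal sketch: it relies only on each streamed item having a `.text` attribute, as in the loop above. The name `print_and_collect` and the `fake_stream` stand-in are hypothetical and exist only so the example runs without loading a model.

```python
from types import SimpleNamespace


def print_and_collect(responses):
    """Print each streamed chunk as it arrives and return the full text."""
    chunks = []
    for response in responses:
        print(response.text, end="", flush=True)
        chunks.append(response.text)
    print()
    return "".join(chunks)


# Hypothetical stand-in for stream_generate(model, tokenizer, prompt, ...).
fake_stream = (
    SimpleNamespace(text=t)
    for t in ["Albert ", "Einstein ", "was born in Ulm."]
)
full_text = print_and_collect(fake_stream)
```

In real use, the generator returned by `stream_generate(...)` would be passed in place of `fake_stream`.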

Model quantization and upload

MLX LM can quantize models (for example to 4-bit) and upload the result to Hugging Face in one step:

from mlx_lm import convert

repo = "mistralai/Mistral-7B-Instruct-v0.3"
upload_repo = "mlx-community/My-Mistral-7B-Instruct-4bit"

convert(repo, quantize=True, upload_repo=upload_repo)

The equivalent command-line usage:

mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q --upload-repo mlx-community/my-4bit-mistral
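
To see what 4-bit quantization buys in practice, the sketch below estimates the weight footprint of a 7-billion-parameter model. It assumes group-wise quantization with a group size of 64 and a 16-bit scale plus a 16-bit bias per group. These values are assumptions for illustration, so check the options of `mlx_lm.convert` for the settings actually in effect. `quantized_size_gb` is a hypothetical helper.

```python
def quantized_size_gb(num_params: float, bits: int = 4,
                      group_size: int = 64, overhead_bits: int = 32) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes).

    Each group of `group_size` weights is assumed to store one scale and
    one bias (overhead_bits in total) on top of `bits` per weight.
    """
    bits_per_weight = bits + overhead_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


num_params = 7e9                       # a 7B-parameter model
fp16_gb = num_params * 16 / 8 / 1e9    # unquantized 16-bit weights
q4_gb = quantized_size_gb(num_params)  # 4-bit with the assumed overhead

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {q4_gb:.2f} GB")
```

The estimate ignores activations and the KV cache, so actual memory use at inference time is higher.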

Advanced features at a glance

1. Prompt caching

This greatly speeds up multi-turn inference over a long context:

cat prompt.txt | mlx_lm.cache_prompt \
  --model mistralai/Mistral-7B-Instruct-v0.3 \
  --prompt - \
  --prompt-cache-file mistral_prompt.safetensors

mlx_lm.generate \
  --prompt-cache-file mistral_prompt.safetensors \
  --prompt "\nSummarize the above text."
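
The idea behind prompt caching can be illustrated with a toy model that simply counts how many tokens it has to process. This is a conceptual sketch only: `ToyModel` is hypothetical and says nothing about how `mlx_lm.cache_prompt` is implemented internally.

```python
class ToyModel:
    """Stand-in for an LLM that counts every token it processes."""

    def __init__(self):
        self.tokens_processed = 0

    def process(self, tokens, state=None):
        # Copy so the cached state can be reused for several questions.
        state = [] if state is None else list(state)
        for tok in tokens:
            state.append(tok)  # a real model would append keys/values here
            self.tokens_processed += 1
        return state


long_document = ("lorem ipsum " * 500).split()  # 1000 toy tokens
questions = [["Summarize", "this."], ["List", "key", "points."]]

# Without a cache, the long document is reprocessed for every question.
model = ToyModel()
for q in questions:
    model.process(long_document + q)
no_cache_cost = model.tokens_processed

# With a cache, the document is processed once and only the questions are new.
model = ToyModel()
cached_state = model.process(long_document)
for q in questions:
    model.process(q, state=cached_state)
cache_cost = model.tokens_processed

print(f"without cache: {no_cache_cost} tokens, with cache: {cache_cost} tokens")
```

The saving grows with the number of follow-up prompts that share the same long prefix.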

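2. Support for many models

Mainstream LLMs on Hugging Face are supported directly, including:

mistralai/Mistral-7B-v0.1
meta-llama/Llama-2-7b-hf
deepseek-ai/deepseek-coder-6.7b-instruct
01-ai/Yi-6B-Chat
microsoft/phi-2
Qwen/Qwen-7B
as well as Mixtral, Plamo, Falcon, and others

Some models require the --trust-remote-code or --eos-token options; see the official documentation and the README for details.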

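Tips for running large models on macOS

macOS 15.0 or later is recommended; there, raising the iogpu.wired_limit_mb setting can improve speed for large models.
Large models need a sensible memory allocation; see the relevant section of the README.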
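
As a rough aid for choosing a value, the sketch below builds the `sysctl` command for a given machine. It only constructs the command string and does not run it. The helper `wired_limit_command` and the 8 GB default reserve left for the rest of the system are assumptions made for this example, so treat the README as the authority on suitable values.

```python
def wired_limit_command(total_ram_gb: int, reserve_gb: int = 8) -> str:
    """Build a sysctl command that raises the wired memory limit.

    The limit is total RAM minus a reserve for the rest of the system,
    expressed in megabytes as iogpu.wired_limit_mb expects.
    """
    if reserve_gb >= total_ram_gb:
        raise ValueError("reserve must be smaller than total RAM")
    limit_mb = (total_ram_gb - reserve_gb) * 1024
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


# Example: a 64 GB machine, keeping the assumed 8 GB reserve.
print(wired_limit_command(64))
```

The chosen limit should stay below the machine's physical memory and leave room for macOS and other applications.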

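Community and ecosystem

Stars: 2313
Forks: 249
Open issues: 63
The discussion area and documentation are active, and well suited to feedback, questions, and exchange.
Pull requests are welcome, and the community is friendly.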

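Summary

MLX LM lets Mac users work with large language models comfortably, covering inference, fine-tuning, quantization, and sharing to the cloud in one place. Whether you are an AI developer, a model tinkerer, or an LLM enthusiast, MLX LM helps you get the most out of Apple Silicon and run your LLMs faster.

Give it a try!

Repository: https://github.com/ml-explore/mlx-lm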