Wednesday, 1 July 2026

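Building zabs, a static blog based on Next.js

First, fork the project https://github.com/zhutmost/analog-blog-starter . After forking, my copy of the project lives at https://github.com/brightmann/zabs

Go to https://github.com/brightmann/zabs/tree/main/data/demo/posts/doc and create a new source post there (Add file). I created the source post fh.mdx with the following content: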

---
title: "战马"
datePublish: 2026-06-18 22:05:00
category: 🌈 Docs
tags:
- misc1
- misc2
- misc3
summary: "这是一篇文章"
---

Write the body text or HTML code here.

(For details, see https://github.com/brightmann/zabs/blob/main/data/demo/posts/doc/fh.mdx?plain=1 )
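When many posts are needed, the front matter shown above can be generated rather than typed by hand. The following is a minimal sketch, assuming the starter only needs the fields that appear in fh.mdx; `render_post` is a hypothetical helper, not part of analog-blog-starter, and it does not escape double quotes inside the title or summary.

```python
from datetime import datetime


def render_post(title, published, category, tags, summary, body):
    """Return the text of an .mdx post: YAML front matter followed by the body."""
    lines = [
        "---",
        f'title: "{title}"',
        f"datePublish: {published:%Y-%m-%d %H:%M:%S}",
        f"category: {category}",
        "tags:",
    ]
    lines += [f"- {tag}" for tag in tags]
    lines += [f'summary: "{summary}"', "---", "", body, ""]
    return "\n".join(lines)


# Field values copied from the fh.mdx example above.
post = render_post(
    title="战马",
    published=datetime(2026, 6, 18, 22, 5, 0),
    category="🌈 Docs",
    tags=["misc1", "misc2", "misc3"],
    summary="这是一篇文章",
    body="Write the body text or HTML code here.",
)
print(post)
```

Saving the returned string as data/demo/posts/doc/<name>.mdx in the fork would then mirror the manual Add file step.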

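Then go to vercel.com/new and import the project https://github.com/brightmann/zabs .

In the Build Command field, enter bun run build

In the Install Command field, enter bun install

Then click the Deploy button at the bottom and wait for the deployment to finish. Once it finished, I got the URL https://zabs.vercel.app/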

Project links:

https://github.com/zhutmost/analog-blog-starter

https://github.com/zhutmost/analog-blog-starter/issues/125

https://github.com/brightmann/zabs

Demo blog:

https://zabs.vercel.app/

https://zabs.vercel.app/archive/1 (pagination is supported)

https://zabs.vercel.app/post/doc/fh (videos can be displayed)

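Offline voice input method: CapsWriter-Offline

An offline voice input method with offline recognition, high accuracy, and low latency, with support for hot words and LLM polishing. Hold CapsLock or the mouse side button X2 to speak; release and the text is typed automatically.

Hold CapsLock to speak, release to type. It's that simple.

CapsWriter-Offline is a fully offline voice input tool built specifically for Windows.

✨ Core features

Voice input: hold the CapsLock key or the mouse side button X2 to speak and release to type, with very low latency. A trailing comma or period is removed by default. Walkie-talkie mode and click-to-record mode are supported.
File transcription: drop an audio or video file onto the client exe and you get subtitles (.srt), text (.txt), and timestamps (.json).
Number ITN: automatically converts 「十五六个」 (fifteen or sixteen) to 「15~16个」 and supports a wide range of complex number formats.
Hot-word replacement: record rare words in hot.txt; they are matched by fuzzy phoneme matching, and a candidate whose similarity exceeds the threshold is forcibly replaced.
Regex replacement: use regex rules or simple equals-sign rules in hot-rule.txt for precise forced replacement.
LLM roles: roles such as Polish and Assistant are preset; when the start of a recognition result matches any role name, the result is handed to that role.
Tray menu: right-click the tray icon to add hot words, copy results, or clear the LLM memory.
C/S architecture: the server and client are separate, so although an old Win7 machine cannot run the server-side model, it can at least use the client for input.
Diary archive: every utterance and its recognition result is saved by date.
Recording storage: all speech is saved as local audio files, so it stays private and is never lost.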
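The hot-word feature in the list above (replace a token when its similarity to a hot word exceeds a threshold) can be illustrated with a small sketch. This is not CapsWriter's algorithm: the project matches on phonemes, while this stand-in compares spellings with `difflib.SequenceMatcher` from the Python standard library. The function name, the whitespace tokenization, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def replace_hot_words(text, hot_words, threshold=0.8):
    """Replace each whitespace-separated token that closely resembles a hot word.

    Similarity here is spelling-based (difflib ratio), a stand-in for the
    phoneme-based similarity that the real tool uses.
    """
    result = []
    for token in text.split():
        best_word, best_score = token, 0.0
        for word in hot_words:
            score = SequenceMatcher(None, token.lower(), word.lower()).ratio()
            if score > best_score:
                best_word, best_score = word, score
        # Only force the replacement when the best match clears the threshold.
        result.append(best_word if best_score > threshold else token)
    return " ".join(result)


print(replace_hot_words("open kapswriter now", ["CapsWriter"]))
```

Raising the threshold makes replacements more conservative; lowering it catches more misrecognitions at the cost of false positives.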

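The essence of CapsWriter-Offline is that it is fully offline (not limited by the network), extremely responsive, highly accurate, and highly customizable. What I am after is a fluid feel, as natural as moving your own arm and fingers, so that it becomes a personal all-in-one input tool. No installation is needed: it fits on a USB drive, works as soon as you plug it in, and can be used on confidential (air-gapped) computers.

The supported models are: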

Engine	Accuracy	Speed	Format	GPU acceleration
Paraformer	★★★☆☆	★★★★★	ONNX	❌
SenseVoice-Small	★★★☆☆	★★★★★	ONNX	✅
Fun-ASR-Nano	★★★★☆	★★★★☆	ONNX + GGUF	✅
Qwen3-ASR	★★★★★	★★★☆☆	ONNX + GGUF	✅

Performance reference (transcription latency for 20 s of audio):

Model	CPU U9-285H	GPU RTX5050
Paraformer	0.6s	-
SenseVoice-Small	0.6s	0.15s
Fun-ASR-Nano	2.0s	0.5s
Qwen3-ASR-1.7B	4.0s	1.0s
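The latencies in this table are easier to compare as a real-time factor (RTF), the processing time divided by the audio duration. RTF is a standard speech-recognition metric that the project text does not itself mention, so the following is an illustrative calculation on the table's numbers only.

```python
AUDIO_SECONDS = 20.0

# Latencies in seconds copied from the table; None means no figure was given.
LATENCY = {
    "Paraformer": {"cpu": 0.6, "gpu": None},
    "SenseVoice-Small": {"cpu": 0.6, "gpu": 0.15},
    "Fun-ASR-Nano": {"cpu": 2.0, "gpu": 0.5},
    "Qwen3-ASR-1.7B": {"cpu": 4.0, "gpu": 1.0},
}


def real_time_factor(latency_s, audio_s=AUDIO_SECONDS):
    """Processing time per second of audio; values below 1 are faster than real time."""
    return latency_s / audio_s


for model, row in LATENCY.items():
    for device, latency in row.items():
        if latency is not None:
            rtf = real_time_factor(latency)
            print(f"{model:18s} {device}: RTF {rtf:.4f} ({1 / rtf:.0f}x real time)")
```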

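For detailed feature documentation, see the docs/ directory:

Installing environment dependencies: VC++ runtime and FFmpeg installation
How to use hot words: hot-word replacement, rule replacement, custom phrases
How to use roles: LLM role configuration, output modes, creating new roles
How to configure the recognition language: language coverage and configuration for each engine
How to use file transcription: drag-and-drop to subtitles, timestamp alignment
GPU acceleration issues: DirectML and Vulkan acceleration configuration
Model download issues: engine selection, model download, directory structure
FAQ
Changelog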

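💻 Platform support

At present, only Windows 10/11 (64-bit) is guaranteed to work perfectly.

Linux: there is currently no environment for testing and packaging, so compatibility cannot be guaranteed.
MacOS: the underlying keyboard library has dropped MacOS support and the system imposes many restrictions, so it cannot be supported for now.

LazyTyper and 闪电说 are also excellent projects. Both have offline engines, both support Windows, Linux, and MacOS, and both have polished graphical interfaces. They are recommended.

What sets CapsWriter apart is its pursuit of:

Input you do not notice
Fully offline operation, free of network constraints
Low latency, as close to the hardware's limit as possible
A highly customizable hot-word system

🎬 Quick start

Prepare the environment: make sure the VC++ runtime is installed. To use file transcription, also install ffmpeg and make sure it is on the system PATH.
Download and extract: download the application from Latest Release, then download the model archive from Models Release, extract the model, and place it in the matching model folder inside the models folder.
Start the server: double-click start_server.exe; it minimizes to the tray automatically.
Start dictation: double-click start_client.exe; it minimizes to the tray automatically.
Start recording: hold the CapsLock key or the mouse side button X2 and start speaking!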

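⚙️ Customization

All settings live in config_server.py and config_client.py in the root directory and can be edited directly.

🛠️ FAQ

Q: Why does nothing happen when I press the key?
A: Check that the black console window of start_client.exe is still running. To type into a program that runs with administrator privileges, the client must also be run with administrator privileges.

Q: Why is the recognition result empty?
A: Check the recordings in the year/month/assets folder to see whether any audio was captured; listen to the recording to judge whether the microphone is too poor (a desktop USB microphone is recommended); and check the microphone permission.

Q: How do I hide the black window?
A: Click the tray menu to hide the black window.

Q: How do I start it at boot?
A: Press Win+R, enter shell:startup to open the Startup folder, and put shortcuts to the server and the client there.

For more questions, see docs/常见问题.md.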

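🚀 Other projects of mine worth recommending

Project	Description	Try it
IME_Indicator	Chinese/English input-method status indicator for Windows	Download and use
Rust-Tray	Tool that minimizes a console to a tray icon	Download and use
Gallery-Viewer	Web-based gallery viewer, implemented in pure HTML	Click to use
全景图片查看器 (panorama viewer)	Single web page for viewing panoramic photos and videos	Click to use
图标生成器 (icon generator)	Generates website icons with Font-Awesome	Click to use
五笔编码反查 (Wubi code lookup)	Online reverse lookup of Wubi 86 codes	Click to use
快捷键映射图 (shortcut map)	Visual, interactive keyboard-shortcut map (Chinese edition)	Click to use

❤️ Acknowledgements

This project builds on the following excellent open-source projects:

Thanks to Google Antigravity, Anthropic Claude, GLM, and DeepSeek. Without these coding assistants, I would not have been able to implement many features (such as the phoneme-based hot-word retrieval algorithm).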

from https://github.com/HaujetZhao/CapsWriter-Offline

smolvm

A tool to build and run portable, lightweight, self-contained virtual machines.

smolmachines.com

Ship and run software with isolation by default.

This is a CLI tool that lets you:

Manage and run custom Linux virtual machines locally with: sub-second cold start, cross-platform (macOS, Linux, Windows), elastic memory usage.
Pack a stateful virtual machine into a single file (.smolmachine) to rehydrate on any supported platform.

Install

# install (macOS + Linux)
curl -sSL https://smolmachines.com/install.sh | bash

# for coding agents — install + discover all commands
curl -sSL https://smolmachines.com/install.sh | bash && smolvm --help

Or download from GitHub Releases, and place it into ~/.local/share/.

Windows: download the windows-x86_64 release (bundles krun.dll + libkrunfw.dll), unzip it, and run smolvm.exe. Requires the Windows Hypervisor Platform (WHP) feature enabled.

Quick Start

# run a command in an ephemeral VM (cleaned up after exit)
smolvm machine run --net --image alpine -- sh -c "echo 'Hello world from a microVM' && uname -a"

# interactive shell
smolvm machine run --net -it --image alpine -- /bin/sh
# inside the VM: apk add sl && sl && exit

Use This For

Sandbox untrusted code — run untrusted programs in a hardware-isolated VM. Host filesystem, network, and credentials are separated by a hypervisor boundary.

# network is off by default — untrusted code can't phone home
smolvm machine run --image alpine -- nslookup example.com
# fails — no network access

# lock down egress — only allow specific hosts
smolvm machine run --net --image alpine --allow-host registry.npmjs.org -- wget -q -O /dev/null https://registry.npmjs.org
# works — allowed host

smolvm machine run --net --image alpine --allow-host registry.npmjs.org -- wget -q -O /dev/null https://google.com
# fails — not in allow list
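When these sandbox commands are launched from a script, it helps to assemble the argument list in one place. The sketch below uses only flags that appear in the examples above (`--net`, `--image`, `--allow-host`, and the `--` separator). `build_run_command` is a hypothetical helper, and repeating `--allow-host` once per host is an assumption, since the examples only ever show a single host.

```python
import shlex


def build_run_command(image, command, net=False, allow_hosts=()):
    """Build the argv for `smolvm machine run` from flags shown in the examples."""
    argv = ["smolvm", "machine", "run"]
    if net:
        argv.append("--net")
    argv += ["--image", image]
    for host in allow_hosts:
        # Assumption: one --allow-host flag per allowed host.
        argv += ["--allow-host", host]
    argv += ["--", *command]
    return argv


argv = build_run_command(
    "alpine",
    ["wget", "-q", "-O", "/dev/null", "https://registry.npmjs.org"],
    net=True,
    allow_hosts=["registry.npmjs.org"],
)
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems; that step needs smolvm installed, so it is left out here.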

Pack into portable executables — turn any workload into a self-contained binary. All dependencies are pre-baked — no install step, no runtime downloads, boots in <200ms.

smolvm pack create --image python:3.12-alpine -o ./python312
./python312 run -- python3 --version
# Python 3.12.x — isolated, no pyenv/venv/conda needed

Use local container images — for CI, air-gapped hosts, and fast iteration. Feed --image a docker save / podman save archive, pipe one on stdin, or point it at an unpacked rootfs directory. Image work is delegated to your container tooling; smolvm just boots the result.

# build locally, run in the VM with no push/pull
docker build -t myapp .
docker save myapp | smolvm machine run --image - -- ./app

# from an archive file (boots with no network)
smolvm machine run --image ./myapp.tar -- ./app

# from an already-unpacked rootfs directory
smolvm machine run --image ./rootfs/ -- ./app

Persistent machines for development — create, stop, start. Installed packages survive restarts.

smolvm machine create --net --name myvm
smolvm machine start --name myvm
smolvm machine exec --name myvm -- apk add sl
smolvm machine exec --name myvm -it -- /bin/sh
# inside: sl, ls, uname -a — type 'exit' to leave
smolvm machine stop --name myvm

Use git and SSH without exposing keys — forward your host SSH agent into the VM. Private keys never enter the guest — the hypervisor enforces this. Requires an SSH agent running on your host (ssh-add -l to check).

smolvm machine run --ssh-agent --net --image alpine -- sh -c "apk add -q openssh-client && ssh-add -l"
# lists your host keys, but they can't be extracted from inside the VM

smolvm machine exec --name myvm -- git clone git@github.com:org/private-repo.git

Declare environments with a Smolfile — reproducible VM config in a simple TOML file.

image = "python:3.12-alpine"
net = true

[network]
allow_hosts = ["api.stripe.com", "db.example.com"]

[dev]
init = ["pip install -r requirements.txt"]
volumes = ["./src:/app"]

[auth]
ssh_agent = true

smolvm machine create --name myvm -s Smolfile
smolvm machine start --name myvm

More examples: python · node · doom

How It Works

Each workload gets real hardware isolation: its own kernel on Hypervisor.framework (macOS), KVM (Linux), or the Windows Hypervisor Platform (Windows). The VMM is libkrun, with a custom kernel provided by libkrunfw. Pack it into a .smolmachine and it runs anywhere the host architecture matches, with zero dependencies.

Images use the OCI format — the same open standard Docker uses. Any image on Docker Hub, ghcr.io, or other OCI registries can be pulled and booted as a microVM. No Docker daemon required.

Defaults: 4 vCPUs, 8 GiB RAM. Memory is elastic via virtio balloon — the host only commits what the guest actually uses and reclaims the rest automatically. vCPU threads sleep in the hypervisor when idle, so over-provisioning has near-zero cost. Override with --cpus and --mem.

Comparison

	smolvm	Containers	Colima	QEMU	Firecracker	Kata
Isolation	VM per workload	Namespace (shared kernel)	Namespace (1 VM)	Separate VM	Separate VM	VM per container
Boot time	<200ms	~100ms	~seconds	~15-30s	<125ms	~500ms
Architecture	Library (libkrun)	Daemon	Daemon (in VM)	Process	Process	Runtime stack
Per-workload VMs	Yes	No	No (shared)	Yes	Yes	Yes
macOS native	Yes	Via Docker VM	Yes (krunkit)	Yes	No	No
Embeddable SDK	Yes	No	No	No	No	No
Portable artifacts	`.smolmachine`	Images (need daemon)	No	No	No	No

Platform Support

Host	Guest	Requirements
macOS Apple Silicon	arm64 Linux	macOS 11+
macOS Intel	x86_64 Linux	macOS 11+ (untested)
Linux x86_64	x86_64 Linux	KVM (`/dev/kvm`)
Linux aarch64	aarch64 Linux	KVM (`/dev/kvm`)
Windows x86_64	x86_64 Linux	Windows Hypervisor Platform (WHP) enabled

Known Limitations

Network is opt-in (--net on machine create). TCP/UDP only, no ICMP.
Volume mounts: directories only (no single files). Mounting at /workspace (-v /host/dir:/workspace) takes priority over the default storage-disk workspace — your host directory is used instead.
macOS: binary must be signed with Hypervisor.framework entitlements.
--ssh-agent requires an SSH agent running on the host (SSH_AUTH_SOCK must be set).
GPU acceleration requires libkrun built with GPU=1 and virglrenderer + a Vulkan driver on the host (see GPU Acceleration below).
Windows: --net works the same as on other platforms (virtio-net with inbound port-forwarding; TSI for outbound-only VMs), as do machine exec / interactive sessions and machine stats. Not yet available on Windows: GPU acceleration and machine fork / snapshot. Pack create needs storage-template.ext4 / overlay-template.ext4 next to smolvm.exe (Windows has no host mkfs.ext4).

GPU Acceleration

smolvm exposes the host GPU to guests via virtio-gpu / Venus (Vulkan-over-virtio). Guest workloads see a real Vulkan device; on Linux + Intel this renders as:

ANGLE (Intel, Vulkan 1.4 (Virtio-GPU Venus (Intel(R) UHD Graphics ...)), venus)

Host requirements

macOS — virglrenderer and MoltenVK are bundled in the smolvm distribution. No extra installs needed.

Linux — virglrenderer and a host Vulkan driver must be installed from the system package manager:

Distro	Packages
Alpine	`apk add virglrenderer mesa-vulkan-intel` (or `mesa-vulkan-ati` for AMD)
Debian/Ubuntu	`apt install virglrenderer0 mesa-vulkan-drivers`

virglrenderer depends on libEGL and libdrm from the host GPU driver stack — these are hardware-specific and cannot be bundled. Any GPU-capable Linux host will already have them installed via its GPU driver.

Usage

# CLI
smolvm machine run --gpu --image alpine -- vulkaninfo --summary

# Smolfile
# gpu = true
# gpu_vram = 2048   # MiB, default 4096

The guest Vulkan loader must be pointed at the virtio ICD:

export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/virtio_icd.x86_64.json

Headless browser example

See examples/headless-browser/ for a working Chromium setup using ANGLE + Venus for hardware-accelerated WebGL inside a headless VM.

Development

See docs/DEVELOPMENT.md.

Apache-2.0 · made by @binsquare · twitter · github

from https://github.com/smol-machines/smolvm

llmfit

A command to identify locally runnable models. This is a terminal tool written in Rust that automatically detects local hardware such as CPU, GPU, and memory, and recommends large models suitable for running locally. It ranks models by scoring them across dimensions such as quality, speed, fit, and context, and supports mainstream local inference environments including Ollama, llama.cpp, MLX, vLLM, and LM Studio.

--------------------------------------------

New: Community Leaderboard — Browse real-world performance data from actual users. Press b to see measured tok/s, TTFT, and VRAM for any GPU — not just yours. Pick from 27+ hardware presets (RTX 5090 to Apple M1) with H to compare real numbers before you buy or build.

Hundreds of models & providers. One command to find what runs on your hardware.

A terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.

Ships with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).

New: Community Leaderboard (b) — See real-world tok/s, TTFT, and VRAM usage from other users running the same hardware as you. Powered by localmaxxing.com, this bridges the gap between estimated and actual performance.

Also: Download Manager (D), Advanced Configuration (A), and Hardware Simulation (S). Press D to manage downloads, view history, delete models, and configure the download directory. Press A to tune TPS efficiency, run mode factors, and scoring weights. Press S to simulate different hardware.

Sister projects:

sympozium — managing agents in Kubernetes.
llmserve — a simple TUI for serving local LLM models. Pick a model, pick a backend, serve it.
llama-panel — a native macOS app for managing local llama-server instances.

Install

Windows

scoop install llmfit

If Scoop is not installed, follow the Scoop installation guide.

macOS / Linux

Homebrew

Prebuilt binary (recommended, works on all macOS/Linux versions):

brew install AlexsJones/llmfit/llmfit

Or from the homebrew-core formula, which builds from source on macOS versions without a bottle:

brew install llmfit

MacPorts

port install llmfit

Quick install

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

Downloads the latest release binary from GitHub and installs it to /usr/local/bin (or ~/.local/bin if no sudo).

Install to ~/.local/bin without sudo:

curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

uv / pip

To install or update llmfit:

uv tool install -U llmfit

To run without installing:

uvx llmfit

You can also install llmfit as a Python package in the normal way with tools such as pip or uv.

Docker / Podman

docker run ghcr.io/alexsjones/llmfit

This prints the JSON output of the llmfit recommend command, which can be queried further with jq.

podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
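The same extraction can be done without jq. The sketch below mirrors the filter `.models[].name`; the only thing it assumes about the JSON is what that filter implies, namely a top-level `models` array whose items have a `name` field. The sample payload and its model names are made up for illustration.

```python
import json


def model_names(recommend_json):
    """Python equivalent of the jq filter `.models[].name`."""
    data = json.loads(recommend_json)
    return [model["name"] for model in data["models"]]


# Hypothetical payload with made-up names, shaped as the jq filter implies.
sample = '{"models": [{"name": "example/model-a"}, {"name": "example/model-b"}]}'
print(model_names(sample))
```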

From source

git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary is at target/release/llmfit

Usage

TUI (default)

llmfit

Launches the interactive terminal UI. Your system specs (CPU, RAM, GPU name, VRAM, backend) are shown at the top. Models are listed in a scrollable table sorted by composite score. Each row shows the model's score, estimated tok/s, best quantization for your hardware, run mode, memory usage, and use-case category.

Key	Action
`Up` / `Down` or `j` / `k`	Navigate models
`/`	Enter search mode (partial match on name, provider, params, use case)
`Esc` or `Enter`	Exit search mode
`Ctrl-U`	Clear search
`f`	Cycle fit filter: All, Runnable, Perfect, Good, Marginal
`a`	Cycle availability filter: All, GGUF Avail, Installed
`s`	Cycle sort column: Score, Params, Mem%, Ctx, Date, Use Case
`v`	Enter Visual mode (select multiple models)
`V`	Enter Select mode (column-based filtering)
`t`	Cycle color theme (saved automatically)
`p`	Open Plan mode for selected model (hardware planning)
`P`	Open provider filter popup (type to fuzzy-filter providers)
`U`	Open use-case filter popup
`C`	Open capability filter popup
`L`	Open license filter popup
`R`	Open runtime/backend filter popup (llama.cpp, MLX, vLLM)
`S`	Open hardware simulation popup (override RAM/VRAM/CPU)
`A`	Open advanced configuration popup (tune efficiency, run mode factors)
`b`	Open community leaderboard view (localmaxxing.com)
`I`	Open inference bench view (local quality scoring against your models)
`h`	Open help popup (all key bindings)
`m`	Mark selected model for compare
`c`	Open compare view (marked vs selected)
`x`	Clear compare mark
`i`	Toggle installed-first sorting (any detected runtime provider)
`d`	Download selected model (provider picker when multiple are available)
`D`	Open Download Manager (history, deletion, config)
`r`	Refresh installed models from runtime providers
`Enter`	Toggle detail view for selected model
`PgUp` / `PgDn`	Scroll by 10
`g` / `G`	Jump to top / bottom
`q`	Quit

Vim-like modes

The TUI uses Vim-inspired modes shown in the bottom-left status bar. The current mode determines which keys are active.

Normal mode

The default mode. Navigate, search, filter, and open views. All keys in the table above apply here.

Visual mode (`v`)

Select a contiguous range of models for bulk comparison. Press v to anchor at the current row, then navigate with j/k or arrow keys to extend the selection. Selected rows are highlighted.

Key	Action
`j` / `k` or arrows	Extend selection up/down
`c`	Compare all selected models (opens multi-compare view)
`m`	Mark current model for two-model compare
`Esc` or `v`	Exit Visual mode

The multi-compare view displays a table where rows are attributes (Score, tok/s, Fit, Mem%, Params, Mode, Context, Quant, etc.) and columns are models. Best values are highlighted. Use h/l or arrow keys to scroll horizontally if more models are selected than fit on screen.

Select mode (`V`)

Column-based actions. Press V (shift-v) to enter Select mode, then use h/l or arrow keys to move between column headers. The active column is visually highlighted. Press Enter or Space to trigger that column's current action.

Column	Filter action
Inst	Cycle availability filter
Model	Enter search mode
Provider	Open provider popup
Params	Open parameter-size bucket popup (<3B, 3-7B, 7-14B, 14-30B, 30-70B, 70B+)
Score, tok/s, Mem%, Ctx, Date	Sort by that column
Quant	Open quantization popup
Mode	Open run-mode popup (GPU, MoE, CPU+GPU, CPU)
Fit	Cycle fit filter
Use Case	Open use-case popup

Row navigation still works in Select mode so you can see the effect of actions as you apply them: j/k, arrow keys, Ctrl-U, Ctrl-D, PageUp, PageDown, Home, and End. Press Esc to return to Normal mode.

TUI Plan mode (`p`)

Plan mode inverts normal fit analysis: instead of asking "what fits my hardware?", it estimates "what hardware is needed for this model config?".

Use p on a selected row, then:

Key	Action
`Tab` / `j` / `k`	Move between editable fields (Context, Quant, Target TPS)
`Left` / `Right`	Move cursor in current field
Type	Edit current field
`Backspace` / `Delete`	Remove characters
`Ctrl-U`	Clear current field
`Esc` or `q`	Exit Plan mode

Plan mode shows estimates for:

minimum and recommended VRAM/RAM/CPU cores
feasible run paths (GPU, CPU offload, CPU-only)
upgrade deltas to reach better fit targets

Hardware Simulation (`S`)

Press S to open the hardware simulation popup. Override RAM, VRAM, and CPU core count to see which models would fit on different target hardware. All model scores, fit levels, and speed estimates are recalculated instantly against the simulated specs.

Key	Action
`Tab` / `j` / `k`	Switch between RAM, VRAM, CPU fields
Type digits	Edit the selected field
`Enter`	Apply simulation
`Ctrl-R`	Reset to real detected hardware
`Esc`	Cancel and close

When simulation is active, a SIM badge appears in the system bar and status bar. The entire model table reflects the simulated hardware until you reset.

Advanced Configuration (`A`)

Press A to open the Advanced Configuration popup. This panel lets you tune the parameters behind TPS estimation, run mode penalties, and composite scoring — addressing issue #449 where tok/s was overestimated for certain models (e.g., Qwen3 30B).

All changes are applied immediately and the model table is recalculated. Close with Esc to accept or Ctrl-R to reset to defaults.

Field	Description	Default
Efficiency	Global efficiency factor for bandwidth-based TPS. Accounts for overhead	`0.55`
GPU factor	Speed multiplier for pure GPU inference	`1.0`
CPU Offload	Speed multiplier when weights spill to system RAM	`0.5`
MoE Offload	Speed multiplier for Mixture-of-Experts expert switching	`0.8`
Tensor Par	Speed multiplier for tensor-parallel inference	`0.9`
CPU Only	Speed multiplier for CPU-only execution	`0.3`
Context cap	Max context length used for memory estimation (leave blank for default)	`auto`
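To see how these factors could interact, here is a rough sketch of a bandwidth-based speed estimate. The formula is a common rule of thumb (each generated token reads the weights once, so speed is roughly memory bandwidth divided by model size) and is an assumption here; the source does not state the expression llmfit actually uses. Only the default values come from the table above.

```python
# Defaults copied from the table above.
EFFICIENCY = 0.55
MODE_FACTOR = {
    "gpu": 1.0,
    "cpu_offload": 0.5,
    "moe_offload": 0.8,
    "tensor_parallel": 0.9,
    "cpu_only": 0.3,
}


def estimate_tps(bandwidth_gb_s, model_gb, mode="gpu", efficiency=EFFICIENCY):
    """Rule-of-thumb tokens/sec: bandwidth / model size, scaled by the two factors."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb * efficiency * MODE_FACTOR[mode]


# Hypothetical hardware: 1000 GB/s of memory bandwidth and a 10 GB model.
for mode in MODE_FACTOR:
    print(f"{mode:16s} {estimate_tps(1000, 10, mode):6.1f} tok/s")
```

Under this model, lowering Efficiency scales every estimate down uniformly, while the per-mode multipliers only affect models that land in that run mode.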

Key	Action
`Tab` / `j` / `k`	Switch between fields
Type digits / `.`	Edit the selected field
`Left` / `Right`	Move cursor within the field
`Backspace` / `Delete`	Remove characters
`Ctrl-U`	Clear the current field
`Enter`	Apply changes and recalculate all scores
`Esc` / `q`	Close without applying

Download Manager (`D`)

Press D to open the Download Manager view. This full-screen view replaces the main model table and provides three sections:

Active Download — shows the current download in progress with a progress bar, model name, and status message.
Config — displays (and allows editing) the GGUF models directory. The configured path persists across sessions.
History — a navigable list of past downloads (newest first) with model name, provider, status, and date. Failed downloads can be removed from history, and successful downloads can be deleted from the provider.

Use Tab / Shift-Tab to cycle focus between sections.

Key	Action
`Tab` / `Shift-Tab`	Cycle focus: Active → Config → History
`j` / `k` or arrows	Navigate the history list (when History focused)
`x`	Delete selected model (prompts for confirmation)
`y` / `n`	Confirm or cancel deletion
`e`	Edit download directory (when Config focused)
`Enter`	Confirm directory edit
`Esc` / `D` / `q`	Close and return to the model table

For failed downloads (e.g. 404 errors), x removes the entry from history. For successful downloads, it deletes the model from the provider (supported for Ollama and llama.cpp).

Community Leaderboard (`b`)

Press b to open the Community Leaderboard view. Instead of relying solely on llmfit's theoretical speed estimates, this view shows real-world performance data from other users with the same hardware — actual measured tok/s, time-to-first-token, and peak VRAM usage.

Data is sourced from localmaxxing.com, a community benchmark database. When you open the view, llmfit auto-detects your hardware (GPU model, VRAM tier, Apple Silicon chip family, OS) and queries for matching results.

Column	Description
Model	HuggingFace model ID
Engine	Inference runtime used (llama.cpp, vLLM, Ollama, MLX...)
Quant	Quantization format (Q4_K_M, Q8_0, etc.)
tok/s	Measured output token generation speed
Total t/s	Total throughput (prompt + generation)
TTFT	Time to first token (latency)
VRAM	Peak memory usage during inference
Ctx	Context length used in the benchmark
User	Submitter (verified users marked with `*`)

Key	Action
`j` / `k` or arrows	Navigate results
`H`	Open hardware picker (browse any GPU)
`r`	Refresh / re-fetch from API
`b` / `q` / `Esc`	Close and return to model table

Press H to open the hardware picker — a scrollable list of 27 popular GPUs and chips (RTX 5090 through CPU-only, plus Apple Silicon M1–M4 variants, AMD RX/MI series, and NVIDIA datacenter cards). Select one to instantly load benchmarks for that hardware, even if it's not what you're running on. Select "My Hardware (auto-detect)" to go back to your own system.

API key setup

Public benchmarks work without authentication. For full access, provide your localmaxxing.com API key:

# Via environment variable (recommended)
export LOCALMAXXING_API_KEY="bhk_your_key_here"
llmfit

# Or via CLI flag
llmfit --api-key "bhk_your_key_here"

Variable	Description
`LOCALMAXXING_API_KEY`	Bearer token for localmaxxing.com API

Inference Bench (`I`)

Press I (uppercase) to open the Inference Bench view. This runs live inference benchmarks against your locally running providers — Ollama, vLLM, and MLX — measuring time-to-first-token (TTFT), tokens per second (TPS), and total latency with real inference requests.

Unlike the Community Leaderboard (which shows crowd-sourced data from other users), Inference Bench measures your actual hardware with your actual models.

TUI usage

Key	Action
`I`	Open inference bench (auto-detects provider and runs benchmarks)
`I` (again)	Rerun benchmarks from within the bench view
`j` / `k` or arrows	Navigate model results
`Enter`	Open detail view for selected model
`r`	Switch to routing matrix view
`q` / `Esc`	Close bench view

Results are cached to ~/.config/llmfit/bench-cache.json and loaded instantly on subsequent opens.

CLI usage

# Auto-detect provider and benchmark
llmfit bench

# Benchmark all discovered models across all running providers
llmfit bench --all

# Benchmark a specific model via Ollama
llmfit bench --provider ollama llama3.2

# Override endpoint URL
llmfit bench --provider ollama --url http://my-server:11434 llama3.2

# Override vLLM endpoint
llmfit bench --provider vllm --url http://localhost:8000

# Output as JSON (for scripting)
llmfit bench --json

# Run quality benchmarks (role-based scoring for routing)
llmfit bench --quality

# Output routing matrix
llmfit bench --quality --routing

Environment variables

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Ollama API base URL
`VLLM_PORT`	`8000`	vLLM server port (used as `http://localhost:$VLLM_PORT`)
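A script that talks to the same providers can resolve its endpoints the way this table describes. The sketch below is a minimal reading of the two rows; `resolve_endpoints` is a hypothetical helper, not llmfit code.

```python
import os


def resolve_endpoints(env=None):
    """Return provider base URLs using the defaults from the table above."""
    env = os.environ if env is None else env
    ollama_url = env.get("OLLAMA_HOST", "http://localhost:11434")
    vllm_port = env.get("VLLM_PORT", "8000")
    return {"ollama": ollama_url, "vllm": f"http://localhost:{vllm_port}"}


print(resolve_endpoints({}))
print(resolve_endpoints({"VLLM_PORT": "9001"}))
```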

Themes

Press t to cycle through 10 built-in color themes. Your selection is saved automatically to ~/.config/llmfit/theme and restored on next launch.

Theme	Description
Default	Original llmfit colors
Dracula	Dark purple background with pastel accents
Solarized	Ethan Schoonover's Solarized Dark palette
Nord	Arctic, cool blue-gray tones
Monokai	Monokai Pro warm syntax colors
Gruvbox	Retro groove palette with warm earth tones
Catppuccin Latte	🌻 Light theme — harmonious pastel inversion
Catppuccin Frappé	🪴 Low-contrast dark — muted, subdued aesthetic
Catppuccin Macchiato	🌺 Medium-contrast dark — gentle, soothing tones
Catppuccin Mocha	🌿 Darkest variant — cozy with color-rich accents

Web dashboard

When you run llmfit in non-JSON mode, it automatically starts a background web dashboard on 0.0.0.0:8787. Open it in any browser on the same network:

http://<your-machine-ip>:8787

Override the host or port with environment variables:

LLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit

Variable	Default	Description
`LLMFIT_DASHBOARD_HOST`	`0.0.0.0`	Interface to bind the dashboard server
`LLMFIT_DASHBOARD_PORT`	`8787`	Port to bind the dashboard server

To disable the auto-started dashboard, pass --no-dashboard:

llmfit --no-dashboard

CLI mode

Use --cli or any subcommand to get classic table output:

# Table of all models ranked by fit
llmfit --cli

# Only perfectly fitting models, top 5
llmfit fit --perfect -n 5

# Show detected system specs
llmfit system

# List all models in the database
llmfit list

# Search by name, provider, or size
llmfit search "llama 8b"

# Detailed view of a single model
llmfit info "Mistral-7B"

# Top 5 recommendations (JSON, for agent/script consumption)
llmfit recommend --json --limit 5

# Recommendations filtered by use case
llmfit recommend --json --use-case coding --limit 3

# Force a specific runtime (bypass automatic MLX selection on Apple Silicon)
llmfit recommend --force-runtime llamacpp
llmfit recommend --force-runtime llamacpp --use-case coding --limit 3

# Plan required hardware for a specific model configuration
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json

# Run as a node-level REST API (for cluster schedulers / aggregators)
llmfit serve --host 0.0.0.0 --port 8787

REST API (`llmfit serve`)

llmfit serve starts an HTTP API that exposes the same fit/scoring data used by TUI/CLI, including filtering and top-model selection for a node.

# Liveness
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full fit list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Key scheduling endpoint: top runnable models for this node
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider text
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"

Supported query params for models and models/top:

limit (or n): max number of rows returned
perfect: true|false (forces perfect-only when true)
min_fit: perfect|good|marginal|too_tight
runtime: any|mlx|llamacpp
use_case: general|coding|reasoning|chat|multimodal|embedding
provider: provider text filter (substring)
search: free-text filter across name/provider/size/use-case
sort: score|tps|params|mem|ctx|date|use_case
include_too_tight: include non-runnable rows (default false on /top, true on /models)
max_context: per-request context cap for memory estimation
force_runtime: mlx|llamacpp|vllm — override automatic runtime selection during analysis
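For a scheduler written in Python, the query string can be assembled from the parameters listed above rather than concatenated by hand. This sketch uses only the standard library; the endpoint path and parameter names come from the curl examples, while `top_models_url` and `fetch_json` are hypothetical helpers. The response schema is not documented here, so the sketch returns the decoded JSON as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8787"


def top_models_url(base_url=BASE_URL, **params):
    """Build the /api/v1/models/top URL, skipping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url}/api/v1/models/top"
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (requires a running `llmfit serve`)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(top_models_url(limit=5, min_fit="good", use_case="coding"))
```

Calling `fetch_json(top_models_url(limit=5))` against a running node would then return whatever the API sends back.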

Validate API behavior locally:

# spawn server automatically and run endpoint/schema/filter assertions
python3 scripts/test_api.py --spawn

# or test an already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787

Hardware overrides

Hardware autodetection can fail on some systems (e.g. broken nvidia-smi, VMs, passthrough setups), or you may want to evaluate model fit against different target hardware. Use --memory, --ram, and --cpu-cores to override detected values:

# Override GPU VRAM
llmfit --memory=32G

# Override system RAM
llmfit --ram=128G

# Override CPU core count
llmfit --cpu-cores=16

# Combine overrides to simulate target hardware
llmfit --memory=24G --ram=64G --cpu-cores=8 fit
llmfit --memory=24G --ram=64G system --json

# Works with all modes: TUI, CLI, and subcommands
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --ram=64G recommend --json

Accepted suffixes for --memory and --ram: G/GB/GiB (gigabytes), M/MB/MiB (megabytes), T/TB/TiB (terabytes). Case-insensitive. If no GPU was detected, --memory creates a synthetic GPU entry so models are scored for GPU inference. On unified-memory systems (Apple Silicon), --ram also updates VRAM; use --memory to override VRAM independently.
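The suffix rules can be sketched as a small parser. This is an illustration of the accepted forms, not llmfit's code: treating every unit as a binary multiple (1 G = 1024 M), accepting decimal values, and rejecting a bare number are all assumptions.

```python
import re

# Size of each unit expressed in gigabytes (assumes binary multiples).
_UNITS_IN_GB = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([mgt])(?:i?b)?\s*$", re.IGNORECASE)


def parse_size_gb(text):
    """Parse values such as '32G', '128GiB', or '512mb' into gigabytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS_IN_GB[unit.lower()]


for example in ("32G", "128gib", "512MB", "1T"):
    print(example, "->", parse_size_gb(example), "GB")
```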

Context-length cap for estimation

Use --max-context to cap context length used for memory estimation (without changing each model's advertised maximum context):

# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# Works with subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

If --max-context is not set, llmfit will use OLLAMA_CONTEXT_LENGTH when available.

JSON output

Add --json to any subcommand for machine-readable output:

llmfit --json system     # Hardware specs as JSON
llmfit --json fit -n 10  # Top 10 fits as JSON
llmfit recommend --json  # Top 5 recommendations (JSON is default for recommend)
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json

plan JSON includes stable fields for:

request (context, quantization, target_tps)
estimated minimum/recommended hardware
per-path feasibility (gpu, cpu_offload, cpu_only)
upgrade deltas
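
For scripting, the JSON output can be consumed from Python with a thin wrapper like the one below. It is a minimal sketch that assumes llmfit is on PATH; the helper name and the injectable `run` parameter are hypothetical conveniences, and no particular field names in the output are assumed.

```python
import json
import subprocess


def llmfit_json(args, run=subprocess.run):
    """Run `llmfit <args>` and return its stdout parsed as JSON."""
    result = run(["llmfit", *args], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Example calls (require llmfit to be installed):
# system = llmfit_json(["--json", "system"])
# picks = llmfit_json(["recommend", "--json", "--limit", "5"])
```

Because `check=True` is passed, a non-zero exit status raises `subprocess.CalledProcessError` instead of silently returning partial output.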

How it works

Hardware detection -- Reads total/available RAM via sysinfo, counts CPU cores, and probes for GPUs:
- NVIDIA -- Multi-GPU support via nvidia-smi. Aggregates VRAM across all detected GPUs. Falls back to VRAM estimation from GPU model name if reporting fails.
- AMD -- Detected via rocm-smi.
- Intel Arc -- Discrete VRAM via sysfs, integrated via lspci.
- Apple Silicon -- Unified memory via system_profiler. VRAM = system RAM.
- Ascend -- Detected via npu-smi.
- Backend detection -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.
Model database -- Hundreds of models sourced from the HuggingFace API, stored in data/hf_models.json and embedded at compile time. Memory requirements are computed from parameter counts across a quantization hierarchy (Q8_0 through Q2_K). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.

MoE support -- Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so the effective VRAM requirement is much lower than total parameter count suggests. For example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.
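
The Mixtral figures above are consistent with scaling the full-model VRAM by the fraction of parameters that are active. The sketch below reproduces that arithmetic; proportional scaling is an assumption about how the numbers relate, not a statement of llmfit's exact method.

```python
def active_vram_gb(total_vram_gb: float, total_params_b: float, active_params_b: float) -> float:
    """Scale full-model VRAM by the share of parameters active per token."""
    return total_vram_gb * active_params_b / total_params_b


# Mixtral 8x7B: 46.7B total parameters, ~12.9B active, 23.9 GB for the full model.
mixtral = active_vram_gb(23.9, 46.7, 12.9)
print(f"{mixtral:.1f} GB")  # prints "6.6 GB"
```
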
Dynamic quantization -- Instead of assuming a fixed quantization, llmfit tries the best quality quantization that fits your hardware. It walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed), picking the highest quality that fits in available memory. If nothing fits at full context, it tries again at half context.
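
The selection walk can be sketched as follows. Only the endpoints Q8_0 and Q2_K come from the text above; the intermediate levels, the bytes-per-parameter figures, and the KV-cache cost are illustrative assumptions, and the function names are hypothetical.

```python
# (name, approximate bytes per parameter), best quality first.
# Intermediate levels and all numeric values are illustrative assumptions.
QUANT_BYTES_PER_PARAM = [
    ("Q8_0", 1.05),
    ("Q6_K", 0.80),
    ("Q5_K_M", 0.68),
    ("Q4_K_M", 0.58),
    ("Q3_K_M", 0.48),
    ("Q2_K", 0.37),
]
KV_GB_PER_1K_TOKENS = 0.125  # hypothetical KV-cache cost


def estimate_gb(params_b: float, bytes_per_param: float, context: int) -> float:
    """Weights plus a context-proportional KV-cache term."""
    return params_b * bytes_per_param + KV_GB_PER_1K_TOKENS * context / 1024


def pick_quant(params_b: float, available_gb: float, context: int):
    """Return (quant, context) for the best fit, or None if nothing fits."""
    # Try the full context first, then retry once at half context.
    for ctx in (context, context // 2):
        for name, bytes_per_param in QUANT_BYTES_PER_PARAM:
            if estimate_gb(params_b, bytes_per_param, ctx) <= available_gb:
                return name, ctx
    return None
```

With these illustrative numbers, a 7B model in 24 GB selects Q8_0 at full context, while a 70B model in 26.5 GB only fits at Q2_K after the context is halved.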

Multi-dimensional scoring -- Each model is scored across four dimensions (0–100 each):

Dimension	What it measures
Quality	Parameter count, model family reputation, quantization penalty, task alignment
Speed	Estimated tokens/sec based on backend, params, and quantization
Fit	Memory utilization efficiency (sweet spot: 50–80% of available memory)
Context	Context window capability vs target for the use case

Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding). For example, Chat weights Speed higher (0.35) while Reasoning weights Quality higher (0.55). Models are ranked by composite score, with unrunnable models (Too Tight) always at the bottom.
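
A weighted composite of this kind can be sketched as below. Only the Chat Speed weight (0.35) and the Reasoning Quality weight (0.55) come from the text; every other weight is a placeholder chosen so each row sums to 1, and the function names are hypothetical.

```python
# Only chat/speed = 0.35 and reasoning/quality = 0.55 are from the README;
# the remaining weights are placeholders that make each row sum to 1.
WEIGHTS = {
    "chat": {"quality": 0.30, "speed": 0.35, "fit": 0.20, "context": 0.15},
    "reasoning": {"quality": 0.55, "speed": 0.15, "fit": 0.15, "context": 0.15},
}


def composite(scores: dict, use_case: str) -> float:
    """Weighted sum of the four 0-100 dimension scores."""
    weights = WEIGHTS[use_case]
    return sum(weights[dim] * scores[dim] for dim in weights)


def rank(models: list, use_case: str) -> list:
    """Sort by composite score, keeping "Too Tight" models at the bottom."""
    return sorted(
        models,
        key=lambda m: (m["fit_level"] == "Too Tight", -composite(m["scores"], use_case)),
    )
```

The sort key places the runnable flag ahead of the score, so an unrunnable model never outranks a runnable one regardless of its composite.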

Speed estimation -- Token generation in LLM inference is memory-bandwidth-bound: each token requires reading the full model weights once from VRAM. When the GPU model is recognized, llmfit uses its actual memory bandwidth to estimate throughput:

Formula: (bandwidth_GB_s / model_size_GB) × efficiency_factor

The efficiency factor (0.55) and per-mode speed multipliers are tunable via the Advanced Configuration popup (A in the TUI). The defaults account for kernel overhead, KV-cache reads, and memory controller effects. This approach is validated against published benchmarks from llama.cpp (Apple Silicon, NVIDIA T4) and real-world measurements.
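
As a worked instance of the formula, the sketch below plugs in round, hypothetical numbers (1000 GB/s of bandwidth and a 5 GB quantized model) rather than figures for any specific GPU. The function name is hypothetical, and the per-mode multipliers mentioned above are omitted.

```python
def estimate_tps(bandwidth_gb_s: float, model_size_gb: float, efficiency: float = 0.55) -> float:
    """(bandwidth_GB_s / model_size_GB) x efficiency_factor."""
    return bandwidth_gb_s / model_size_gb * efficiency


# 1000 GB/s over a 5 GB model gives 200 full weight reads per second at best,
# which the 0.55 efficiency factor scales down to roughly 110 tokens/sec.
tps = estimate_tps(1000, 5)
```

Halving the model size, for example by choosing a more aggressive quantization, doubles the estimate, which is why quantization feeds into the Speed dimension.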

The bandwidth lookup table covers ~80 GPUs across NVIDIA (consumer + datacenter), AMD (RDNA + CDNA), and Apple Silicon families.

For unrecognized GPUs, llmfit falls back to per-backend speed constants:

Backend	Speed constant
CUDA	220
Metal	160
ROCm	180
SYCL	100
CPU (ARM)	90
CPU (x86)	70
NPU (Ascend)	390

Fallback formula: K / params_b × quant_speed_multiplier, with per-mode penalties tunable via the Advanced Configuration popup (A in the TUI).
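
The fallback formula can be sketched with the constants from the table. The quantization multiplier values are not given in the text, so it is left as a parameter defaulting to 1.0 (an assumption); per-mode penalties are omitted and the function name is hypothetical.

```python
# Per-backend speed constants K, as listed in the table above.
SPEED_CONSTANTS = {
    "CUDA": 220,
    "Metal": 160,
    "ROCm": 180,
    "SYCL": 100,
    "CPU (ARM)": 90,
    "CPU (x86)": 70,
    "NPU (Ascend)": 390,
}


def fallback_tps(backend: str, params_b: float, quant_speed_multiplier: float = 1.0) -> float:
    """K / params_b x quant_speed_multiplier (multiplier default is a placeholder)."""
    return SPEED_CONSTANTS[backend] / params_b * quant_speed_multiplier


print(fallback_tps("CPU (x86)", 7))  # prints 10.0
```
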
Fit analysis -- Each model is evaluated for memory compatibility:

Run modes:
- GPU -- Model fits in VRAM. Fast inference.
- MoE -- Mixture-of-Experts with expert offloading. Active experts in VRAM, inactive in RAM.
- CPU+GPU -- VRAM insufficient, spills to system RAM with partial GPU offload.
- CPU -- No GPU. Model loaded entirely into system RAM.
Fit levels:
- Perfect -- Recommended memory met on GPU. Requires GPU acceleration.
- Good -- Fits with headroom. Best achievable for MoE offload or CPU+GPU.
- Marginal -- Tight fit, or CPU-only (CPU-only always caps here).
- Too Tight -- Not enough VRAM or system RAM anywhere.
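
The fit levels above can be read as a small decision procedure. The sketch below is one interpretation, not llmfit's implementation: the 20% headroom threshold, the parameter names, and the exact ordering of checks are assumptions.

```python
def fit_level(
    run_mode: str,
    required_gb: float,
    recommended_gb: float,
    available_gb: float,
    headroom: float = 1.2,  # hypothetical "fits with headroom" threshold
) -> str:
    """Classify a model as Perfect, Good, Marginal, or Too Tight."""
    if required_gb > available_gb:
        return "Too Tight"
    if run_mode == "CPU":
        return "Marginal"  # CPU-only always caps here
    if run_mode == "GPU" and recommended_gb <= available_gb:
        return "Perfect"  # recommended memory met on GPU
    if required_gb * headroom <= available_gb:
        return "Good"  # best achievable for MoE offload or CPU+GPU
    return "Marginal"
```

Here `available_gb` stands for whichever memory pool the run mode uses: VRAM for GPU, system RAM for CPU, and the combined pool for the offload modes.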

Model database

The model list is generated by scripts/scrape_hf_models.py, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. It covers hundreds of models from providers including Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, IBM Granite, Allen Institute OLMo, xAI Grok, Cohere, BigCode, 01.ai, Upstage, TII Falcon, HuggingFace, Zhipu GLM, Moonshot Kimi, Baidu ERNIE, and more. The scraper automatically detects MoE architectures via model config (num_local_experts, num_experts_per_tok) and known architecture mappings.
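
Config-based MoE detection can be sketched as below. The two config keys are the ones named above; the function name, the return shape, and the "more than one expert" rule are assumptions, and the fallback to known architecture mappings is not shown.

```python
from typing import Optional


def moe_info(config: dict) -> Optional[dict]:
    """Return expert counts if the model config describes an MoE model."""
    total = config.get("num_local_experts")
    active = config.get("num_experts_per_tok")
    if total and active and total > 1:
        return {"num_experts": total, "active_experts": active}
    return None  # dense model, or the config does not expose these keys


# Mixtral-style config: 8 experts, 2 routed per token.
print(moe_info({"num_local_experts": 8, "num_experts_per_tok": 2}))
```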

Model categories span general purpose, coding (CodeLlama, StarCoder2, WizardCoder, Qwen2.5-Coder, Qwen3-Coder), reasoning (DeepSeek-R1, Orca-2), multimodal/vision (Llama 3.2 Vision, Llama 4 Scout/Maverick, Qwen2.5-VL), chat, enterprise (IBM Granite), and embedding (nomic-embed, bge).

See MODELS.md for the full list.

The model database is embedded at compile time, so end users get updates by upgrading llmfit itself (brew upgrade llmfit, scoop update llmfit, or downloading a newer release). The commands below are for contributors refreshing the database from source:

# Automated update (recommended)
make update-models

# Or run the script directly
./scripts/update_models.sh

# Or manually
python3 scripts/scrape_hf_models.py
cargo build --release

The scraper writes data/hf_models.json, which is baked into the binary via include_str!. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.

By default, the scraper enriches models with known GGUF download sources from providers like unsloth and bartowski. Results are cached in data/gguf_sources_cache.json (7-day TTL) to avoid repeated API calls. Use --no-gguf-sources to skip enrichment for a faster scrape.

Project structure

src/
  main.rs         -- CLI argument parsing, entrypoint, TUI launch
  hardware.rs     -- System RAM/CPU/GPU detection (multi-GPU, backend identification)
  models.rs       -- Model database, quantization hierarchy, dynamic quant selection
  fit.rs          -- Multi-dimensional scoring (Q/S/F/C), speed estimation, MoE offloading
  providers.rs    -- Runtime provider integration (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio), install detection, pull/download
  display.rs      -- Classic CLI table rendering + JSON output
  tui_app.rs      -- TUI application state, filters, navigation
  tui_ui.rs       -- TUI rendering (ratatui)
  tui_events.rs   -- TUI keyboard event handling (crossterm)
data/
  hf_models.json  -- Model database (206 models)
skills/
  llmfit-advisor/ -- OpenClaw skill for hardware-aware model recommendations
scripts/
  scrape_hf_models.py        -- HuggingFace API scraper
  update_models.sh            -- Automated database update script
  install-openclaw-skill.sh   -- Install the OpenClaw skill
Makefile           -- Build and maintenance commands

Publishing to crates.io

The Cargo.toml already includes the required metadata (description, license, repository). To publish:

# Dry run first to catch issues
cargo publish --dry-run

# Publish for real (requires a crates.io API token)
cargo login
cargo publish

Before publishing, make sure:

The version in Cargo.toml is correct (bump with each release).
A LICENSE file exists in the repo root. Create one if missing:

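# For MIT license (this URL serves an HTML page, so check the downloaded
# file and replace its contents with the plain MIT license text if needed):
curl -sL https://opensource.org/license/MIT -o LICENSE
# Or write your own. The Cargo.toml declares license = "MIT".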

data/hf_models.json is committed. It is embedded at compile time and must be present in the published crate.

To publish updates:

# Bump version
# Edit Cargo.toml: version = "0.2.0"
cargo publish

Dependencies

Crate	Purpose
`clap`	CLI argument parsing with derive macros
`sysinfo`	Cross-platform RAM and CPU detection
`serde` / `serde_json`	JSON deserialization for model database
`tabled`	CLI table formatting
`colored`	CLI colored output
`ureq`	HTTP client for runtime/provider API integration
`ratatui`	Terminal UI framework
`crossterm`	Terminal input/output backend for ratatui

Runtime provider integration

llmfit supports multiple local runtime providers:

Ollama (daemon/API based pulls)
llama.cpp (direct GGUF downloads from Hugging Face + local cache detection)
MLX (Apple Silicon / mlx-community model cache + optional server) — MLX downloads map to mlx-community/* repos on HuggingFace, not the original model publisher
Docker Model Runner (Docker Desktop's built-in model serving)
LM Studio (local model server with REST API for model management + downloads)

When more than one compatible provider is available for a model, pressing d in the TUI opens a provider picker modal.

Ollama integration

llmfit integrates with Ollama to detect which models you already have installed and to download new ones directly from the TUI.

Requirements

Ollama must be installed and running (ollama serve or the Ollama desktop app)
llmfit connects to http://localhost:11434 (Ollama's default API port)
No configuration needed — if Ollama is running, llmfit detects it automatically

Remote Ollama instances

To connect to Ollama running on a different machine or port, set the OLLAMA_HOST environment variable:

# Connect to Ollama on a specific IP and port
OLLAMA_HOST="http://192.168.1.100:11434" llmfit

# Connect via hostname  
OLLAMA_HOST="http://ollama-server:666" llmfit

# Works with all TUI and CLI commands
OLLAMA_HOST="http://192.168.1.100:11434" llmfit --cli
OLLAMA_HOST="http://192.168.1.100:11434" llmfit fit --perfect -n 5

This is useful for:

Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)
Connecting to Ollama running in Docker containers with custom ports
Using Ollama behind reverse proxies or load balancers

How it works

On startup, llmfit queries GET /api/tags to list your installed Ollama models. Each installed model gets a green ✓ in the Inst column of the TUI. The system bar shows Ollama: ✓ (N installed).

When you press d on a model, llmfit sends POST /api/pull to Ollama to download it. The row highlights with an animated progress indicator showing download progress in real-time. Once complete, the model is immediately available for use with Ollama.

If Ollama is not running, Ollama-specific operations are skipped; the TUI still supports other providers like llama.cpp where available.
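
The install-detection step can be approximated from Python as shown below. This is a sketch of the same API call, not llmfit's code: the helper names are hypothetical, and it assumes OLLAMA_HOST holds a full URL as in the examples above.

```python
import json
import os
import urllib.request


def ollama_base_url(env=None) -> str:
    """Base URL from OLLAMA_HOST, defaulting to Ollama's standard port."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")


def installed_models(tags_payload: dict) -> list:
    """Extract model names from a GET /api/tags response body."""
    return sorted(model["name"] for model in tags_payload.get("models", []))


def fetch_installed(env=None, timeout: float = 2.0) -> list:
    """Query a running Ollama instance; raises URLError if it is unreachable."""
    url = f"{ollama_base_url(env)}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return installed_models(json.load(response))
```

A caller that wants the "skip Ollama if it is not running" behaviour can catch `urllib.error.URLError` around `fetch_installed()` and treat it as an empty list.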

llama.cpp integration

llmfit integrates with llama.cpp as a runtime/download provider in both TUI and CLI.

Requirements:

llama-cli or llama-server available in PATH (for runtime detection)
network access to Hugging Face for GGUF downloads

How it works:

llmfit maps HF models to known GGUF repos (with heuristic fallbacks)
downloads GGUF files into the local llama.cpp model cache
marks models installed when matching GGUF files are present locally

Environment variables

Variable	Default	Description
`LLAMA_CPP_PATH`	(none)	Directory containing llama.cpp binaries (`llama-cli`, `llama-server`). Checked before `PATH` lookup.
`LLAMA_SERVER_PORT`	`8080`	Port used when probing a running `llama-server` health endpoint for runtime detection.

If llama.cpp is installed in a non-standard location, set LLAMA_CPP_PATH so llmfit can find it without requiring it in your PATH.
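
The lookup order described in the table (LLAMA_CPP_PATH first, then PATH) can be sketched as follows. The function name is hypothetical, and the order in which the two binaries are tried is an assumption.

```python
import os
import shutil
from typing import Optional


def find_llama_binary(env=None, names=("llama-cli", "llama-server")) -> Optional[str]:
    """Locate a llama.cpp binary, checking LLAMA_CPP_PATH before PATH."""
    env = os.environ if env is None else env
    # The custom directory takes priority; PATH is searched second.
    for search_path in (env.get("LLAMA_CPP_PATH"), env.get("PATH")):
        if not search_path:
            continue
        for name in names:
            found = shutil.which(name, path=search_path)
            if found:
                return found
    return None
```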

Docker Model Runner integration

llmfit integrates with Docker Model Runner, Docker Desktop's built-in model serving feature.

Requirements:

Docker Desktop with Model Runner enabled
Default endpoint: http://localhost:12434

How it works:

llmfit queries GET /engines to list models available in Docker Model Runner
models are matched to the HF database using Ollama-style tag mapping (Docker Model Runner uses ai/<tag> naming)
pressing d in the TUI pulls via docker model pull

Remote Docker Model Runner instances

To connect to Docker Model Runner on a different host or port, set the DOCKER_MODEL_RUNNER_HOST environment variable:

DOCKER_MODEL_RUNNER_HOST="http://192.168.1.100:12434" llmfit

LM Studio integration

llmfit integrates with LM Studio as a local model server with built-in model download capabilities.

Requirements:

LM Studio must be running with its local server enabled
Default endpoint: http://127.0.0.1:1234

How it works:

llmfit queries GET /v1/models to list models available in LM Studio
pressing d in the TUI triggers a download via POST /api/v1/models/download
download progress is tracked by polling GET /api/v1/models/download-status
LM Studio accepts HuggingFace model names directly, so no name mapping is needed

Remote LM Studio instances

To connect to LM Studio on a different host or port, set the LMSTUDIO_HOST environment variable:

LMSTUDIO_HOST="http://192.168.1.100:1234" llmfit

API authentication

If your LM Studio instance has Require API Key enabled (required for MCP server access), set the LMSTUDIO_API_KEY environment variable to provide a Bearer token with all requests:

export LMSTUDIO_API_KEY="your-api-key-here"
llmfit
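
To illustrate what sending a Bearer token with each request looks like, the sketch below builds an LM Studio request from the two environment variables described in this section. The helper name is hypothetical and this is not llmfit's own HTTP code.

```python
import os
import urllib.request


def lmstudio_request(path: str, env=None) -> urllib.request.Request:
    """Build a request for LM Studio, adding a Bearer token when a key is set."""
    env = os.environ if env is None else env
    base = env.get("LMSTUDIO_HOST", "http://127.0.0.1:1234").rstrip("/")
    request = urllib.request.Request(base + path)
    api_key = env.get("LMSTUDIO_API_KEY")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Example (requires LM Studio's local server to be running):
# with urllib.request.urlopen(lmstudio_request("/v1/models")) as response:
#     print(response.read().decode())
```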

Model name mapping

llmfit's database uses HuggingFace model names (e.g. Qwen/Qwen2.5-Coder-14B-Instruct) while Ollama uses its own naming scheme (e.g. qwen2.5-coder:14b). llmfit maintains an accurate mapping table between the two so that install detection and pulls resolve to the correct model. Each mapping is exact — qwen2.5-coder:14b maps to the Coder model, not the base qwen2.5:14b.
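
An exact mapping of this kind behaves like a plain dictionary lookup with no fuzzy matching. In the sketch below, the first entry is the pair given above; the second entry and the function name are illustrative assumptions.

```python
from typing import Optional

# First pair is from the text above; the second is an illustrative assumption.
HF_TO_OLLAMA = {
    "Qwen/Qwen2.5-Coder-14B-Instruct": "qwen2.5-coder:14b",
    "Qwen/Qwen2.5-14B-Instruct": "qwen2.5:14b",
}


def ollama_tag(hf_name: str) -> Optional[str]:
    """Exact lookup: unknown names return None rather than a near match."""
    return HF_TO_OLLAMA.get(hf_name)
```

Returning None for unmapped names avoids pulling a base model when the Coder variant was requested.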

Platform support

Linux -- Full support. GPU detection via nvidia-smi (NVIDIA), rocm-smi (AMD), sysfs/lspci (Intel Arc) and npu-smi (Ascend).
macOS (Apple Silicon) -- Full support. Detects unified memory via system_profiler. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.
macOS (Intel) -- RAM and CPU detection works. Discrete GPU detection works if nvidia-smi is available.
Windows -- RAM and CPU detection works. NVIDIA GPU detection via nvidia-smi if installed.
Android / Termux / PRoot -- CPU and RAM detection usually work, but GPU autodetection is not currently supported. Mobile GPUs such as Adreno typically are not visible through the desktop/server probing interfaces llmfit uses.

GPU support

Vendor	Detection method	VRAM reporting
NVIDIA	`nvidia-smi`	Exact dedicated VRAM
AMD	`rocm-smi`	Detected (VRAM may be unknown)
Intel Arc (discrete)	sysfs (`mem_info_vram_total`)	Exact dedicated VRAM
Intel Arc (integrated)	`lspci`	Shared system memory
Apple Silicon	`system_profiler`	Unified memory (= system RAM)
Ascend	`npu-smi`	Detected (VRAM may be unknown)

If autodetection fails or reports incorrect values, use --memory, --ram, or --cpu-cores to override (see Hardware overrides above).

Android / Termux note

On Android setups such as Termux + PRoot, llmfit usually cannot see mobile GPUs through the standard Linux detection paths (nvidia-smi, rocm-smi, DRM/sysfs, lspci, etc.). In those environments, "no GPU detected" is expected with the current implementation.

If you still want GPU-style recommendations on a unified-memory phone or tablet, use a manual memory override:

llmfit --memory=8G fit -n 20
llmfit recommend --json --memory=8G --limit 10

This is a workaround for recommendation/scoring only; it does not provide true Android GPU runtime detection.

Contributing

Contributions are welcome, especially new models.

Before submitting a PR

Please run cargo fmt before pushing your changes. Most CI check failures are caused by unformatted code:

cargo fmt

Adding a model

Add the model's HuggingFace repo ID (e.g., meta-llama/Llama-3.1-8B) to the TARGET_MODELS list in scripts/scrape_hf_models.py.
If the model is gated (requires HuggingFace authentication to access metadata), add a fallback entry to the FALLBACKS list in the same script with the parameter count and context length.

Run the automated update script:

make update-models
# or: ./scripts/update_models.sh

Verify the updated model list: ./target/release/llmfit list
Update MODELS.md by running: python3 << 'EOF' < scripts/... (see commit history for the generator script)
Open a pull request.

See MODELS.md for the current list and AGENTS.md for architecture details.

OpenClaw integration

llmfit ships as an OpenClaw skill that lets the agent recommend hardware-appropriate local models and auto-configure Ollama/vLLM/LM Studio providers.

Install the skill

# From the llmfit repo
./scripts/install-openclaw-skill.sh

# Or manually
cp -r skills/llmfit-advisor ~/.openclaw/skills/

Once installed, ask your OpenClaw agent things like:

"What local models can I run?"
"Recommend a coding model for my hardware"
"Set up Ollama with the best models for my GPU"

The agent will call llmfit recommend --json under the hood, interpret the results, and offer to configure your openclaw.json with optimal model choices.

How it works

The skill teaches the OpenClaw agent to:

Detect your hardware via llmfit --json system
Get ranked recommendations via llmfit recommend --json
Map HuggingFace model names to Ollama/vLLM/LM Studio tags
Configure models.providers.ollama.models in openclaw.json

See skills/llmfit-advisor/SKILL.md for the full skill definition.

Alternatives

If you're looking for a different approach, check out llm-checker -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more hands-on approach by actually running models on your hardware via Ollama, rather than estimating from specs. Good if you already have Ollama installed and want to test real-world performance. Note that it doesn't support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models like Mixtral or DeepSeek-V3 will reflect total parameter count rather than the smaller active subset.

from https://github.com/AlexsJones/llmfit
