Skip to main content
LL

Run local LLMs as a single executable file

llamafile packages an LLM and runtime into one file you can download and run locally. It is aimed at developers and end users who want offline, no-install model execution across common operating systems.

Desktop
B2B
CLI
Self-Hosted
On-Device / Edge
Model Agnostic
Supports Local Models
Visit llamafile

Is this your tool? Claim this listing to manage your content and analytics.

Recent activity

What's happened with llamafile lately

  • Score change
    Rubric upgrade v3_0 → v3.1: score 1/32 → 2/3612/36(+1)

    Rubric upgrade: agenticness v3.0 (8 dims, /32) → v3.1 (9 dims, /36). Adds Dim 9 (Operator Sovereignty), splits Dim 6 into 6a/6b lenses, tightens Dim 4 autonomous-retry distinction. Not a product change — score shift reflects new dimension + recalibrated rubric, not a change in the tool. Fanout suppressed.

    See the news that prompted this

News mentions sourced from our news feed; score changes from periodic re-evaluations.

Ask about llamafile

Get answers based on llamafile's actual documentation

Try asking:

About

What It Is

llamafile is an open-source tool for distributing and running large language models as a single executable file. It combines llama.cpp with Cosmopolitan Libc so you can run supported models locally on most operating systems and CPU architectures without a traditional install step.

What to Know

This is not an autonomous agent platform. It is primarily a local model runtime and packaging format, so its value is in portability and ease of execution rather than multi-step task automation. The project says newer versions support more recent...

Key Features
Packages an LLM into a single executable file
Runs locally on macOS, Linux, BSD, Windows, and other supported systems
Works without a separate installation process
Uses llama.cpp as the model runtime foundation
Includes whisperfile for local speech-to-text transcription and translation
Use Cases
Running a local chat model on your laptop without setting up a full environment
Sharing a model as a single downloadable file for teammates or users
Testing LLM behavior offline across different operating systems
Agenticness: Reactive Tool

Responds to prompts but takes no autonomous action.

High evidence
Last evaluated: May 23, 2026

Dimension Breakdown

Action Capability
Autonomy
Adaptation
State & Memory
Safety

Categories

Pricing
  • Free / open source — full functionality available at no cost.
Details
AddedMarch 31, 2026
RefreshedMarch 31, 2026
Agenticness
Quick Facts
DeploymentOn-device / local
AutonomyCopilot (human-in-loop)
Model supportSupports local models
Open sourceYes
Team supportIndividual only
Pricing modelFree / open source
Interfacecli, api
Similar tools

Related Tools

BuyWhere gives developers a normalized product catalog API for Singapore and Southeast Asia. It helps AI agents search, compare, and route commerce queries without scraping storefronts.

Free Tier
API
Chrome Extension
+4

Runloop AI provides sandboxed devboxes for agent workflows, including turn-based interaction through GitHub pull requests. It’s aimed at developers building coding agents that need to execute commands, keep state across turns, and respond to reviewer comments.

API
Integrations
B2B
+3

Fireworks AI is a model hosting and inference platform for teams building with open and proprietary models. It covers serverless inference, fine-tuning, embeddings, speech-to-text, and on-demand GPU deployments.

Paid
Enterprise
API
+3

GroqCloud is an AI inference platform for developers that focuses on low latency and predictable spend. It provides API access to text, audio, vision, and image-to-text models, with free, developer, and enterprise plans.

API
For Developers
Usage-Based
+3
Stay in the loop

Get the weekly agentic AI briefing

New tools, top picks, and trends — delivered every Thursday.

I use AI for: