meta-llama/llama3
The official Meta Llama 3 GitHub site
Stale — last commit 1y ago
Weakest axis: non-standard license (Other); last commit was 1y ago…
Has a license and tests — a clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
last commit was 1y ago; no CI workflows detected
- ✓ 28+ active contributors
- ✓ Distributed ownership (top contributor: 29% of recent commits)
- ✓ Licensed (Other)
- ✓ Tests present
- ⚠ Stale — last commit 1y ago
- ⚠ Non-standard license (Other) — review terms
- ⚠ No CI workflows detected
What would change the summary?
- → Use as dependency: Concerns → Mixed if license terms are clarified
- → Deploy as-is: Mixed → Healthy if there is 1 commit in the last 180 days
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Forkable" badge
Paste into your README — live-updates from the latest cached analysis.
Paste at the top of your README.md; it renders inline like a shields.io badge and links to https://repopilot.app/r/meta-llama/llama3.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/meta-llama/llama3 on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: meta-llama/llama3
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/meta-llama/llama3 shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Stale — last commit 1y ago
- 28+ active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Licensed (Other)
- Tests present
- ⚠ Stale — last commit 1y ago
- ⚠ Non-standard license (Other) — review terms
- ⚠ No CI workflows detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live meta-llama/llama3 repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/meta-llama/llama3.
What it runs against: a local clone of meta-llama/llama3 — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in meta-llama/llama3 | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 495 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of meta-llama/llama3. If you don't
# have one yet, run these first:
#
# git clone https://github.com/meta-llama/llama3.git
# cd llama3
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of meta-llama/llama3 and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "meta-llama/llama3(\.git)?\b" \
  && ok "origin remote is meta-llama/llama3" \
  || miss "origin remote is not meta-llama/llama3 (artifact may be from a fork)"
# 2. License matches what RepoPilot saw. "Other" is GitHub's label for
#    Meta's custom license, so look for its distinctive text.
(grep -qiE "llama 3 community license" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other (Meta Llama 3 Community License)" \
  || miss "license drift — was Other at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "llama/model.py" \\
&& ok "llama/model.py" \\
|| miss "missing critical file: llama/model.py"
test -f "llama/generation.py" \\
&& ok "llama/generation.py" \\
|| miss "missing critical file: llama/generation.py"
test -f "llama/tokenizer.py" \\
&& ok "llama/tokenizer.py" \\
|| miss "missing critical file: llama/tokenizer.py"
test -f "example_chat_completion.py" \\
&& ok "example_chat_completion.py" \\
|| miss "missing critical file: example_chat_completion.py"
test -f "download.sh" \\
&& ok "download.sh" \\
|| miss "missing critical file: download.sh"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 495 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~465d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/meta-llama/llama3"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡TL;DR
This is the official Meta Llama 3 inference repository — a minimal codebase for loading and running pre-trained and instruction-tuned Llama 3 language models (8B–70B parameters) on local hardware. It provides PyTorch model loading, a custom tiktoken-based tokenizer, and text/chat completion inference, with core classes in llama/model.py and llama/generation.py.
Monolithic single-package structure: llama/ contains three core modules — model.py (transformer architecture + weight loading), generation.py (inference/sampling logic), and tokenizer.py (BPE + chat template handling). Top-level example_*.py scripts demonstrate text and chat completion, and download.sh handles model artifact retrieval.
👥Who it's for
ML researchers, engineers, and developers who want to run Llama 3 inference locally after downloading model weights from Meta's website. Users need a signed URL from llama.meta.com and are typically building on top of this minimal foundation rather than using it as-is for production.
🌱Maturity & risk
DEPRECATED. This repo is no longer maintained—the README explicitly states to use meta-llama/llama-models, meta-llama/llama-toolchain, and related repos instead (as of Llama 3.1 release). The codebase is stable but frozen; no active development or CI/CD pipeline is visible in the file structure.
High: this repo is officially deprecated and users should migrate to meta-llama/llama-models and companion repos. There is no test suite beyond llama/test_tokenizer.py, no GitHub Actions CI (the .github/ directory contains only issue templates), and the one pinned dependency (tiktoken==0.4.0) may conflict with newer PyTorch builds. Maintenance burden has shifted entirely to the Llama Stack consolidation.
Active areas of work
Nothing—this repository is deprecated as of Llama 3.1. Active development has moved to meta-llama/llama-models (foundation models), meta-llama/llama-toolchain (inference/fine-tuning), and meta-llama/llama-agentic-system. Users are directed away from this repo in the README.
🚀Get running
git clone https://github.com/meta-llama/llama3.git
cd llama3
pip install -r requirements.txt
./download.sh # Provide signed URL from llama.meta.com when prompted
# Run inference after weights are downloaded. The examples initialize
# torch.distributed, so launch them with torchrun rather than plain python
# (the ckpt paths below depend on which model you downloaded):
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model
Daily commands:
After weights are downloaded via ./download.sh, run example_chat_completion.py or example_text_completion.py under torchrun as above. Both scripts construct the model via Llama.build() in llama/generation.py and call its chat_completion()/text_completion() methods, which wrap generate() internally. There is no development server or REPL; execution is script-based (see the sketch below for programmatic use).
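Stripped of CLI plumbing, the examples reduce to a few calls. A minimal sketch mirroring example_chat_completion.py (the weight paths are placeholders for wherever download.sh put your model; launch it under torchrun like the examples):

```python
# Minimal chat-completion sketch. The generation code initializes
# torch.distributed, so run: torchrun --nproc_per_node 1 this_script.py
from llama import Dialog, Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B-Instruct/",  # placeholder path
    tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

dialogs: list[Dialog] = [
    [{"role": "user", "content": "What is the capital of France?"}],
]
results = generator.chat_completion(
    dialogs, max_gen_len=64, temperature=0.6, top_p=0.9
)
for result in results:
    print(result["generation"]["content"])
```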
🗺️Map of the codebase
- llama/model.py — Core model architecture and weight loading for Llama 3; essential entry point for understanding how the transformer is initialized and configured.
- llama/generation.py — Text generation pipeline (sampling, inference loop); primary interface for running the model after loading.
- llama/tokenizer.py — Token encoding/decoding and vocabulary management; required for all input preprocessing and output post-processing.
- example_chat_completion.py — Demonstrates the canonical chat workflow; shows how to load the model, tokenize, and generate responses with conversation context.
- download.sh — Model artifact acquisition script; contributors must understand the expected model weights format and directory structure.
- requirements.txt — Declares all dependencies (torch, fairscale, fire, tiktoken); critical for reproducible environment setup.
- llama/__init__.py — Package exports and initialization; defines the public API surface that users interact with.
🛠️How to make changes
Add a new sampling strategy
- Define the sampling function (e.g., nucleus_sample, beam_search) in the generation module (llama/generation.py)
- Expose the new strategy as a parameter in the sample() method's interface (llama/generation.py)
- Update the example scripts to demonstrate the new sampling mode (example_text_completion.py)
- Add test cases validating the strategy's output shape and token validity (llama/test_tokenizer.py); a starter sketch follows this list
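As a concrete starting point, a hedged sketch of a new strategy shaped like the repo's existing sample_top_p helper (the function name and integration point here are illustrative, not part of the current API):

```python
import torch

def sample_top_k(probs: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative top-k sampler in the style of sample_top_p.

    probs: (batch, vocab) softmax probabilities for the next token.
    Returns a (batch, 1) tensor of sampled token ids.
    """
    # Keep the k most probable tokens, renormalize, sample, and map the
    # sampled position back to the original vocabulary index.
    topk_probs, topk_idx = torch.topk(probs, k, dim=-1)
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(topk_probs, num_samples=1)
    return torch.gather(topk_idx, -1, choice)
```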
Optimize model loading for a new hardware platform
- Add device-specific checkpoint loading logic (e.g., multi-GPU sharding) in the model initialization (llama/model.py)
- Extend the quantization or memory-mapping code path if needed (llama/model.py)
- Update download.sh to fetch platform-specific model variants (download.sh)
- Document the new platform in MODEL_CARD.md (MODEL_CARD.md)
Implement a new chat template or prompt format
- Create a new dialog formatter class in the model or generation module (llama/generation.py)
- Update the Dialog class (or add a new one) to tokenize and format conversations (llama/model.py)
- Demonstrate the new format in example_chat_completion.py with sample conversations (example_chat_completion.py)
- Add round-trip tests to verify tokenization and formatting consistency (llama/test_tokenizer.py); the sketch below shows the stock layout to mirror
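For orientation, Llama 3's stock chat layout wraps each message in header tokens. A string-level sketch of that format (the repo's real ChatFormat in llama/tokenizer.py builds the same structure directly from token ids; treat the helper names here as illustrative):

```python
# String-level sketch of the Llama 3 chat layout. The real ChatFormat in
# llama/tokenizer.py encodes these special tokens directly as token ids.
def format_message(role: str, content: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

def format_dialog(messages: list[dict]) -> str:
    # A dialog is a list of {"role": ..., "content": ...} dicts. The prompt
    # ends with an open assistant header so generation continues from there.
    parts = ["<|begin_of_text|>"]
    parts += [format_message(m["role"], m["content"]) for m in messages]
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(format_dialog([{"role": "user", "content": "Hi!"}]))
```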
Add support for a new quantization scheme
- Implement quantization/dequantization kernels or wrappers (e.g., for INT8, NF4) (llama/model.py); see the sketch below
- Extend model checkpoint loading to detect and apply quantization on load (llama/model.py)
- Update requirements.txt if new dependencies (e.g., bitsandbytes) are needed (requirements.txt)
- Document quantization trade-offs (accuracy, speed, memory) in MODEL_CARD.md (MODEL_CARD.md)
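A minimal sketch of the simplest variant, symmetric per-channel int8 round-tripping (the names are illustrative; schemes like NF4 would lean on bitsandbytes or custom kernels instead):

```python
import torch

def quantize_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, s = quantize_int8(w)
print((w - dequantize_int8(q, s)).abs().max())  # worst-case quantization error
```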
🔧Why these technologies
- PyTorch — Industry-standard deep learning framework for efficient GPU computation, transformer operations, and distributed inference.
- fairscale — Enables multi-GPU and distributed training/inference with model parallelism and gradient checkpointing for large Llama checkpoints.
- tiktoken — OpenAI's fast BPE tokenizer library; Llama 3's tokenizer wraps it with a custom vocabulary for fast vectorized tokenization.
- fire — Lightweight CLI framework for converting Python functions into command-line interfaces with automatic argument parsing and help.
- blobfile — Unified API for reading files from local disk, GCS, S3, and Azure; used for checkpoint loading from remote storage.
⚖️Trade-offs already made
- Single Llama class with optional device/dtype parameters rather than separate inference-engine classes
  - Why: Simplicity and ease of use; developers can instantiate and infer on any device with minimal boilerplate.
  - Consequence: Less fine-grained control over memory/compute trade-offs; advanced optimization (e.g., per-layer quantization) requires subclassing or monkey-patching.
- Autoregressive sampling only (no speculative decoding, no batch generation in examples)
  - Why: Easier to understand and debug; a single-token decoding loop is the clearest interface.
  - Consequence: Lower throughput for batch inference; users must implement their own batching if needed.
- Checkpoint format tied directly to PyTorch .pth files
  - Why: Minimal overhead; leverages PyTorch's native serialization and state_dict conventions.
  - Consequence: Coupling to the PyTorch version; non-PyTorch inference (ONNX, TorchScript export) is not in scope for this repo.
- Minimal chat templating; examples assume a simple system/user/assistant message format
  - Why: Focuses on the core inference pipeline; chat formatting is an application-layer concern.
  - Consequence: Users must implement their own prompt engineering and multi-turn dialogue management; no built-in conversation memory or context-windowing strategies.
🚫Non-goals (don't propose these)
- Does not provide distributed training or fine-tuning code (inference-only).
- Does not implement quantization, pruning, or compression; relies on external tools or user-provided compressed checkpoints.
- Does not handle authentication or model access control; assumes user has legitimate access and downloaded weights.
- Not a real-time or streaming inference service; synchronous blocking calls only (no async/await or streaming output support).
- Does not support ONNX, TensorFlow, JAX, or other non-PyTorch frameworks; PyTorch-only.
- Does not include safety filters, content moderation, or jailbreak detection (see PurpleLlama repo for safety focus).
- Not a chat application or web UI; examples are standalone Python scripts only.
🪤Traps & gotchas
- Model weights must be downloaded separately via download.sh and a signed URL from llama.meta.com; they are not in this repo.
- The strict tiktoken==0.4.0 version pin may break with newer torch builds.
- No explicit CUDA/device handling in the example scripts; users must set CUDA_VISIBLE_DEVICES or handle CPU fallback themselves (see the snippet below).
- The chat completion examples expect models fine-tuned on instruction data; base models will behave differently.
- No error handling for OOM on systems with less than 80 GB of VRAM.
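For the device trap specifically, a small hedged guard worth adding to your own driver script (nothing like this exists in the repo's examples):

```python
import os

# CUDA_VISIBLE_DEVICES must be set before torch initializes CUDA.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"cuda: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
else:
    print("no CUDA device; expect extremely slow CPU inference for 8B+ models")
```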
🏗️Architecture
💡Concepts to learn
- Transformer Decoder Architecture — Understanding the multi-head attention, FFN, and RMSNorm layers in llama/model.py is critical to modify or debug inference behavior.
- Rotary Position Embeddings (RoPE) — Llama 3 uses RoPE instead of absolute positional embeddings for better length extrapolation; implemented in llama/model.py's Attention class (see the sketch after this list).
- Byte-Pair Encoding (BPE) Tokenization — llama/tokenizer.py wraps tiktoken's BPE implementation; understanding token IDs and vocab size is essential for prompting and embedding interpretation.
- Grouped Query Attention (GQA) — Llama 3 uses GQA to reduce KV-cache memory during inference; llama/model.py implements it via the n_kv_heads parameter for faster decoding.
- Top-K and Top-P (Nucleus) Sampling — Core sampling strategies in llama/generation.py that control generation diversity; understanding these is key to tuning model outputs for your use case.
- KV Cache (Key-Value Cache) — The Attention layers in llama/model.py keep key/value caches that llama/generation.py advances via start_pos, avoiding recomputation of attention over already-processed tokens; critical for inference latency on long sequences.
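To make RoPE concrete, a self-contained sketch of the rotation math, mirroring the precompute_freqs_cis / apply_rotary_emb pattern in llama/model.py (the shapes and Llama 3's base theta follow that code, but treat this as illustrative rather than a drop-in copy):

```python
import torch

def precompute_freqs_cis(dim: int, end: int, theta: float = 500000.0) -> torch.Tensor:
    # One complex rotation per (position, feature pair); Llama 3 uses theta=500000.
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(end), freqs)
    return torch.polar(torch.ones_like(angles), angles)  # complex64, (end, dim//2)

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, heads, head_dim); rotate each feature pair by its
    # position-dependent angle via complex multiplication.
    x_ = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_out = torch.view_as_real(x_ * freqs_cis[:, None, :]).flatten(3)
    return x_out.type_as(x)

freqs = precompute_freqs_cis(dim=128, end=2048)  # head_dim=128, up to 2048 positions
x = torch.randn(1, 16, 32, 128)                  # (batch, seq, heads, head_dim)
out = apply_rotary_emb(x, freqs[:16])
```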
🔗Related repos
- meta-llama/llama-models — Official successor repo consolidating Llama foundation models, cards, and utilities; this deprecated repo directs users there.
- meta-llama/llama-toolchain — Canonical inference/fine-tuning/safety interfaces for Llama; replaces the minimal generation.py and model.py in this repo with production-grade alternatives.
- meta-llama/PurpleLlama — Safety and risk mitigation for Llama inference; essential companion for responsible deployment, mentioned in the README migration guide.
- meta-llama/llama-agentic-system — E2E Llama Stack for building agentic applications; the recommended path for production systems rather than this minimal repo.
- facebookresearch/llama-recipes — Community examples and integrations (fine-tuning, quantization, RAG); mentioned in the README as a detailed extension beyond this minimal codebase.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for llama/tokenizer.py
The repo includes llama/test_tokenizer.py but it's unclear whether it fully covers tokenizer.py's functionality. Given that tokenizer.py is critical for preprocessing inputs and the repo depends on tiktoken==0.4.0, adding thorough unit tests would ensure reliability across different input types, edge cases, and token boundary conditions. This is especially important since tokenization directly impacts model inference quality. A test sketch follows the checklist.
- [ ] Examine llama/test_tokenizer.py to identify gaps in current test coverage
- [ ] Add tests for edge cases: empty strings, special tokens, multi-language inputs, extremely long sequences
- [ ] Add tests for tokenizer initialization and encoding/decoding round-trip consistency
- [ ] Ensure tests verify compatibility with tiktoken==0.4.0 specifically
- [ ] Run tests locally and add docstrings explaining test purpose
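A hedged sketch of what the round-trip tests might look like (pytest style; the tokenizer path is a placeholder, and the encode/decode/bos_id names assume the repo's current Tokenizer API):

```python
import pytest
from llama.tokenizer import Tokenizer

TOKENIZER_PATH = "Meta-Llama-3-8B/tokenizer.model"  # placeholder path

@pytest.fixture(scope="module")
def tok():
    return Tokenizer(model_path=TOKENIZER_PATH)

@pytest.mark.parametrize("text", ["", "hello world", "héllo wörld 你好", "a" * 10_000])
def test_encode_decode_round_trip(tok, text):
    ids = tok.encode(text, bos=False, eos=False)
    assert tok.decode(ids) == text

def test_bos_eos_placement(tok):
    ids = tok.encode("hi", bos=True, eos=True)
    assert ids[0] == tok.bos_id
    assert ids[-1] == tok.eos_id
```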
Add integration tests for llama/generation.py with example scripts
The repo provides example_chat_completion.py and example_text_completion.py but lacks automated integration tests that validate the end-to-end generation pipeline. Adding tests that verify generation.py works correctly with model.py and handles various sampling parameters (temperature, top_p, etc.) would catch regressions and keep the examples functional. A starter test sketch follows the checklist.
- [ ] Create llama/test_generation.py with integration tests
- [ ] Add tests that validate chat completion generation against known outputs
- [ ] Add tests for text completion generation with different temperature/top_p settings
- [ ] Add tests verifying model loading and generation pipeline from example_chat_completion.py
- [ ] Document any required model files or download steps needed for CI
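Pieces that don't need model weights can be tested directly. A hedged sketch targeting the repo's sample_top_p helper (assuming it keeps its current name and (probs, p) signature in llama/generation.py):

```python
import torch
from llama.generation import sample_top_p

def test_sample_top_p_respects_nucleus():
    # With p=0.8, the cumulative mass {0.6, 0.3} is kept and token 2
    # (prob 0.1) is cut, so every sample must land on token 0 or 1.
    probs = torch.tensor([[0.6, 0.3, 0.1]])
    for _ in range(50):
        token = sample_top_p(probs, p=0.8)
        assert token.item() in (0, 1)
```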
Add GitHub Actions workflow for Python dependency and import validation
The repo has requirements.txt and setup.py but no automated CI to validate that dependencies stay compatible and all imports work. Since this is a foundational model repo with specific dependency versions (tiktoken==0.4.0, torch, fairscale), a simple GitHub Actions workflow would catch breakages early when contributors add code that imports new packages or when dependencies become incompatible. A sketch of the import-check step follows the checklist.
- [ ] Create .github/workflows/python-validation.yml
- [ ] Add job to install requirements.txt and validate all imports in llama/__init__.py, llama/model.py, llama/generation.py, llama/tokenizer.py
- [ ] Add job to verify setup.py syntax and that it matches requirements.txt
- [ ] Add job to run existing tests in llama/test_tokenizer.py
- [ ] Ensure workflow runs on pull requests to catch issues before merge
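The import-validation job can delegate to a small Python step. A sketch of such a script (the filename and module list are suggestions mirroring the checklist above):

```python
# check_imports.py — fail the CI job if any core module stops importing cleanly.
import importlib
import sys

MODULES = ["llama", "llama.model", "llama.generation", "llama.tokenizer"]

failed = []
for name in MODULES:
    try:
        importlib.import_module(name)
        print(f"ok: import {name}")
    except Exception as exc:
        print(f"FAIL: import {name}: {exc}")
        failed.append(name)

sys.exit(1 if failed else 0)
```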
🌿Good first issues
- Add unit tests for llama/tokenizer.py beyond test_tokenizer.py—test edge cases like BOS/EOS token handling, special token escaping, and multi-turn chat template formatting.
- Add device/dtype handling utilities: create a llama/utils.py with helpers to auto-detect VRAM, quantize models to int8/int4, and handle multi-GPU inference (the repo lacks a distributed example); a starter sketch follows this list.
- Document model card details in README: add a table comparing Llama 3 8B vs 70B parameter counts, context windows, and inference speed on standard hardware (A100, H100, consumer GPUs).
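For the utilities idea, a hedged starting point for the VRAM auto-detect helper (llama/utils.py does not exist yet; the name and threshold are suggestions):

```python
import torch

def free_vram_gib(device: int = 0) -> float:
    """Free VRAM on `device` in GiB, or 0.0 when CUDA is unavailable."""
    if not torch.cuda.is_available():
        return 0.0
    free_bytes, _total = torch.cuda.mem_get_info(device)
    return free_bytes / 2**30

# Rough gate before loading: fp16 8B weights alone take roughly 16 GiB.
if free_vram_gib() < 16:
    print("warning: likely OOM loading Llama 3 8B in fp16 on this GPU")
```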
⭐Top contributors
- @jspisak — 29 commits
- @rohit-ptl — 8 commits
- @astonzhang — 8 commits
- @aakashapoorv — 7 commits
- @xingjia01 — 6 commits
📝Recent commits
- a0940f9 — Update README.md (amitsangani)
- 11817d4 — Add note about 3.1 download (subramen)
- 9b60364 — Merge pull request #292 from Vasanthrs-dev/patch-2 (jspisak)
- c7c6c2e — Update eval_details.md (Vasanthrs-dev)
- 18f515a — Update README.md (jspisak)
- e14519c — Update MODEL_CARD.md (jspisak)
- d3eca21 — Merge pull request #141 from pchng/main (subramen)
- bf8d18c — Merge pull request #135 from aakashapoorv/update-examples (subramen)
- f2bb4c5 — add newline at end of file (aakashapoorv)
- bed1106 — Merge remote-tracking branch 'refs/remotes/origin/update-examples' into update-examples (aakashapoorv)
🔒Security observations
The codebase shows a moderate security posture, with the primary concerns around dependency management and maintenance. The repository is deprecated with active development moved elsewhere, which poses a risk for users who don't follow the migration guidance. The main weaknesses are: (1) a pinned older version of tiktoken alongside unconstrained versions of every other dependency, (2) a lack of documented security practices, and (3) the deprecation itself, which strands users who miss the migration notice on an unmaintained codebase. The code shows no visible injection vulnerabilities or exposed credentials. Recommended: update dependencies, adopt proper version management, and follow the official migration to the maintained repositories.
- Medium · Pinned dependency with known vulnerabilities — requirements.txt. tiktoken==0.4.0 is pinned to a specific older version, which may contain known security vulnerabilities; the package should be kept up to date to receive security patches. Fix: update tiktoken to the latest stable version (e.g., tiktoken>=0.5.0) and regularly audit dependencies for CVEs.
- Medium · Unspecified dependency versions — requirements.txt. torch, fairscale, fire, and blobfile lack version pinning or constraints, creating supply-chain and reproducibility risk: unspecified versions could pull in vulnerable or breaking changes. Fix: pin all dependencies to specific versions or use version ranges (e.g., torch>=2.0.0,<3.0.0), maintain a separate constraints file for production deployments, and scan with tools like pip-audit or safety.
- Low · Repository deprecation, active-development hazard — README.md. The repository is deprecated as of the Llama 3.1 release; new features and security patches are consolidated into other repositories (llama-models, PurpleLlama), so using this repo for new projects may miss critical security updates. Fix: migrate to the actively maintained repositories, https://github.com/meta-llama/llama-models and https://github.com/meta-llama/PurpleLlama.
- Low · Missing input-validation documentation — example_chat_completion.py, example_text_completion.py. The example scripts may process user input without documented validation; there are no visible input-sanitization patterns in the file structure. Fix: implement input validation and sanitization for all user-provided inputs, document security best practices in the README, and add examples of safe input handling.
- Low · Missing security documentation — repository root. No SECURITY.md file or vulnerability-reporting guidelines are visible, which makes responsible disclosure harder for security researchers. Fix: create a SECURITY.md with disclosure guidelines, contact information, and expected response timelines; reference GitHub's security advisories.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.