ymcui/Chinese-LLaMA-Alpaca

Item: ymcui/Chinese-LLaMA-Alpaca
Rating: 5
Author: RepoPilot

中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

Healthy

Healthy across all four use cases

HealthyDependency

Permissive license, no critical CVEs, actively maintained — safe to depend on.

HealthyFork & modify

Has a license, tests, and CI — clean foundation to fork and modify.

HealthyLearn from

Documented and popular — useful reference codebase to read through.

HealthyDeploy as-is

No critical CVEs, sane security posture — runnable as-is.

⚠Concentrated ownership — top contributor handles 63% of recent commits
⚠No test directory detected
✓Last commit 3w ago
✓6 active contributors
✓Apache-2.0 licensed
✓CI configured

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Healthy](https://repopilot.app/api/badge/ymcui/chinese-llama-alpaca)](https://repopilot.app/r/ymcui/chinese-llama-alpaca)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card

This card auto-renders when someone shares https://repopilot.app/r/ymcui/chinese-llama-alpaca on X, Slack, or LinkedIn.

Ask AI about ymcui/chinese-llama-alpaca

Grounded in the actual source code. Pick a starter question or write your own.

What does this repo do, in one paragraph?How would I get started using it?What are the main alternatives?Show me the entry point.

Or write your own question →

Onboarding doc

Onboarding: ymcui/Chinese-LLaMA-Alpaca

Generated by RepoPilot · 2026-06-20 · Source

🎯Verdict

GO — Healthy across all four use cases

Last commit 3w ago
6 active contributors
Apache-2.0 licensed
CI configured
⚠ Concentrated ownership — top contributor handles 63% of recent commits
⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

⚡TL;DR

Chinese-LLaMA-Alpaca is an open-source project that extends Meta's LLaMA model with expanded Chinese vocabulary and pretrains it on Chinese data, then instruction-tunes it as 'Alpaca' for chat. It provides 7B, 13B, and 33B model variants (basic, Plus, Pro editions) optimized for efficient local CPU/GPU deployment via quantization, supporting ecosystems like transformers, llama.cpp, and LangChain. Monorepo structure: /data contains training datasets (alpaca_data_zh_51k.json, pt_sample_data.txt), /examples showcase model outputs organized by task (QA.md, CODE.md, DIALOGUE.md etc.) with quantization levels (f16-p7b-p13b-33b, q4_7b-13b, q8_13b-p7b-p13b), scripts directory (inferred) contains training/deployment code, with documentation at repo root (README.md, README_EN.md) and wiki at GitHub.

👥Who it's for

NLP researchers and developers in the Chinese AI community who want to fine-tune or deploy Chinese large language models locally without cloud infrastructure. Also targets practitioners building Chinese chatbots or text generation applications who need open-source alternatives to proprietary APIs.

🌱Maturity & risk

Actively maintained with significant community adoption (indicated by multi-version releases: v1, v2, and v3 branches mentioned in README). The project has infrastructure (GitHub Actions CI in .github/workflows/, SHA256.md checksums, proper CITATION.cff), but specific star count and test coverage are not visible in provided data. Verdict: production-ready for inference; actively developed with regular model releases.

Moderate risk: depends on rapid-moving ecosystem (transformers==4.30.0, peft from git commit 13e53fc rather than stable release) which may break compatibility. Single maintainer (ymcui) pattern visible in org structure. No visible test suite in file listing suggests validation is primarily empirical. Large model sizes (33B parameters) create deployment friction despite quantization support.

Active areas of work

Chinese-LLaMA-Alpaca-3 recently launched (mentioned as project startup in README). The v2 technical report focuses on 'Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca' (arxiv 2304.08177), indicating active work on vocabulary efficiency and encoding optimization. Quantization-level variants (q4, q8, f16) are being actively released.

🚀Get running

git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git
cd Chinese-LLaMA-Alpaca
pip install torch==1.13.1 transformers==4.30.0 sentencepiece==0.1.97
pip install git+https://github.com/huggingface/peft.git@13e53fc

Then explore examples/ or data/ directories for model cards and training data.

Daily commands: Inference example (inferred from ecosystem support): python -c "from transformers import AutoModelForCausalLM, AutoTokenizer; model = AutoModelForCausalLM.from_pretrained('path/to/model'); tokenizer = AutoTokenizer.from_pretrained('path/to/model'); ..." Training scripts likely in scripts/ (not shown in file list). See examples/f16-p7b-p13b-33b/, examples/q4_7b-13b/, examples/q8_7b-13b-p7b/ for task-specific prompts and expected outputs.

🗺️Map of the codebase

scripts/training/run_clm_sft_with_peft.py — Core supervised fine-tuning training script using PEFT LoRA; essential for understanding the main training pipeline
scripts/training/run_clm_pt_with_peft.py — Pre-training script with PEFT integration; foundational for model pre-training workflows
scripts/merge_llama_with_chinese_lora.py — Merges base LLaMA weights with Chinese LoRA adapters; critical for model deployment and inference
scripts/inference/inference_hf.py — HuggingFace-based inference entry point; required for understanding model inference and deployment patterns
scripts/inference/gradio_demo.py — Interactive web UI for inference; demonstrates end-user interaction patterns and model usage
requirements.txt — Defines exact dependency versions (transformers, PEFT, torch); critical for reproducibility and environment setup
data/alpaca_data_zh_51k.json — Core Chinese instruction-following dataset; required for SFT and understanding training data format

🛠️How to make changes

Add a Custom Fine-Tuning Script

Copy scripts/training/run_clm_sft_with_peft.py as template (scripts/training/run_clm_sft_with_peft.py)
Modify dataset loading in build_dataset.py to support your custom format (scripts/training/build_dataset.py)
Create shell script wrapper in scripts/training/ (e.g., run_custom_sft.sh) with your hyperparameters (scripts/training/run_sft.sh)
Test by running shell script with small dataset subset first (scripts/training/run_custom_sft.sh)

Deploy Model as REST API

Start with scripts/openai_server_demo/openai_api_server.py as base (scripts/openai_server_demo/openai_api_server.py)
Merge LoRA adapters using scripts/merge_llama_with_chinese_lora.py to get full weights (scripts/merge_llama_with_chinese_lora.py)
Modify openai_api_server.py to load your merged model path and inference parameters (scripts/openai_server_demo/openai_api_server.py)
Run server and test endpoints against OpenAI-compatible client (scripts/openai_server_demo/README.md)

Evaluate Model on C-Eval Benchmark

Check subject mapping and benchmark structure in subject_mapping.json (scripts/ceval/subject_mapping.json)
Prepare model using inference_hf.py or merge adapters first (scripts/inference/inference_hf.py)
Run evaluation using scripts/ceval/eval.py with model path and output directory (scripts/ceval/eval.py)
Review results and compare against baseline in examples/ (examples/README.md)

Create Interactive Demo with Gradio

Review example Gradio demo implementation (scripts/inference/gradio_demo.py)
Prepare merged model or ensure LoRA checkpoints are available (scripts/merge_llama_with_chinese_lora.py)
Customize gradio_demo.py with your model path, system prompt, and UI parameters (scripts/inference/gradio_demo.py)
Run demo and access via public URL or localhost (scripts/inference/gradio_demo.py)

🔧Why these technologies

PEFT (Parameter-Efficient Fine-Tuning) — Enables training on consumer GPUs by reducing trainable parameters to 1-5% via LoRA, making Chinese model adaptation accessible
HuggingFace Transformers — Provides standardized model loading, tokenization, and generation APIs; integrates seamlessly with PEFT and inference tools
torch 1.13.1 — Mature version with stable CUDA support; balances compatibility with PEFT and recent transformer optimizations
SentencePiece Tokenizer — Handles Chinese character segmentation better than standard BPE; merging allows dual-language vocabulary without retraining
Gradio Web UI — Low-code framework for deploying interactive chat interfaces without frontend expertise; shareable public links
DeepSpeed ZeRO-2 — Enables training larger models (13B/33B) on multi-GPU setups via optimizer state partitioning

⚖️Trade-offs already made

LoRA adapter training instead of full fine-tuning
- Why: Dramatically reduces memory and compute requirements (1-2 GPU days vs weeks for full fine-tune)
- Consequence: LoRA adapters must be merged with base model for inference; cannot be easily mixed/swapped at runtime
Separate pre-
- Why: undefined
- Consequence: undefined

🪤Traps & gotchas

PEFT version constraint: depends on specific git commit (13e53fc) not a stable release; pip install may fail if that commit is removed from GitHub or network is unavailable — requires git to be installed. 2. torch==1.13.1 is old: released Sept 2022; may have CUDA compatibility issues with newer GPUs (A100, H100) — users on latest hardware may need to override. 3. Model weight downloads not in repo: GitHub clone gets code only; actual model weights (7B-33B) must be downloaded separately from HuggingFace Model Hub or other sources not detailed here — can be 15-130GB depending on size and precision. 4. Chinese-only training data: alpaca_data_zh_51k.json is Chinese instruction pairs; fine-tuning on English data may degrade Chinese capability. 5. No visible requirements.txt: dependencies listed in README snippet but not in a requirements.txt file, making reproducibility harder for CI/CD.

🏗️Architecture

💡Concepts to learn

Vocabulary expansion for LLMs — This project's core innovation: extending LLaMA's 32k BPE tokens with ~20k Chinese tokens improves Chinese encoding efficiency from ~3-4 tokens per Chinese character down to ~1-2, directly reducing inference latency and token limits
Instruction tuning (Alpaca-style) — Transforms a base language model into a chat assistant by fine-tuning on instruction-response pairs; the 51k examples in alpaca_data_zh_51k.json encode this paradigm, and understanding the prompt format (###Instruction/###Input/###Response) is essential for replicating results
Post-training quantization (PTQ) — The q4_* and q8_* variants in examples/ represent 4-bit and 8-bit quantization reducing model size from ~26GB (f16, 7B) to ~3.5GB (q4) without retraining, making inference viable on laptops; critical for understanding latency/accuracy tradeoffs
Parameter-Efficient Fine-Tuning (LoRA) — PEFT dependency enables training adapters (small weight updates) instead of full model weights, reducing VRAM from 80GB to ~16GB for 7B model tuning; understanding LoRA is essential for custom instruction-tuning without enterprise hardware
Continued pretraining (domain adaptation) — The base Chinese-LLaMA models use pt_sample_data.txt for continued pretraining on Chinese text before instruction-tuning; this two-stage pipeline (pretrain → finetune) is different from training from scratch and essential for reproducing the models
Byte-Pair Encoding (BPE) + SentencePiece — sentencepiece==0.1.97 dependency tokenizes input text; this project modifies the BPE vocabulary to prioritize Chinese characters, diverging from standard LLaMA tokenization—understanding this is critical when swapping models or debugging encoding issues
Hugging Face Hub model format — Models are distributed via Hugging Face Model Hub (referenced in README); understanding how to load from Hub (AutoModelForCausalLM, AutoTokenizer) and the safetensors format is essential for both inference and fine-tuning workflows

ymcui/Chinese-LLaMA-Alpaca-2 — Direct successor using LLaMA-2 instead of LLaMA-1; same project owner, choose between v2 (more stable) or v3 (latest) based on your requirements
ggerganov/llama.cpp — Inference engine explicitly supported by this project for local CPU/GPU quantized model deployment; llama.cpp is the key tool for running q4/q8 variants locally
huggingface/transformers — Core dependency (==4.30.0) for model loading, tokenization, and inference; this project extends transformers' LLaMA implementation with Chinese vocabulary
microsoft/PEFT — Parameter-efficient fine-tuning library (used as git dependency) enabling LoRA-based instruction tuning with minimal VRAM on consumer hardware
airaria/Visual-Chinese-LLaMA-Alpaca — Multimodal extension of this project adding vision capabilities; referenced in README as related work for users needing image+text

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add automated CI workflow for notebook validation and conversion

The notebooks/ directory contains 5+ Jupyter notebooks for critical workflows (LoRA fine-tuning, quantization, inference) but there's no CI validation. Notebooks can become outdated or broken without detection. A GitHub Action should validate notebook syntax, test cell execution (at least imports/setup cells), and ensure they match the current dependency versions (torch==1.13.1, transformers==4.30.0). This prevents contributor notebooks from drifting out of sync.

[ ] Create .github/workflows/notebook-validation.yml to run nbval or papermill on notebooks/ directory
[ ] Add validation for Python imports and dependency compatibility in notebook cells
[ ] Document the notebook testing process in notebooks/README.md
[ ] Test against the pinned versions in requirements (torch==1.13.1, transformers==4.30.0, etc.)

Add integration tests for data preprocessing and format validation

The repo includes sample data files (data/alpaca_data_zh_51k.json, data/pt_sample_data.txt) but lacks tests validating the data format expected by training scripts. Contributors adding new datasets or modifying data loading logic have no way to verify correctness. Add a test suite that validates JSON schema, encoding (UTF-8 for Chinese), and record counts.

[ ] Create tests/test_data_format.py to validate alpaca_data_zh_51k.json structure (required fields, data types)
[ ] Add UTF-8 encoding validation for Chinese text in both JSON and TXT data files
[ ] Create a data validation utility in a new src/data/validator.py module
[ ] Document data format requirements in data/README.md with schema examples

Create example configuration templates and validation for training scripts

The examples/ directory has 16 markdown files showing outputs but no actual training configuration files or templates. Users and contributors must infer training parameters from examples alone. Add YAML/JSON config templates for each model variant (7b, 13b, 33b with q4/q8/f16 quantization levels) that can be validated against a schema, making reproducibility and contribution easier.

[ ] Create examples/configs/ directory with YAML templates: 7b-lora.yaml, 13b-lora.yaml, 33b-lora.yaml, etc.
[ ] Create src/config/schema.py to define and validate training configuration using Pydantic or similar
[ ] Add a --validate-config utility script that contributors can use to verify their configs before training
[ ] Update examples/README.md with references to config templates and validation instructions

🌿Good first issues

Add unit tests for tokenizer expansion**: The project has no visible test/ directory despite critical vocabulary modification logic. Create pytest tests validating that Chinese tokens are properly added to vocab (test file: tests/test_tokenizer.py) and that encoding/decoding preserves Chinese characters for 7B/13B/33B variants.: medium: Prevents silent tokenization regressions when updating SentencePiece configs
Document quantization format differences in examples/: Create a comparison table in examples/README.md showing f16 vs q8 vs q4 tradeoffs (latency, VRAM, BLEU/ROUGE scores on data/alpaca_data_zh_51k.json held-out test set) for each model size: low-medium: Users currently must infer quality/speed tradeoffs by trial; benchmarks would reduce setup time
Add GitHub Actions workflow to validate model outputs: Create .github/workflows/model-validation.yml that runs inference on a tiny subset of alpaca_data_zh_51k.json (first 5 examples) for each quantization variant on CPU-only runner, checking that no NaN outputs occur: medium: Catch model corruption in downloads or quantization bugs before users spend hours training on bad weights
Provide Docker image with all dependencies pinned: Add Dockerfile at repo root with torch==1.13.1, transformers==4.30.0, PEFT commit pinned, and llama.cpp pre-built; document in README.md: low: Eliminates 'works on my machine' issues around CUDA versions and system dependencies that plague PyTorch projects

⭐Top contributors

Click to expand

@ymcui — 63 commits
@airaria — 18 commits
@iMountTai — 8 commits
@sunyuhan19981208 — 7 commits
@GoGoJoestar — 3 commits

📝Recent commits

Click to expand

5b8bb55 — Update README.md (ymcui)
090475f — remove google drive link (use hf instead) (ymcui)
f213c2c — update news on Chinese-LLaMA-Alpaca-3 (ymcui)
1f96c4d — add modelscope links (ymcui)
9a1376b — chinese-llama-alpaca-3 launched (ymcui)
602d43a — add link to sota platform (ymcui)
75b3642 — update news on Chinese-Mixtral (ymcui)
0c920db — update news (ymcui)
0cd8165 — Update stale.yml (ymcui)
6e8c6c2 — add model overview graph (ymcui)

🔒Security observations

The Chinese-LLaMA-Alpaca codebase has moderate security concerns primarily related to outdated dependencies and incomplete version pinning. The most critical issues are: (1) Using PyTorch 1.13.1 with known vulnerabilities instead of current 2.x versions, (2) Git-based PEFT dependency without stable release pinning, and (3) Outdated transformers library. The project lacks a comprehensive dependency lock file strategy, increasing supply chain risk. For a research/ML project with inference capabilities, there's limited evidence of security hardening for web interfaces. Immediate action should focus on updating core dependencies to versions with active security support.

High · Outdated PyTorch Version with Known Vulnerabilities — requirements.txt - torch==1.13.1. The dependency file specifies torch==1.13.1, which is significantly outdated (released in early 2023). This version contains known security vulnerabilities and may have unpatched CVEs. Current stable versions are 2.x series. Fix: Update to the latest stable PyTorch version (2.0+) that receives security patches. Review PyTorch security advisories and update to torch>=2.1.0
High · Unstable Git Dependency Without Version Pinning — requirements.txt - git+https://github.com/huggingface/peft.git@13e53fc. The PEFT library is installed from a git commit hash (13e53fc) rather than a stable release. This approach is fragile and makes reproducibility difficult. The specific commit may not receive security updates if the repository is updated. Fix: Pin to a specific stable release version instead: peft>=0.7.0 (or appropriate latest version). If custom patches are needed, maintain a fork with clear documentation of changes.
Medium · Outdated Transformers Library — requirements.txt - transformers==4.30.0. transformers==4.30.0 is outdated (released mid-2023). Newer versions include security patches and important bug fixes. The current version may be vulnerable to model loading exploits or other security issues fixed in later releases. Fix: Update to transformers>=4.35.0 or latest stable release. Review the changelog for security-related fixes between 4.30.0 and current version.
Medium · No Dependency Pinning Strategy — requirements.txt - overall structure. The requirements.txt lacks complete version pinning for transitive dependencies. This creates supply chain risks where sub-dependencies could be compromised or introduce breaking changes. Fix: Generate and maintain a requirements-lock.txt or use pip-tools/poetry to lock all transitive dependencies. Include hash verification for critical packages.
Low · Missing Security Headers Documentation — scripts/inference/gradio_demo.py. No evidence of security configuration documentation for the web inference components (gradio_demo.py). Web-based interfaces require proper security headers and CORS policies. Fix: Document security configurations for web interfaces including CORS policies, authentication requirements, and rate limiting. Implement proper security headers in Gradio configuration.
Low · Data Files Not Validated for Integrity — SHA256.md and data/ directory. The SHA256.md file exists but there's no clear evidence in the file structure that data files are validated before use. This could allow tampering with training/evaluation data. Fix: Implement hash verification for data files during loading. Document the verification process and ensure checksums are validated in data loading scripts.
Low · No License File Present in Some Directories — scripts/ and notebooks/ directories. While LICENSE.md exists at root, individual scripts and notebooks may lack clear licensing headers, potentially causing compliance issues. Fix: Add SPDX license headers to all source files. Ensure LICENSE file is referenced in key directories.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ymcui/Chinese-LLaMA-Alpaca shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live ymcui/Chinese-LLaMA-Alpaca repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/ymcui/Chinese-LLaMA-Alpaca.

What it runs against: a local clone of ymcui/Chinese-LLaMA-Alpaca — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in ymcui/Chinese-LLaMA-Alpaca | Confirms the artifact applies here, not a fork | | 2 | License is still Apache-2.0 | Catches relicense before you depend on it | | 3 | Default branch main exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 50 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>ymcui/Chinese-LLaMA-Alpaca</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ymcui/Chinese-LLaMA-Alpaca. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git
#   cd Chinese-LLaMA-Alpaca
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ymcui/Chinese-LLaMA-Alpaca and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ymcui/Chinese-LLaMA-Alpaca(\\.git)?\\b" \\
  && ok "origin remote is ymcui/Chinese-LLaMA-Alpaca" \\
  || miss "origin remote is not ymcui/Chinese-LLaMA-Alpaca (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
  && ok "license is Apache-2.0" \\
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
  && ok "default branch main exists" \\
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "scripts/training/run_clm_sft_with_peft.py" \\
  && ok "scripts/training/run_clm_sft_with_peft.py" \\
  || miss "missing critical file: scripts/training/run_clm_sft_with_peft.py"
test -f "scripts/training/run_clm_pt_with_peft.py" \\
  && ok "scripts/training/run_clm_pt_with_peft.py" \\
  || miss "missing critical file: scripts/training/run_clm_pt_with_peft.py"
test -f "scripts/merge_llama_with_chinese_lora.py" \\
  && ok "scripts/merge_llama_with_chinese_lora.py" \\
  || miss "missing critical file: scripts/merge_llama_with_chinese_lora.py"
test -f "scripts/inference/inference_hf.py" \\
  && ok "scripts/inference/inference_hf.py" \\
  || miss "missing critical file: scripts/inference/inference_hf.py"
test -f "scripts/inference/gradio_demo.py" \\
  && ok "scripts/inference/gradio_demo.py" \\
  || miss "missing critical file: scripts/inference/gradio_demo.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 50 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~20d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/ymcui/Chinese-LLaMA-Alpaca"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Similar Python repos

Other healthy-signal Python repos by stars.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/ymcui/chinese-llama-alpaca"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>