RepoPilot

FlagOpen/FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Healthy

Healthy across all four use cases

Use as dependency — Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify — Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 2w ago
  • 5 active contributors
  • MIT licensed
  • CI configured
  • Concentrated ownership — top contributor handles 54% of recent commits
  • No test directory detected

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant: RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/flagopen/flagembedding)](https://repopilot.app/r/flagopen/flagembedding)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/flagopen/flagembedding on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: FlagOpen/FlagEmbedding

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/FlagOpen/FlagEmbedding shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • Last commit 2w ago
  • 5 active contributors
  • MIT licensed
  • CI configured
  • ⚠ Concentrated ownership — top contributor handles 54% of recent commits
  • ⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live FlagOpen/FlagEmbedding repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/FlagOpen/FlagEmbedding.

What it runs against: a local clone of FlagOpen/FlagEmbedding — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in FlagOpen/FlagEmbedding | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 45 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>FlagOpen/FlagEmbedding</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of FlagOpen/FlagEmbedding. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/FlagOpen/FlagEmbedding.git
#   cd FlagEmbedding
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of FlagOpen/FlagEmbedding and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "FlagOpen/FlagEmbedding(\.git)?\b" \
  && ok "origin remote is FlagOpen/FlagEmbedding" \
  || miss "origin remote is not FlagOpen/FlagEmbedding (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "FlagEmbedding/__init__.py" \
  && ok "FlagEmbedding/__init__.py" \
  || miss "missing critical file: FlagEmbedding/__init__.py"
test -f "FlagEmbedding/abc/inference/AbsEmbedder.py" \
  && ok "FlagEmbedding/abc/inference/AbsEmbedder.py" \
  || miss "missing critical file: FlagEmbedding/abc/inference/AbsEmbedder.py"
test -f "FlagEmbedding/abc/inference/AbsReranker.py" \
  && ok "FlagEmbedding/abc/inference/AbsReranker.py" \
  || miss "missing critical file: FlagEmbedding/abc/inference/AbsReranker.py"
test -f "FlagEmbedding/abc/finetune/embedder/AbsRunner.py" \
  && ok "FlagEmbedding/abc/finetune/embedder/AbsRunner.py" \
  || miss "missing critical file: FlagEmbedding/abc/finetune/embedder/AbsRunner.py"
test -f "FlagEmbedding/evaluation/beir/runner.py" \
  && ok "FlagEmbedding/evaluation/beir/runner.py" \
  || miss "missing critical file: FlagEmbedding/evaluation/beir/runner.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 45 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~15d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/FlagOpen/FlagEmbedding"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

FlagEmbedding is BAAI's production-grade toolkit for dense vector embeddings and retrieval-augmented generation (RAG), featuring the BGE (BAAI General Embedding) model series. It provides unified abstractions for embedding inference, reranking, finetuning on custom datasets, and comprehensive evaluation across 15+ benchmarks, enabling search, semantic similarity, and RAG pipelines without boilerplate. The monorepo is organized into three main layers: (1) FlagEmbedding/abc/ defines abstract base classes (AbsEmbedder, AbsReranker, AbsDataset, AbsRunner) for pluggable implementations; (2) FlagEmbedding/evaluation/ contains the benchmark suites (air_bench, BEIR, and others) with long-doc and QA examples; (3) research/ likely contains specific model implementations. Configuration is driven through HuggingFace transformers abstractions.

👥Who it's for

ML engineers and researchers building search systems and RAG applications who need pretrained multilingual embedding models (BAAI/bge-large-en-v1.5, etc.) with minimal integration overhead, plus practitioners finetuning embeddings on proprietary corpora via the modular finetune framework in FlagEmbedding/abc/finetune/.

🌱Maturity & risk

Actively maintained production system: v1.3.0 released, multimodal BGE-VL (March 2025) and OmniGen shipped, comprehensive CI/CD via .github/workflows/, and an extensive evaluation suite (air_bench alongside MTEB and C-MTEB benchmarks). The code is ~3.9MB of Python across 50+ modules with structured abstractions, indicating maturity and ongoing heavy development.

Moderate risk: monolithic codebase with tight coupling between abc/ abstractions and implementations (embedder/reranker/evaluation), limited visible test coverage in file listing, single organization (BAAI) maintains core. Dependencies not fully listed but likely heavy (transformers, torch, huggingface_hub). Breaking changes between v1.x releases possible given active development pace.

Active areas of work

Active development on multimodal embeddings (BGE-VL released 3/6/2025), expanded benchmark coverage (air_bench with long-doc examples up to 700k tokens), documentation migration to bge-model.com (12/5/2024), and community growth (WeChat group launched 10/29/2024). The most recent commits (see 📝Recent commits below) add pseudo-MoE inference and MRL/truncated-dimension support to the embedder stack, continuing the push toward multimodal and long-context retrieval.

🚀Get running

git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .
python -c 'from FlagEmbedding import FlagModel; model = FlagModel("BAAI/bge-large-en-v1.5"); print(model.encode(["hello world"]))'

Daily commands: No traditional dev server; this is a library. Run evaluation via: python -m FlagEmbedding.evaluation.air_bench --model_name_or_path BAAI/bge-large-en-v1.5 --benchmark_name long-doc. Finetune embeddings via runner classes in FlagEmbedding/abc/finetune/embedder/AbsRunner.py (subclass and override).
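
For a quick sanity check of retrieval quality, encode queries and passages separately and rank by inner product. A minimal sketch based on README-style FlagModel usage — the encode_queries method and the query_instruction_for_retrieval argument are assumptions to verify against your installed version:

```python
from FlagEmbedding import FlagModel

# Load a pretrained BGE embedder. The query instruction is the standard BGE prompt
# prepended to queries (verify the exact string for your model version).
model = FlagModel(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)

queries = ["how do dense retrievers work?"]
passages = [
    "Dense retrieval encodes queries and documents into vectors and ranks by similarity.",
    "BM25 is a sparse lexical ranking function.",
]

# encode_queries applies the query instruction; plain encode is used for passages.
q_emb = model.encode_queries(queries)  # shape: (num_queries, dim)
p_emb = model.encode(passages)         # shape: (num_passages, dim)

# BGE embeddings are normalized, so inner product behaves like cosine similarity.
scores = q_emb @ p_emb.T
print(scores)  # higher score = more relevant passage
```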

🗺️Map of the codebase

  • FlagEmbedding/__init__.py — Package entry point that exposes the main FlagEmbedding API for embedding and retrieval-augmented LLM functionality.
  • FlagEmbedding/abc/inference/AbsEmbedder.py — Abstract base class defining the core embedder interface that all embedding implementations must follow; essential for understanding extensibility.
  • FlagEmbedding/abc/inference/AbsReranker.py — Abstract base class for reranker implementations; critical for understanding how ranking models integrate into the pipeline.
  • FlagEmbedding/abc/finetune/embedder/AbsRunner.py — Abstract runner for fine-tuning embedders; coordinates the training workflow and must be understood before adding new fine-tuning capabilities.
  • FlagEmbedding/evaluation/beir/runner.py — Main evaluation runner for BEIR benchmark; demonstrates the evaluation framework pattern used across all evaluation modules.
  • FlagEmbedding/abc/evaluation/evaluator.py — Abstract evaluator class that defines the evaluation interface; necessary for implementing new evaluation benchmarks.
  • FlagEmbedding/abc/finetune/reranker/AbsDataset.py — Abstract dataset class for reranker fine-tuning; shows the data pipeline pattern used throughout the codebase.

🛠️How to make changes

Add a New Evaluation Benchmark

  1. Create a new benchmark module directory following the pattern of existing benchmarks (e.g., FlagEmbedding/evaluation/newbench/) (FlagEmbedding/evaluation/)
  2. Implement a data loader inheriting from abc.evaluation.data_loader pattern (FlagEmbedding/abc/evaluation/data_loader.py)
  3. Create a runner.py script that handles benchmark-specific evaluation logic, using arguments.py for configuration (FlagEmbedding/evaluation/beir/runner.py)
  4. Add a main.py entry point to make the benchmark executable as a module (FlagEmbedding/evaluation/beir/__main__.py)
  5. Store benchmark example data in FlagEmbedding/evaluation/newbench/examples/ as JSONL files (FlagEmbedding/evaluation/beir/data_loader.py)
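
A skeleton of what step 3's runner.py might look like. The base-class name, import path, and method shown here are hypothetical placeholders; copy the real signatures from FlagEmbedding/evaluation/beir/runner.py before relying on this:

```python
# FlagEmbedding/evaluation/newbench/runner.py — illustrative skeleton only.
# AbsEvalRunner, load_data_loader, and eval_args are assumed names; mirror the
# actual pattern used in FlagEmbedding/evaluation/beir/runner.py.
from FlagEmbedding.abc.evaluation import AbsEvalRunner  # hypothetical import

from .arguments import NewBenchEvalArgs      # step 3: benchmark-specific CLI arguments
from .data_loader import NewBenchDataLoader  # step 2: loads queries, corpus, qrels


class NewBenchEvalRunner(AbsEvalRunner):
    """Plugs the new benchmark's data loading into the shared evaluation loop."""

    def load_data_loader(self) -> NewBenchDataLoader:
        # The shared runner calls this, then handles retrieval and metric computation.
        return NewBenchDataLoader(
            eval_name="newbench",
            dataset_dir=self.eval_args.dataset_dir,
        )
```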

Create a New Embedder Implementation

  1. Inherit from FlagEmbedding.abc.inference.AbsEmbedder and implement the encode() method (FlagEmbedding/abc/inference/AbsEmbedder.py)
  2. Create a fine-tuning dataset class inheriting from abc.finetune.embedder.AbsDataset (FlagEmbedding/abc/finetune/embedder/AbsDataset.py)
  3. Implement AbsArguments subclass for your embedder's training hyperparameters (FlagEmbedding/abc/finetune/embedder/AbsArguments.py)
  4. Create a runner class inheriting from abc.finetune.embedder.AbsRunner to orchestrate training (FlagEmbedding/abc/finetune/embedder/AbsRunner.py)
  5. Add your embedder implementation to the main FlagEmbedding package exports (FlagEmbedding/__init__.py)
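
A minimal sketch of step 1, assuming AbsEmbedder is importable from FlagEmbedding.abc.inference and that encode() is the abstract method to implement — verify both against FlagEmbedding/abc/inference/AbsEmbedder.py:

```python
# Illustrative only — the constructor and encode() signatures are assumptions.
import numpy as np

from FlagEmbedding.abc.inference import AbsEmbedder  # hypothetical import path


class MyEmbedder(AbsEmbedder):
    """Custom embedder: anything that maps a list of texts to normalized vectors."""

    def encode(self, sentences, batch_size: int = 32, **kwargs) -> np.ndarray:
        # Replace with a real forward pass (e.g. transformers.AutoModel + pooling).
        vectors = np.random.rand(len(sentences), 768).astype("float32")  # placeholder
        return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
```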

Add a Custom Reranker Model

  1. Create a class inheriting from FlagEmbedding.abc.inference.AbsReranker and implement the rank() or rerank() method (FlagEmbedding/abc/inference/AbsReranker.py)
  2. Define a dataset class for reranker training data by inheriting from abc.finetune.reranker.AbsDataset (FlagEmbedding/abc/finetune/reranker/AbsDataset.py)
  3. Create training arguments by subclassing abc.finetune.reranker.AbsArguments (FlagEmbedding/abc/finetune/reranker/AbsArguments.py)
  4. Implement a reranker runner inheriting from abc.finetune.reranker.AbsRunner to manage the training loop (FlagEmbedding/abc/finetune/reranker/AbsRunner.py)
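
A sketch of step 1 for the reranker side. The abstract method is shown here as compute_score, but it may be named rank() or rerank() depending on the version — check FlagEmbedding/abc/inference/AbsReranker.py first:

```python
# Illustrative only — the method name and signature are assumptions.
from typing import List, Tuple

from FlagEmbedding.abc.inference import AbsReranker  # hypothetical import path


class MyReranker(AbsReranker):
    """Cross-encoder-style scorer for (query, passage) pairs."""

    def compute_score(self, pairs: List[Tuple[str, str]], **kwargs) -> List[float]:
        # Placeholder: token-overlap score. Replace with a real cross-encoder forward pass.
        return [
            float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs
        ]
```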

Extend Evaluation with Custom Metrics

  1. Reference the metric computation utilities in existing benchmarks for the metric calculation pattern (FlagEmbedding/evaluation/mkqa/utils/compute_metrics.py)
  2. Create a custom evaluator by extending abc.evaluation.evaluator.Evaluator with your metric logic (FlagEmbedding/abc/evaluation/evaluator.py)
  3. Add your metric computations to the benchmark's runner.py in the evaluation step (FlagEmbedding/evaluation/custom/runner.py)
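
The metrics themselves are plain functions over ranked document IDs and relevance judgments, so they can be written and unit-tested independently of the evaluator plumbing. A self-contained sketch (helper names are illustrative, not part of FlagEmbedding's API):

```python
from typing import List, Set


def recall_at_k(ranked_doc_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of relevant documents that appear in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mrr_at_k(ranked_doc_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Example: relevant docs are {d2, d9}; the system ranked d5, d2, d1.
print(recall_at_k(["d5", "d2", "d1"], {"d2", "d9"}, k=3))  # 0.5
print(mrr_at_k(["d5", "d2", "d1"], {"d2", "d9"}, k=3))     # 0.5
```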

🔧Why these technologies

  • Python + HuggingFace Transformers — Natural choice for embedding and LLM models; enables seamless integration with pre-trained transformer models and community ecosystem.

🪤Traps & gotchas

  • Model loading: FlagEmbedding uses HuggingFace model_id strings (e.g., 'BAAI/bge-large-en-v1.5'); offline mode requires HF_HOME set and models pre-cached.
  • Finetuning: AbsRunner subclasses require implementing _setup_model(), train_step(), and eval_step(); missing any breaks the loop silently.
  • Evaluation: air_bench examples are long-doc (100k+ tokens); standard short-document benchmarks may have different evaluation logic in searcher.py (dense vs. sparse retrieval).
  • Dependencies: code assumes transformers>=4.x and torch installed; there is no fallback to CPU-only inference in the abstractions.
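
For the offline-mode trap above, one way to pre-cache weights before moving to an air-gapped machine — a minimal sketch using huggingface_hub, with example cache paths:

```python
# Pre-cache a BGE checkpoint so later FlagModel loads can run offline.
import os

os.environ["HF_HOME"] = "/data/hf-cache"  # set before any HuggingFace import in real use

from huggingface_hub import snapshot_download

snapshot_download(repo_id="BAAI/bge-large-en-v1.5", cache_dir="/data/hf-cache")

# On the offline machine, export HF_HOME=/data/hf-cache and HF_HUB_OFFLINE=1;
# FlagModel("BAAI/bge-large-en-v1.5") should then resolve from the local cache.
```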

🏗️Architecture

💡Concepts to learn

  • Dense Passage Retrieval (DPR) — FlagEmbedding's core paradigm: encoding queries and documents as dense vectors for semantic similarity-based ranking, replacing sparse lexical matching; understanding DPR is essential to using this library effectively
  • In-batch Negatives (Contrastive Learning) — BGE models train using contrastive loss with hard negatives sampled within batch; finetuning requires understanding how AbsDataset provides triplet (query, positive, negative) samples and how loss scales with batch size
  • Normalized Temperature-scaled Cross Entropy (NT-Xent) — Loss function used in abc/finetune/ for embedding training; crucial to tuning the temperature hyperparameter and understanding why embedding norms matter (a toy implementation appears after this list)
  • Token Limit Truncation and Pooling Strategies — BGE handles variable-length inputs (up to 8192 tokens in some variants); AbsEmbedder implements mean pooling vs. CLS token selection; critical for production retrieval where document length varies
  • Cross-Encoder Re-ranking — FlagEmbedding separates two-tower embedding (fast retrieval) from cross-encoder reranking (accurate relevance); AbsReranker computes pairwise query-document scores; understanding when to apply reranking is key
  • Multi-Task Embedding Benchmark (MTEB) — Standard benchmark suite (C-MTEB for Chinese, air_bench extends it) used to evaluate models; understanding benchmark design (retrieval, clustering, STS tasks) shapes what models optimize for
  • Curriculum Learning for Hard Negatives — BGE training uses progressive hard negative mining; important when finetuning on custom data to avoid convergence to local minima with weak negatives
  • huggingface/transformers — Direct dependency; FlagEmbedding wraps transformers.AutoModel and inherits PreTrainedModel abstractions
  • embeddings-benchmark/mteb — MTEB evaluation framework that air_bench integrates; defines standard retrieval benchmarks FlagEmbedding models are evaluated against
  • FlagOpen/FlagLLM — Sibling BAAI project for LLM finetuning; shares similar modular trainer patterns and uses BGE embeddings for retrieval augmentation
  • langchain-ai/langchain — Ecosystem consumer; LangChain integrates BGE embeddings as standard embedding provider for RAG chains
  • VectorSpaceLab/MegaPairs — Synthetic dataset repo released with BGE-VL; provides training data and methodology for multimodal embedding finetuning
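
To make the in-batch-negatives and NT-Xent concepts above concrete, here is a toy contrastive loss in PyTorch. This illustrates the objective only — it is not FlagEmbedding's actual training code, and the temperature value is just an example:

```python
import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(q: torch.Tensor, p: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """q, p: (batch, dim) query and positive-passage embeddings, row-aligned.

    Row i's passage is the positive for query i; every other row's passage
    serves as an in-batch negative. Lower temperature sharpens the softmax.
    """
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature                 # (batch, batch), diagonal = positives
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)


loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```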

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for embedding and reranker inference modules

The repo has abstract base classes for embedders and rerankers (AbsEmbedder.py, AbsReranker.py) but no visible test coverage in the file structure. Given that this is a retrieval toolkit with multiple evaluation benchmarks (BEIR, AIR_BENCH, BRIGHT), adding comprehensive integration tests would ensure inference implementations work correctly across different model variants and prevent regressions.

  • [ ] Create FlagEmbedding/tests/inference/ directory with test_embedder.py and test_reranker.py
  • [ ] Add tests that instantiate concrete embedder/reranker implementations and verify embedding/ranking outputs match expected shapes and value ranges
  • [ ] Add tests for the evaluation runners (FlagEmbedding/evaluation/beir/runner.py, FlagEmbedding/evaluation/air_bench/runner.py) to ensure they correctly load data and compute metrics
  • [ ] Add GitHub Actions workflow (.github/workflows/tests.yml) to run pytest on Python 3.8+ with coverage reports
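
A sketch of what the first checklist item could look like. It assumes the README-level FlagModel API and that embeddings are L2-normalized by default; it is marked slow because it downloads model weights:

```python
# tests/inference/test_embedder.py — illustrative skeleton, not existing code.
import numpy as np
import pytest

pytest.importorskip("FlagEmbedding")
from FlagEmbedding import FlagModel


@pytest.mark.slow
def test_encode_shape_and_normalization():
    model = FlagModel("BAAI/bge-small-en-v1.5")  # small checkpoint keeps CI affordable
    emb = np.asarray(model.encode(["hello world", "dense retrieval"]))
    assert emb.ndim == 2 and emb.shape[0] == 2
    # BGE embeddings are normalized by default, so row norms should be ~1.
    np.testing.assert_allclose(np.linalg.norm(emb, axis=1), 1.0, atol=1e-3)
```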

Add unit tests for data loading and preprocessing utilities

The evaluation modules have data loaders (FlagEmbedding/evaluation/beir/data_loader.py, FlagEmbedding/abc/evaluation/data_loader.py) and utility functions (FlagEmbedding/abc/evaluation/utils.py) but no visible test coverage. These are critical paths for evaluation pipelines and should have explicit tests to catch data format/encoding issues early.

  • [ ] Create FlagEmbedding/tests/evaluation/ directory with test_data_loader.py and test_utils.py
  • [ ] Add tests for BEIR data loader with sample datasets or fixtures (test loading queries, corpus, qrels formatting)
  • [ ] Add tests for evaluation utilities covering edge cases (empty inputs, malformed JSON in JSONL, special characters)
  • [ ] Verify that evaluation arguments (FlagEmbedding/abc/evaluation/arguments.py, FlagEmbedding/evaluation/air_bench/arguments.py) are correctly parsed with integration tests
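
For the malformed-input cases in the checklist, a self-contained pytest sketch; load_jsonl here is a stand-in helper, not the project's real data-loader API:

```python
# tests/evaluation/test_data_loader.py — illustrative skeleton.
import json

import pytest


def load_jsonl(path):
    """Stand-in for a loader helper; the real ones live in evaluation/*/data_loader.py."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def test_well_formed_jsonl(tmp_path):
    good = tmp_path / "corpus.jsonl"
    good.write_text('{"id": "d1", "text": "ok"}\n\n{"id": "d2", "text": "also ok"}\n')
    assert [row["id"] for row in load_jsonl(good)] == ["d1", "d2"]


def test_malformed_jsonl_raises(tmp_path):
    bad = tmp_path / "corpus.jsonl"
    bad.write_text('{"id": "d1", "text": "ok"}\n{not valid json}\n')
    with pytest.raises(json.JSONDecodeError):
        load_jsonl(bad)
```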

Add CI workflow to validate documentation builds and examples

The repo has sphinx documentation infrastructure (sphinx, myst-nb, pydata-sphinx-theme in dependencies) with a documentation.yml workflow stub, but there are no apparent documentation source files or validation that example notebooks/scripts work end-to-end. This is critical for RAG/retrieval projects where users rely heavily on examples.

  • [ ] Create docs/source/ directory with conf.py and index.rst (or index.md) if not present, documenting the main modules: inference, evaluation, and finetuning
  • [ ] Add example Jupyter notebooks in docs/examples/ for common workflows (basic embedding, reranking, running BEIR evaluation)
  • [ ] Update .github/workflows/documentation.yml to run sphinx-build validation and optionally execute example notebooks with nbval or papermill
  • [ ] Add notebook execution tests to ensure code examples in FlagEmbedding/evaluation/*/examples/ directories stay current with API changes

🌿Good first issues

  • Add unit tests for FlagEmbedding/abc/evaluation/searcher.py (BM25 and dense search implementations) — currently no visible test files in listing; would validate ranking correctness.
  • Implement missing docstrings and type hints in FlagEmbedding/abc/finetune/embedder/AbsModeling.py — critical file with minimal inline docs, blocks contributors from extending model architectures.
  • Create a minimal e2e example script in examples/ showing: (1) load pretrained BGE, (2) finetune on custom triplets, (3) evaluate on custom benchmark — currently no worked examples visible in file structure.

Top contributors


📝Recent commits

  • 7ed43d6 — Merge pull request #1575 from hanhainebula/master (hanhainebula)
  • a4247c6 — chore: update version to 1.4.0 in setup.py (hanhainebula)
  • 8092277 — Merge pull request #1572 from lnxtree/master (hanhainebula)
  • d82edd1 — fix: add DEFAULT_POOLING_METHOD to PseudoMoELLMEmbedder (lnxtree)
  • 0d27243 — test: Add examples of test scripts for pseudoMoE and coir pseudo-moe test entry (lnxtree)
  • ab7c252 — feat(embedder): add decoder-only pseudo_moe inference with domain routing support (lnxtree)
  • d190647 — Merge pull request #1570 from lnxtree/master (hanhainebula)
  • ac3e239 — feat: add use_mrl interface of finetune for embedder decode_only.icl and encoder_only.base (lnxtree)
  • 29d380b — feat: add truncate_dim for evaluation and inference for embedder (lnxtree)
  • b09c377 — feat: add the 'use_mrl' interface to the finetune of embedder (lnxtree)

🔒Security observations

The FlagEmbedding codebase shows a reasonable security posture for an open-source machine learning project, but has several areas for improvement. Primary concerns include unpinned dependencies (allowing vulnerable package versions), a lack of visible input validation in data loaders, missing security policy documentation, and CI/CD security that cannot be verified without seeing workflow contents. The project appears well-structured, with no obvious hardcoded secrets or critical misconfigurations visible in the provided file structure. Recommended actions: (1) pin all dependencies, (2) implement input validation across data processing, (3) create SECURITY.md, (4) audit CI/CD workflows, and (5) implement automated dependency scanning.

  • Medium · Unpinned Dependency Versions — dependencies/Package file. The provided dependencies file lists packages (sphinx, myst-nb, myst_parser, sphinx-design, pydata-sphinx-theme) without pinned versions. This could allow installation of vulnerable versions of these packages at build time, especially since Sphinx and related documentation tools have had historical vulnerabilities. Fix: Pin all dependency versions to known secure releases using exact version specifiers (e.g., 'sphinx==7.2.6' instead of 'sphinx'). Implement automated vulnerability scanning for dependencies using tools like Safety, Dependabot, or Snyk.
  • Low · No Security Policy Visible — Repository root. While the codebase is licensed under MIT and appears open-source, there is no visible SECURITY.md or security reporting policy documented. This could make it difficult for security researchers to responsibly report vulnerabilities. Fix: Create a SECURITY.md file with instructions for reporting security vulnerabilities responsibly, following GitHub's recommended security policy format.
  • Low · Potential Data Loading Without Validation — FlagEmbedding/evaluation/*/data_loader.py files. Multiple data loader modules (data_loader.py files in evaluation subdirectories) are present but cannot be fully assessed from the file structure alone. These are common vectors for injection attacks if they process untrusted input (JSONL, CSV data) without proper validation. Fix: Implement strict input validation for all data loaders. Use schema validation libraries, sanitize all external inputs, and avoid using pickle for untrusted data. Add unit tests for malformed input handling.
  • Low · Sensitive-File Exclusions Not Verifiable — .gitignore. The .gitignore file is present but its content is not provided, so it cannot be verified that sensitive files (.env, credentials, API keys) are properly excluded from version control. Fix: Ensure .gitignore includes common sensitive file patterns: *.env, *.key, *.pem, secrets.*, config.local.*, *.credentials, etc. Run 'git log --all --full-history -- <filename>' to check whether secrets were ever committed.
  • Low · Workflow Files Lack Visibility — .github/workflows/documentation.yml. The GitHub Actions workflow file documentation.yml is present but contents are not provided. CI/CD pipelines can introduce security risks if misconfigured (e.g., exposed secrets, insecure artifact handling, permission escalation). Fix: Audit workflow files for: (1) Use of GITHUB_TOKEN with minimal permissions, (2) Avoiding secrets in logs/artifacts, (3) Pinning action versions, (4) Implementing branch protection rules, (5) Reviewing third-party action security.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
