RepoPilot

marcotcr/lime

Lime: Explaining the predictions of any machine learning classifier

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • 19 active contributors
  • BSD-2-Clause licensed
  • CI configured
  • Tests present
  • ⚠ Stale — last commit 2y ago
  • ⚠ Concentrated ownership — top contributor handles 51% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the “Healthy” badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/marcotcr/lime)](https://repopilot.app/r/marcotcr/lime)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/marcotcr/lime on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: marcotcr/lime

Generated by RepoPilot · 2026-05-06 · Source

Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/marcotcr/lime shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

Verdict

GO — Healthy across all four use cases

  • 19 active contributors
  • BSD-2-Clause licensed
  • CI configured
  • Tests present
  • ⚠ Stale — last commit 2y ago
  • ⚠ Concentrated ownership — top contributor handles 51% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live marcotcr/lime repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/marcotcr/lime.

What it runs against: a local clone of marcotcr/lime — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in marcotcr/lime | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-2-Clause | Catches a relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 680 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>marcotcr/lime</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of marcotcr/lime. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/marcotcr/lime.git
#   cd lime
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of marcotcr/lime and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "marcotcr/lime(\.git)?\b" \
  && ok "origin remote is marcotcr/lime" \
  || miss "origin remote is not marcotcr/lime (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-2-Clause)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"BSD-2-Clause\"" package.json 2>/dev/null) \
  && ok "license is BSD-2-Clause" \
  || miss "license drift — was BSD-2-Clause at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lime/lime_base.py" \
  && ok "lime/lime_base.py" \
  || miss "missing critical file: lime/lime_base.py"
test -f "lime/lime_text.py" \
  && ok "lime/lime_text.py" \
  || miss "missing critical file: lime/lime_text.py"
test -f "lime/lime_tabular.py" \
  && ok "lime/lime_tabular.py" \
  || miss "missing critical file: lime/lime_tabular.py"
test -f "lime/lime_image.py" \
  && ok "lime/lime_image.py" \
  || miss "missing critical file: lime/lime_image.py"
test -f "lime/explanation.py" \
  && ok "lime/explanation.py" \
  || miss "missing critical file: lime/explanation.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 680 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~650d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/marcotcr/lime"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

LIME (Local Interpretable Model-Agnostic Explanations) is a framework that explains individual predictions from any black-box machine learning classifier by fitting interpretable local linear models around specific instances. It supports text classifiers, tabular data (numpy arrays), and image classifiers, generating interactive HTML visualizations that show which features (words, pixel regions, or columns) contributed positively or negatively to a prediction.

Monolithic package structure: the lime/ directory contains the core Python explanation engine (lime/explanation.py, lime/discretize.py) with bundled JavaScript visualization components (lime/bundle.js, lime/js/*.js for D3 charting). benchmark/ has performance tests for text and tabular data. doc/notebooks/ contains 12+ Jupyter tutorials covering text, images, regression, and neural networks. No monorepo; single cohesive package.
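The local-surrogate mechanism described above can be sketched from scratch in a few lines. This is an illustrative reimplementation with numpy, not the lime package's API: the toy black_box model, kernel width, and perturbation scale are all invented for the demo.

```python
# Illustrative from-scratch sketch of LIME's core loop (NOT the lime API):
# perturb the instance, weight perturbations by proximity, fit a weighted ridge.
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Toy classifier score, locally dominated by features 0 and 1.
    return 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * X[:, 2] ** 2

def explain_locally(x, predict, num_samples=5000, sigma=1.0, alpha=1e-3):
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.size))  # perturbations
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / sigma**2)              # proximity kernel weights
    Zc = Z - x                              # centre features on the instance
    y = predict(Z)
    y = y - np.average(y, weights=w)        # remove the local intercept
    # Weighted ridge, closed form: (Zc' W Zc + alpha*I)^-1 Zc' W y
    A = Zc.T @ (w[:, None] * Zc) + alpha * np.eye(x.size)
    return np.linalg.solve(A, Zc.T @ (w * y))

coefs = explain_locally(np.array([1.0, 1.0, 1.0]), black_box)
# coefs approximate the local gradient of black_box at x: roughly [2, -1, 0.2]
```

The coefficients are the "explanation": how much each feature locally pushes the prediction, which is exactly what the HTML visualizations render as bar charts.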

Who it's for

Data scientists and ML engineers who need to explain model predictions to stakeholders, auditors, or end-users; researchers building interpretability tools; practitioners using scikit-learn, Keras, PyTorch, or custom classifiers who need non-technical explanations of why their models made specific decisions.

Maturity & risk

Production-ready and widely adopted, though commit activity has slowed (last commit roughly two years ago). The project has a published academic paper (https://arxiv.org/abs/1602.04938), Travis CI integration (.travis.yml), Jupyter notebooks with examples, and PyPI distribution. Python 2 support was dropped at v0.2.0 (now on v1.0.0), indicating deliberate evolution rather than abandonment.

Low-to-moderate maintenance risk: ownership is concentrated in one primary maintainer (marcotcr), but the project is well established with academic backing. The JavaScript dependencies are stable (d3.js ^3.5.17, lodash ^4.11.2, Babel tooling) but the build stack (webpack 1.x, Babel 6.x) is outdated, and no package-lock.json is visible, creating potential reproducibility issues. The file listing used for this summary did not surface a test suite, though other sections reference lime/tests/; verify coverage locally before relying on it.

Active areas of work

Unable to determine from file listing alone (no recent commit hashes, PR list, or issue tracker snapshot provided). The bundled bundle.js and bundle.js.map suggest active webpack builds. Package.json shows v1.0.0, indicating a recent major version bump.

Get running

git clone https://github.com/marcotcr/lime.git
cd lime
pip install -e .
# For JavaScript visualization development:
cd lime && npm install && npm run build

Daily commands:

# Run Jupyter tutorials (docs the actual usage):
jupyter notebook doc/notebooks/

# Build JavaScript changes:
cd lime && npm run watch
# or single build:
npm run build

# Run benchmarks:
python benchmark/text_perf.py
python benchmark/table_perf.py

Map of the codebase

  • lime/lime_base.py — Core LIME engine; defines the shared surrogate-fitting logic that the model-specific explainers build on (check the source for the exact class relationship before assuming inheritance).
  • lime/lime_text.py — Text-specific explainer implementation; handles feature extraction and perturbation for NLP classifiers.
  • lime/lime_tabular.py — Tabular-specific explainer for structured data; implements discretization and feature handling for ML on tables.
  • lime/lime_image.py — Image-specific explainer; manages superpixel segmentation and image perturbation strategies.
  • lime/explanation.py — Explanation object returned by explainers; serializes and visualizes LIME results as HTML or JSON.
  • lime/discretize.py — Feature discretization logic for continuous variables; critical for creating interpretable feature boundaries.
  • setup.py — Package metadata and dependencies; defines Python version requirements and core library imports.

How to make changes

Add support for a new classifier type (e.g., graph neural networks)

  1. Create a new file lime/lime_graph.py that builds on lime_base.LimeBase the same way the existing explainers do (lime/lime_graph.py)
  2. Implement __init__() to initialize your feature perturbation strategy and explainer kernel (lime/lime_graph.py)
  3. Override explain_instance(instance, predict_fn, num_samples) to handle graph-specific feature masking (lime/lime_graph.py)
  4. Add test cases in lime/tests/test_lime_graph.py following the pattern of test_lime_text.py (lime/tests/test_lime_graph.py)
  5. Create a Jupyter notebook in doc/notebooks/ demonstrating the new explainer on a real model (doc/notebooks/Tutorial - Graph Neural Networks.ipynb)
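A standalone sketch of the shape steps 1-3 could take. The class name mirrors the existing explainers, but everything here is hypothetical: the node-masking strategy, the toy model, and the simple drop-averaging used in place of lime's weighted ridge fit. Treat it as an outline, not lime's API.

```python
# Hypothetical skeleton of a graph explainer: mask nodes at random, query the
# black-box model, and score each node by how much masking it moves the output.
import random

class LimeGraphExplainer:
    def __init__(self, kernel_width=1.0, seed=0):
        self.kernel_width = kernel_width
        self.rng = random.Random(seed)

    def explain_instance(self, num_nodes, predict_fn, num_samples=500):
        """Attribute the prediction to nodes via random masking."""
        scores = [0.0] * num_nodes
        counts = [0] * num_nodes
        base = predict_fn([1] * num_nodes)          # all nodes present
        for _ in range(num_samples):
            mask = [self.rng.randint(0, 1) for _ in range(num_nodes)]
            drop = base - predict_fn(mask)          # how much the score fell
            for i, kept in enumerate(mask):
                if not kept:                        # node i was masked out
                    scores[i] += drop
                    counts[i] += 1
        return [s / c if c else 0.0 for s, c in zip(scores, counts)]

# Toy model: the prediction is driven almost entirely by node 0.
toy_predict = lambda mask: 0.9 * mask[0] + 0.05 * sum(mask[1:])
attributions = LimeGraphExplainer().explain_instance(4, toy_predict)
```

On this toy model, node 0 should receive by far the largest attribution.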

Add a new feature discretization strategy

  1. Open lime/discretize.py and review the existing discretizers (QuartileDiscretizer, DecileDiscretizer, EntropyDiscretizer) (lime/discretize.py)
  2. Add a new class inheriting from the base discretizer and overriding its boundary-computation method (lime/discretize.py)
  3. Update lime/lime_tabular.py to accept the new discretizer as an option in __init__() (lime/lime_tabular.py)
  4. Add unit tests in lime/tests/test_discretize.py to verify edge cases (lime/tests/test_discretize.py)
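For step 2, a minimal sketch of the boundary computation a new strategy supplies. It is standalone (it does not subclass lime's actual base discretizer; take the exact hook to override from lime/discretize.py), and the equal-width strategy and function names here are examples only.

```python
# Hypothetical equal-width discretization strategy, standalone for clarity.
def equal_width_boundaries(column, n_bins=4):
    """Return the n_bins - 1 interior cut points for one numeric feature."""
    lo, hi = min(column), max(column)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def discretize(value, boundaries):
    """Map a raw value to its bin index ('x <= b0' style bins, as in LIME)."""
    for i, b in enumerate(boundaries):
        if value <= b:
            return i
    return len(boundaries)

cuts = equal_width_boundaries([0.0, 2.0, 4.0, 8.0])
```

The bins are what make tabular explanations readable ("age <= 2.0" instead of a raw coefficient on a continuous value).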

Enhance explanation visualization with a new chart type

  1. Create a new JavaScript file lime/js/custom_chart.js following the structure of bar_chart.js (lime/js/custom_chart.js)
  2. Add a new render method in lime/js/explanation.js to conditionally display your chart (lime/js/explanation.js)
  3. Update lime/style.css with CSS classes for styling your new chart (lime/style.css)
  4. Rebuild the bundle by running npm run build in the lime/ directory (lime/webpack.config.js)

Why these technologies

  • Python (core) — Primary language for ML practitioners; allows seamless integration with scikit-learn, TensorFlow, PyTorch, and other ML ecosystems.
  • numpy/scipy — Efficient numerical computations for perturbation sampling, linear model fitting (ridge regression), and matrix operations.
  • scikit-learn — Ridge regression for fitting interpretable local models; provides stable numerical implementations.
  • JavaScript/D3.js (visualization) — Browser-based interactive visualizations of feature contributions; enables rich, client-side exploration of explanations.
  • Webpack + Babel — Bundles ES6 JavaScript into production-ready assets; enables modern JS development with transpilation for broad browser support.

Trade-offs already made

  • Model-agnostic approach (black-box prediction API only)

    • Why: Maximizes applicability across any classifier type without introspecting internal weights or gradients.
    • Consequence: Requires many function evaluations (num_samples), making it computationally expensive for high-latency models; cannot leverage model-specific structure for efficiency.
  • Local linear surrogate model (ridge regression)

    • Why: Simple, interpretable, and fast; avoids overfitting on small local neighborhoods.
    • Consequence: May not capture non-linear decision boundaries accurately; feature importance is only valid in local neighborhood around instance.
  • Separate explainers for text, tabular, image (not unified)

    • Why: Each data modality has distinct perturbation and feature representation strategies; specialized implementations are cleaner.
    • Consequence: Code duplication in core explain loop; new modalities require full new explainer class.
  • HTML visualization bundled with explanations

    • Why: Enables self-contained, shareable explanations without external dependencies.
    • Consequence: Bundle size bloat; frontend updates require rebuilds; complex JavaScript maintenance.
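The cost named in the first trade-off ("requires many function evaluations") is easy to ballpark. The num_samples value of 5000 matches lime's documented default for tabular explanations; the batch size and latency figures are invented for illustration.

```python
# Back-of-envelope cost of the black-box trade-off: every explanation pays
# for num_samples forward passes of the underlying model.
num_samples = 5000            # lime's documented default for tabular data
batch_size = 500              # assumed: model accepts batched input
per_batch_latency_s = 0.05    # assumed: 50 ms per batched forward pass

batches = -(-num_samples // batch_size)          # ceiling division
seconds_per_explanation = batches * per_batch_latency_s
```

With these assumed figures an explanation costs half a second; a high-latency model (say 1 s per batch) pushes that to ~10 s, which is why the consequence above calls the approach computationally expensive.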

Non-goals (don't propose these)

  • Real-time streaming explanations (batch-oriented design)
  • GPU acceleration for perturbation (CPU-based only)
  • Distributed/parallel perturbation sampling across multiple machines
  • Explaining time-series or graph models natively (would need lime_graph.py extension)
  • Adversarial robustness analysis (explanations may be vulnerable to perturbations)
  • Causal inference (LIME is correlational, not causal)

Traps & gotchas

  1. No visible test suite: the file listing used for this summary shows no tests/ or test_*.py files (though other sections reference lime/tests/; verify locally). Contributing without test coverage may fail CI (Travis is configured but the test framework is not shown).
  2. JavaScript build required: the Python package ships a pre-built bundle.js, but modifying lime/js/ requires npm install && npm run build before changes take effect — easy to forget.
  3. D3.js v3 hardlock: the package requires d3@^3.5.17 (v3 era, long superseded), incompatible with the v4+ API; any upgrade breaks visualizations.
  4. Discretization coupling: feature discretization is tightly coupled to explainer type (text vs tabular vs image); there is no plugin architecture.
  5. Python 2 EOL: v0.2.0+ dropped Python 2 support; old notebooks using print statements will fail against the newer codebase.

Architecture

Concepts to learn

  • Local Linear Approximation — LIME's core algorithm: fits a simple interpretable linear model to perturbed data points near the instance being explained. Understanding how local approximation differs from global model fitting is essential to why LIME works.
  • Model-Agnostic Explanations — LIME treats any classifier as a black box, requiring only prediction probabilities. This design principle allows it to work with scikit-learn, Keras, PyTorch, H2O without classifier-specific code.
  • Feature Perturbation & Sampling — LIME perturbs features (removes words, masks image regions, varies column values) to measure their impact. lime/discretize.py implements this; understanding perturbation strategies is critical for modifying behavior.
  • LIME with Sparse Explanations (Submodular Pick) — LIME can select the most 'interesting' instances from a dataset for explanation using submodular optimization. doc/notebooks/Submodular Pick examples.ipynb covers this; useful for understanding representative instances.
  • Discretization Strategies — lime/discretize.py uses quartile binning, entropy-based binning, and text tokenization to convert continuous/raw features into interpretable bins. Different strategies suit different feature types; understanding this is key to extending LIME.
  • D3.js Bar-Chart Visualization — lime/js/bar_chart.js uses D3.js v3 to render interactive feature importance as color-coded bar charts. The visualization is as important as the algorithm; changes here directly affect how users understand explanations.
  • Weighted Linear Regression — LIME fits a weighted linear model where sample weights decay with distance from the original instance. This is sklearn's Ridge under the hood (by default); understanding regularization and the proximity-kernel weighting is crucial.
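The first, second, and last concepts above correspond to the objective in the LIME paper (Ribeiro et al., 2016). In the paper's notation (these symbols are from the paper, not identifiers in lime's code):

```latex
% LIME objective: interpretable model g, black box f, locality kernel \pi_x
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)

% Proximity kernel and locally weighted squared loss over perturbations z
\pi_x(z) = \exp\!\left(-\,D(x, z)^2 / \sigma^2\right), \qquad
\mathcal{L}(f, g, \pi_x) = \sum_{z, z'} \pi_x(z)\,\bigl(f(z) - g(z')\bigr)^2

% With g linear, the weighted ridge fit has the familiar closed form
\hat{w} = (Z^\top W Z + \lambda I)^{-1} Z^\top W y,
\quad W = \mathrm{diag}\bigl(\pi_x(z_1), \ldots, \pi_x(z_n)\bigr)
```

Here Ω(g) penalizes complexity (e.g. the number of nonzero features), which is how LIME keeps explanations sparse.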

Related repos

  • shap/shap — SHAP (SHapley Additive exPlanations) is a more theoretically rigorous alternative using game theory; competes directly with LIME for feature importance explanations across classifiers.
  • pytorch/captum — PyTorch's interpretability library with attribution methods (integrated gradients, saliency maps) for deep learning; overlaps with LIME's image explanation capability but deeper neural network integration.
  • ELI5-org/eli5 — Higher-level wrapper around LIME and scikit-learn with sklearn-specific optimizations and permutation importance; simplifies LIME usage for tabular data but less flexible.
  • interpretml/interpret — Microsoft's Interpret package combining LIME, SHAP, PDP, ICE, and more in a unified UI; broader toolkit but heavier; LIME is their foundation.
  • christophM/interpretable-ml-book — Comprehensive open textbook on interpretability (chapters 5.4+ cover LIME in detail); companion learning resource, not code, but essential context for understanding design decisions.

PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add unit tests for lime/lime_image.py and lime/lime_base.py

The test suite in lime/tests/ is missing test coverage for image explanation functionality. While test_lime_text.py and test_lime_tabular.py exist, there are no corresponding tests for lime_image.py. Additionally, lime_base.py (the core base class) lacks dedicated unit tests. This is critical for a library focused on model interpretability, where correctness is paramount.

  • [ ] Create lime/tests/test_lime_image.py with tests for image segmentation, masking, and explanation generation using sample images from doc/notebooks/data/
  • [ ] Create lime/tests/test_lime_base.py to test core LimeBase methods and the explanation pipeline
  • [ ] Add fixtures for mock image classifiers and test data to lime/tests/__init__.py
  • [ ] Ensure tests run in .travis.yml CI pipeline
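For the first two checklist items, a deterministic mock classifier keeps image tests fast and reproducible. The function name and the brightness rule below are invented for illustration; the sketch only shows the (n, H, W, 3) → (n, classes) contract a LIME image predict_fn must satisfy.

```python
# Hypothetical test fixture: a deterministic "classifier" over image batches.
import numpy as np

def mock_image_classifier(images):
    """Binary classifier: P(bright) equals mean pixel intensity.

    images: array of shape (n, H, W, 3) with values in [0, 1].
    Returns an (n, 2) array of [P(dark), P(bright)] rows summing to 1.
    """
    brightness = images.reshape(len(images), -1).mean(axis=1)
    return np.stack([1.0 - brightness, brightness], axis=1)

batch = np.stack([np.zeros((8, 8, 3)), np.ones((8, 8, 3))])
probs = mock_image_classifier(batch)
```

Because the output is a pure function of pixel intensity, tests can assert exactly how masking a region should shift the prediction.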

Add integration tests for JavaScript visualization bundle in lime/

The repo contains a complete webpack-based JavaScript build system (package.json, webpack.config.js, lime/js/ modules) for generating HTML explanations, but there are no tests validating that the bundle builds successfully or that the visualization renders correctly. The test suite focuses only on Python. This creates risk that changes could break the interactive explanation UI without detection.

  • [ ] Add an npm test script in package.json that runs eslint (already configured) and validates the webpack build
  • [ ] Create a simple test in lime/tests/ that validates the generated bundle.js exists and can be imported by the explanation.py module
  • [ ] Add a GitHub Actions workflow (.github/workflows/js-build.yml) that runs npm install and npm test on each PR
  • [ ] Document the JS testing setup in CONTRIBUTING.md

Add missing unit tests for lime/discretize.py and lime/utils/generic_utils.py

The test suite has test_discretize.py and test_generic_utils.py listed in lime/tests/, but given the critical role these modules play in feature discretization and utility functions used across all three explainers (text, tabular, image), the test files appear minimal or incomplete. These are foundational utilities that deserve comprehensive coverage to prevent regressions.

  • [ ] Expand lime/tests/test_discretize.py with comprehensive tests for QuartileDiscretizer, DecileDiscretizer, and EntropyDiscretizer edge cases (empty data, single value, categorical vs continuous)
  • [ ] Expand lime/tests/test_generic_utils.py with tests for all utility functions, including error handling and boundary conditions
  • [ ] Add integration tests validating discretization behavior when used with each of the three explainers (LimeText, LimeTabular, LimeImage)
  • [ ] Run coverage report to ensure >85% coverage for both files

Good first issues

  • Add a unit test suite for lime/discretize.py covering edge cases: empty arrays, single-value features, mixed categorical/continuous, NaN handling. Currently no tests are visible in the file list despite this being mission-critical for feature binning. Impact: Medium. Discretization bugs silently corrupt explanations; tests would catch regressions and boost contributor confidence.
  • Create missing docstrings and type hints for the lime/explanation.py public API (LimeExplainer, fit, predict_local). The notebooks are currently the only documentation; IDE autocomplete and Sphinx docs are broken. Impact: Low. Users struggle to understand parameter meanings; Sphinx can auto-generate docs from docstrings (doc/conf.py suggests Sphinx is configured but doc/lime.rst is likely bare).
  • Upgrade webpack 1.x → 4.x+ and Babel 6.x → 7.x+ in package.json, and modernize lime/js/*.js to ES2020. This unblocks D3.js v4+ support and reduces npm audit warnings. Impact: High. The current stack is 7+ years old; build times are slow, security patches lag, and new contributors expect modern tooling.
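For the docstring issue above, a sketch of the numpy-style docstrings and type hints being asked for, applied to a stand-in function. The name and signature here are invented; the real public API and its exact signatures must come from lime/explanation.py, not from this sketch.

```python
# Hypothetical example of the docstring/type-hint style the issue requests.
from typing import Dict, List, Tuple

def as_weight_list(weights: Dict[str, float],
                   num_features: int = 10) -> List[Tuple[str, float]]:
    """Return the top feature contributions, largest magnitude first.

    Parameters
    ----------
    weights : dict
        Mapping of feature name to signed local weight.
    num_features : int, optional
        Maximum number of (feature, weight) pairs to return.
    """
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:num_features]

top = as_weight_list({"good": 0.4, "bad": -0.7, "the": 0.01}, num_features=2)
```

Docstrings in this format let Sphinx's autodoc generate the API reference the issue says is currently missing.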

Recent commits

  • fd7eb2e — Merge pull request #624 from Aaryanverma/master (marcotcr)
  • fd4df2d — Update explanation.py (Aaryanverma)
  • 449cddb — Update explanation.py (Aaryanverma)
  • bad4400 — Added pyplot figure size functionality (Aaryanverma)
  • 097bde8 — Update explanation.py (Aaryanverma)
  • fc801ef — Merge pull request #612 from tmielika/lime_tabular (marcotcr)
  • b60e647 — fix import of lime.lime_tabular (tmielika)
  • f13b93e — Merge pull request #611 from eresearchqut/master (marcotcr)
  • 03a315c — Merge pull request #564 from jackred/patch-1 (marcotcr)
  • 475881b — Merge pull request #560 from ytfksw/master (marcotcr)

Security observations

  • High · Outdated and Vulnerable Dependencies — package.json - devDependencies and dependencies. The package.json contains multiple dependencies with known vulnerabilities. Babel (^6.8.0, ^6.17.0), webpack (^1.13.0), and webpack-dev-server (^1.14.1) are from 2016-2017 and have numerous documented CVEs. Babel-polyfill, babel-preset-es2015, and lodash (^4.11.2) from 2016 also contain security issues. These versions are significantly outdated. Fix: Update all dependencies to latest stable versions: babel-cli/core to v7+, webpack to v5+, webpack-dev-server to v4+, lodash to latest (^4.17.21+), and d3 to v7+. Run 'npm audit' and address all reported vulnerabilities.
  • High · Missing License Declaration — package.json - license field. The package.json specifies 'license': 'TODO', indicating no proper license has been declared. This creates legal ambiguity and potential compliance issues for users of this library. Fix: Declare an appropriate open-source license (e.g., 'BSD-2-Clause', 'MIT', 'Apache-2.0'). Update the package.json license field and ensure LICENSE file matches the declared license.
  • Medium · Insecure ESLint Configuration — package.json - eslintConfig and devDependencies. ESLint is pinned to version ^6.6.0 (from 2019), which is outdated. Additionally, the eslintConfig in package.json uses 'eslint:recommended' without additional security-focused rules. No custom security rules are configured (e.g., no checks for hardcoded secrets, unsafe DOM manipulation, or prototype pollution). Fix: Upgrade ESLint to v8+. Add security-focused eslint plugins like 'eslint-plugin-security' and configure rules to detect potential XSS, injection attacks, and unsafe patterns in JavaScript code.
  • Medium · No Security Headers or Content Security Policy — lime/js/ - frontend assets. The JavaScript frontend code (lime/js/ and lime/bundle.js) may be served without proper security headers. No evidence of CSP (Content Security Policy), X-Frame-Options, or other protective headers in the codebase. Fix: Implement Content Security Policy headers, X-Frame-Options, X-Content-Type-Options, and other security headers in the web server configuration or middleware serving the LIME visualizations.
  • Medium · Webpack Development Server Exposed — package.json - scripts section. The package.json includes webpack-dev-server with hot-reload enabled ('start': 'webpack-dev-server --hot --inline'). If this runs in production or is exposed, it can be a security risk as dev servers are not hardened for production use. Fix: Ensure webpack-dev-server is only used in development environments. In production, use a proper application server with security hardening. Add environment checks to prevent dev-server execution in production.
  • Low · Missing .gitignore Best Practices — .gitignore. While .gitignore exists, there's no visibility into its contents from the provided file structure. Common secrets (API keys, credentials) could be accidentally committed if .gitignore doesn't properly exclude node_modules, .env files, or build artifacts. Fix: Ensure .gitignore includes: node_modules/, dist/, build/, *.env, .env.local, .DS_Store, and other sensitive/generated files. Consider using a pre-commit hook with tools like 'husky' and 'lint-staged' to prevent secrets from being committed.
  • Low · No Security Policy or Vulnerability Disclosure — Repository root. There is no SECURITY.md or vulnerability disclosure policy visible in the repository, making it difficult for security researchers to report issues responsibly. Fix: Create a SECURITY.md file documenting how to responsibly report security vulnerabilities. Include contact information and expected response times.
  • Low · CI/CD Configuration Missing Security Checks — .travis.yml. The file exists but its contents were not provided. If it doesn't include dependency scanning (npm audit), SAST, or linting, security regressions can land undetected. Fix: Add npm audit, a SAST step, and lint checks to the CI pipeline.

LLM-derived; treat as a starting point, not a security audit.

Where to read next


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
