RasaHQ/rasa

Item: RasaHQ/rasa
Rating: 5
Author: RepoPilot

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Healthy

Healthy across all four use cases

HealthyDependency

Permissive license, no critical CVEs, actively maintained — safe to depend on.

HealthyFork & modify

Has a license, tests, and CI — clean foundation to fork and modify.

HealthyLearn from

Documented and popular — useful reference codebase to read through.

HealthyDeploy as-is

No critical CVEs, sane security posture — runnable as-is.

⚠Slowing — last commit 3mo ago
✓Last commit 3mo ago
✓14 active contributors
✓Distributed ownership (top contributor 18% of recent commits)
✓Apache-2.0 licensed
✓CI configured
✓Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Healthy](https://repopilot.app/api/badge/rasahq/rasa)](https://repopilot.app/r/rasahq/rasa)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card

This card auto-renders when someone shares https://repopilot.app/r/rasahq/rasa on X, Slack, or LinkedIn.

Ask AI about RasaHQ/rasa

Grounded in the actual source code. Pick a starter question or write your own.

What does this repo do, in one paragraph?How would I get started using it?What are the main alternatives?Show me the entry point.

Or write your own question →

Onboarding doc

Onboarding: RasaHQ/rasa

Generated by RepoPilot · 2026-06-21 · Source

🎯Verdict

GO — Healthy across all four use cases

Last commit 3mo ago
14 active contributors
Distributed ownership (top contributor 18% of recent commits)
Apache-2.0 licensed
CI configured
Tests present
⚠ Slowing — last commit 3mo ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

⚡TL;DR

Rasa is an open-source machine learning framework for building task-oriented conversational AI agents that handle both NLU (natural language understanding) and dialogue management. It enables developers to create production-grade chatbots and voice assistants that connect to Slack, Facebook, and other platforms through intent classification, entity extraction, and dialogue state management. Monolithic Python package structure with core NLU/dialogue components, CLI via rasa command (inferred from .github/scripts/), Docker-based dev environments (.devcontainer/docker-compose.yml), and GitHub Actions workflows for multi-GPU model regression testing (.github/configs/tf-cuda.json, mr-test-example.yaml). Configuration-driven pipeline building via YAML files (.github/tests/test_data/*.yml), similar to Kedro.

👥Who it's for

AI/ML engineers and chatbot developers who need to build rule-driven conversational systems with custom NLU pipelines rather than pure LLM-based solutions. Organizations requiring strict business logic enforcement, custom entity handling, and multi-channel deployment (Slack, Teams, web).

🌱Maturity & risk

Rasa is in maintenance mode as of the README—the project is stable and production-used but no longer in active feature development. The codebase is mature (5.6M lines of Python), has extensive CI/CD via GitHub Actions (.github/workflows/), test coverage tracking (.coveragerc), and deep code quality monitoring (.codeclimate.yml, .deepsource.toml). However, the official direction has shifted toward Hello Rasa (CALM engine) as the future platform.

Risk: actively transitioning to maintenance mode with the team focusing on Hello Rasa/CALM as the successor, so this codebase will see fewer feature PRs. Heavy dependencies on TensorFlow and PyTorch (inferred from Dockerfile configs for tf-cuda.json) create version conflict potential. Monolithic codebase (5.6M LOC) means changes ripple widely. The shift in strategic direction could make long-term maintenance uncertain.

Active areas of work

The repo is in maintenance mode; active development has shifted to Hello Rasa and the CALM engine. The CI infrastructure is robust (model regression tests, GPU validation via .github/scripts/validate_gpus.py) suggesting ongoing validation, but feature commits are likely reduced. GitHub Actions workflows and DeepSource/CodeClimate integration indicate continuous monitoring of code quality.

🚀Get running

git clone https://github.com/RasaHQ/rasa.git
cd rasa
pip install poetry
poetry install
make train  # or inspect Makefile for available targets

Daily commands: Development: poetry run rasa train (trains NLU + core). Testing: pytest (inferred from .coveragerc). Server: rasa run for HTTP API or rasa run actions for custom action server. Pre-commit hooks likely defined in .github/. Docker: docker-compose up via .devcontainer/docker-compose.yml for isolated environment.

🗺️Map of the codebase

setup.py — Main entry point and package configuration; defines Rasa as an installable Python package with all core dependencies
rasa/__init__.py — Core package initialization; exposes primary APIs for NLU, dialogue, and model training that all contributors must understand
rasa/core/agent.py — Central Agent class orchestrating dialogue management, NLU, and action execution; critical for understanding the conversation flow
rasa/nlu/classifiers/diet_classifier.py — DIET (Dual Intent and Entity Transformer) classifier; core NLU component for intent and entity recognition
rasa/core/policies/policy.py — Abstract base class for dialogue policies; fundamental to understanding how Rasa selects next actions in conversations
pyproject.toml — Project metadata and dependency declarations; essential for understanding build, test, and runtime requirements
rasa/core/training/training.py — Core training pipeline orchestration for dialogue management; critical for understanding model training workflows

🛠️How to make changes

Add a New Dialogue Policy

Create a new policy class extending rasa.core.policies.policy.Policy (rasa/core/policies/your_policy.py)
Implement train() and predict() abstract methods for action selection (rasa/core/policies/your_policy.py)
Register the policy in the graph components registry (rasa/graph.py)
Add policy configuration example to documentation configs (data/configs_for_docs/example_policy_config.yml)

Add a New Channel/Connector

Create new connector class extending rasa.core.channels.channel.InputChannel (rasa/core/channels/your_channel.py)
Implement blueprint() method to define HTTP routes and message handling (rasa/core/channels/your_channel.py)
Add channel to channel registry and CLI argument parser (rasa/cli/main.py)
Write integration tests in test suite (tests/core/channels/test_your_channel.py)

Add a New NLU Component (Intent/Entity Classifier)

Create component class extending rasa.nlu.classifiers.classifier.IntentClassifier (rasa/nlu/classifiers/your_classifier.py)
Implement train() and process() methods for classification logic (rasa/nlu/classifiers/your_classifier.py)
Register component in the graph recipe system (rasa/graph.py)
Add component to suggested_config generation logic (rasa/core/config.py)

Add a Custom Action Handler

Create action class extending rasa.core.actions.action.Action (actions/actions.py)
Implement name() and run() methods; name() returns action ID from domain (actions/actions.py)
Register action in domain file under 'actions' section (domain.yml)
Reference action in story/dialogue flows (data/stories.yml)

🔧Why these technologies

Python + TensorFlow/Keras — Enables fast NLU/dialogue model prototyping with extensive ML ecosystem; transformers for intent/entity recognition (DIET, TED policies)
Graph-based pipeline composition (rasa/graph.py) — Allows flexible configuration of NLU components and policies without code changes; declarative YAML-to-pipeline mapping
SQLAlchemy + multiple tracker stores — Supports conversation history persistence across multiple backends (in-memory, SQL, Mongo) for scalable multi-turn dialogue
Flask/Sanic for REST server — Provides lightweight HTTP endpoint for model serving and webhook handling; low latency for real-time chat
YAML + Markdown training data formats — Human-readable, version-control-friendly formats for domain definition, stories, and NLU training examples

⚖️Trade-offs already made

Monolithic Agent class vs. microservice separation
- Why: Single-process Agent simplifies local development and testing; reduces operational complexity for single-bot deployments
- Consequence: Harder to scale NLU and dialogue independently; requires external queue/load balancer for multi-agent production setups
TensorFlow/Keras over PyTorch
- Why: Established ecosystem, SavedModel format, and Rasa's historical choice; mature production deployment tooling
- Consequence: Slightly less flexible dynamic computation; harder to integrate cutting-edge PyTorch research; heavier binary size
Custom policy selection loop vs. pre-computed action ranking
- Why: Policies can incorporate context-dependent logic and adjust scores at inference time
- Consequence: Per-turn policy inference adds latency (~100-300ms); harder to batch or cache predictions
In-process NLU + policy inference (no external model server)
- Why: Reduces deployment complexity and network latency; bot stays responsive even with slow network
- Consequence: Bot process memory and CPU usage scales with model size; harder to A/B test multiple models; no natural load balancing

🚫Non-goals (don't propose these)

Does not provide end-to-end conversation UI or frontend components; relies on channels (Slack, REST, etc.) for user interaction
Does not include speech-to-text or text-to-speech; focuses on text-based NLU and dialogue
Does not offer built-in user authentication or multi-tenant isolation at framework level
Does not provide real-time model retraining without server restart; models are loaded once at startup
Does not include a visual dialogue builder or low-code interface; configuration is code/YAML-first

🪤Traps & gotchas

Maintenance mode: PRs for new features may be rejected in favor of Hello Rasa; focus on bug fixes and maintenance. GPU requirements: Model training on .github/configs/tf-cuda.json suggests CUDA 11+ needed for full functionality; CPU-only is slow. Monolithic tests: Changes to core NLU require retraining models (expensive); .github/tests/test_data/*.yml are real regression test configs. Dependency hell: TensorFlow + PyTorch + spaCy version pinning is strict; poetry.lock must be respected. No explicit setup.py: project uses poetry (pyproject.toml) exclusively; pip install . won't work—use poetry install.

🏗️Architecture

💡Concepts to learn

Intent Classification + Entity Extraction — Core NLU task in Rasa; understanding intent/entity separation is fundamental to how dialogue policies route conversations
Story-Based Dialogue Policy (Supervised Learning for Dialogue) — Rasa's unique approach to dialogue management using training stories and learned policies instead of hand-coded rules; critical to understanding rasa/core/
Pluggable NLU Pipeline Architecture — Rasa's modular design (tokenizers → featurizers → classifiers) allows swapping components; key to understanding rasa/nlu/ extensibility
Tracker Store (Dialogue State Persistence) — Rasa persists conversation events in SQLite/PostgreSQL trackers; essential for multi-turn dialogue and understanding rasa/core/tracker_store.py
Custom Actions & Webhooks — Rasa executes business logic via external action servers (rasa/core/actions/); required for integrating with databases, APIs, and external services
YAML-Driven Configuration (Domain + Config) — Rasa separates intent/entity/slot definitions (domain.yml) from pipeline/policy config (config.yml); understanding this separation is key to project structure
Model Regression Testing (Conversation Quality Assurance) — The extensive .github/configs/mr-test-*.json setup enables regression detection on dialogue quality; critical for understanding CI expectations in this repo

botpress/botpress — Alternative visual chatbot builder with NLU and dialogue management; competes in same low-code conversational AI space
gunthercox/ChatterBot — Predecessor/inspiration for Rasa—simpler ML chatbot framework that lacks advanced NLU; Rasa evolved to address its limitations
RasaHQ/rasa-sdk — Official SDK for custom Rasa actions and integrations; required for extending Rasa with domain-specific business logic
RasaHQ/rasa-x — Commercial platform built on Rasa OSS for managing dialogue data, annotation, and model evaluation—ecosystem companion
explosion/spacy — Foundational NLP library used by Rasa for tokenization and entity recognition in its NLU pipeline

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for GitHub Actions workflows

The repo has 10+ complex GitHub Actions workflows (.github/workflows/) but no integration tests validating their syntax, triggers, and job dependencies. Currently only .github/tests/ contains unit tests for Python scripts. Adding workflow validation tests would prevent silent failures in CI/CD and catch YAML syntax errors early.

[ ] Create .github/tests/test_workflows.py to validate all YAML files in .github/workflows/ against GitHub's workflow schema
[ ] Add tests to verify critical workflow triggers (continous-integration.yml, ci-model-regression.yml) have correct event conditions
[ ] Add tests to validate required secrets and environment variables are documented for workflows that depend on them (e.g., DataDog agent in model regression)

Add pre-commit hook tests for code quality tools

The repo has .pre-commit-config.yaml configured with multiple linters (flake8, deepsource integration), but .github/tests/ lacks tests validating that hooks actually run correctly. This ensures contributors can catch issues locally before pushing, reducing CI failures.

[ ] Create .github/tests/test_precommit_hooks.py to verify all hooks in .pre-commit-config.yaml are accessible and have valid configurations
[ ] Add test to ensure flake8 error matcher (.github/matchers/flake8-error-matcher.json) correctly parses actual violations
[ ] Add test to validate that deprecated or changed hook versions in dependabot.yml updates don't break the pre-commit pipeline

Add documentation for model regression test infrastructure with test examples

The repo contains sophisticated model regression testing (.github/templates/ with 5+ Jinja templates, .github/configs/ with test configs, and .github/tests/test_data/ with sample outputs) but lacks user-facing documentation explaining how to run/modify these tests locally. This creates a barrier for contributors wanting to validate model changes.

[ ] Create .github/docs/MODEL_REGRESSION_TESTING.md documenting how to run mr_generate_summary.py and mr_publish_results.py locally with example outputs from .github/tests/test_data/
[ ] Add comments to .github/templates/*.tmpl files explaining the Jinja variable context (from ci-model-regression.yml) and when each template is used
[ ] Add troubleshooting section in documentation referencing common issues like missing dataset branch detection (see test_model_regression_test_read_dataset_branch_tmpl.py)

🌿Good first issues

Add missing unit tests for error handling in .github/scripts/validate_gpus.py and validate_cpu.py—these validation scripts lack test coverage and are part of the critical CI path
Document the YAML configuration schema for NLU pipelines and story files with JSONSchema or Pydantic models in rasa/nlu/ and rasa/core/—currently the schema is implicit in code, making onboarding hard
Create a migration guide in docs/ for users moving from Rasa OSS to Hello Rasa/CALM, mapping old intent-based flows to the new flow-based architecture given the maintenance mode transition

⭐Top contributors

Click to expand

@sanchariGr — 18 commits
@ancalita — 17 commits
@m-vdb — 15 commits
@Tawakalt — 15 commits
@vcidst — 7 commits

📝Recent commits

Click to expand

60a3cff — Merge pull request #13155 from RasaHQ/m-vdb-patch-2 (m-vdb)
32ba527 — Revise README for community engagement and future direction (m-vdb)
f28c69e — feat: update references of bitnami image to binamilegacy (#13137) (diana-create)
d6829b6 — Merge pull request #13078 from RasaHQ/prepare-release-3.6.21 (radovanZRasa)
140f901 — Update version to 3.6.21 in test_read_long_resource_names_windows (radovanZRasa)
1d6f66c — Update minimum compatible version to 3.6.21 (radovanZRasa)
2d9c7aa — prepared release of version 3.6.21 (radovanZRasa)
2bb1d77 — Replace pickle with safer alternatives (#13067) (tabergma)
66296b2 — Fix 3.6.x build (#13068) (radovanZRasa)
d340440 — Merge pull request #13062 from RasaHQ/INFRA-1124-3.6.x (rasa-aadlv)

🔒Security observations

High · Unspecified Base Image with Potential Vulnerabilities — Dockerfile (lines 1-4). The Dockerfile uses ARG variables for base images (IMAGE_BASE_NAME, BASE_IMAGE_HASH, BASE_BUILDER_IMAGE_HASH) without specifying concrete image names or SHA256 digests. This allows for image substitution attacks and makes it difficult to track the exact base image being used. No image scanning or signed image verification is evident. Fix: Use explicit base image names with SHA256 digests instead of variable tags. Example: FROM myregistry/base-builder@sha256:abc123... Implement image signing and scanning in CI/CD pipeline.
High · Pip Install Without Version Pinning in Docker Build — Dockerfile (line 15). The Dockerfile upgrades pip and wheel without specifying exact versions beyond major version constraints (pip==22.*, wheel>0.38.0). This can lead to installing vulnerable or untested versions during builds. Fix: Pin exact versions: pip==22.3.1, wheel==0.38.4 (or latest patched versions). Regularly audit and update pinned versions.
Medium · Poetry Installation Without Dependency Audit — Dockerfile (line 16). The Dockerfile installs dependencies via 'poetry install --no-dev' without evidence of security auditing or dependency scanning. No security checks for known vulnerabilities in transitive dependencies. Fix: Add 'poetry check --strict' before installation. Integrate poetry audit or pip-audit in CI/CD. Scan dependencies for CVEs using tools like Safety or Trivy.
Medium · Build Artifacts Left in Intermediate Layers — Dockerfile (line 18). While the Dockerfile removes dist/*.whl and *.egg-info files, intermediate pip caches and build artifacts may remain in Docker layers, increasing image size and potential attack surface. Fix: Add --no-cache-dir to all pip commands and clean up pip cache: RUN pip install --no-cache-dir ... && rm -rf /root/.cache/pip
Medium · Environment Variable HOME Set Without Validation — Dockerfile (line 27). The HOME environment variable is set to /app without verification that this directory is properly secured. Combined with the subsequent chmod o+wr, this grants write access to 'others'. Fix: Verify /app ownership and permissions. Change 'chmod o+wr /app' to 'chmod o-w /app' unless world-writable is explicitly required. Use stricter permission model (e.g., g=rx instead of g=u).
Medium · World-Writable Directory in Production Container — Dockerfile (line 30). The command 'chmod o+wr /app' makes the /app directory world-writable, which could allow any user or process to modify application files, including potential privilege escalation. Fix: Change to 'chmod g=rx /app' to allow group read/execute only. If write access is needed, use a separate volume with restricted permissions.
Low · Missing VOLUME Mount Point Definition — Dockerfile (line 33). The Dockerfile ends with 'VOL' (incomplete line 33), which appears to be an incomplete VOLUME directive. This could indicate incomplete configuration or a build error. Fix: Complete the VOLUME directive properly, e.g., 'VOLUME ["/tmp"]' or remove if not needed. Ensure all Dockerfile syntax is valid.
Low · Potential Secrets in .env File — .env. A .env file exists in the repository root, which commonly contains sensitive credentials, API keys, or database passwords. If committed to version control, secrets may be exposed. Fix: Ensure .env is in .gitignore. Use environment-specific configuration management. Rotate any secrets that may have been exposed. Use secure secret management tools (e.g., HashiCorp Vault, AWS Secrets Manager).
Low · Missing Security Headers and SBOM Documentation — Repository root /. No evidence of Software Bill of Materials (SBOM), security headers, or container scanning policies in the visible configuration files. This reduces supply chain visibility. Fix: undefined

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/RasaHQ/rasa shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live RasaHQ/rasa repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/RasaHQ/rasa.

What it runs against: a local clone of RasaHQ/rasa — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in RasaHQ/rasa | Confirms the artifact applies here, not a fork | | 2 | License is still Apache-2.0 | Catches relicense before you depend on it | | 3 | Default branch 3.6.x exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 130 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>RasaHQ/rasa</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of RasaHQ/rasa. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/RasaHQ/rasa.git
#   cd rasa
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of RasaHQ/rasa and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "RasaHQ/rasa(\\.git)?\\b" \\
  && ok "origin remote is RasaHQ/rasa" \\
  || miss "origin remote is not RasaHQ/rasa (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
  && ok "license is Apache-2.0" \\
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify 3.6.x >/dev/null 2>&1 \\
  && ok "default branch 3.6.x exists" \\
  || miss "default branch 3.6.x no longer exists"

# 4. Critical files exist
test -f "setup.py" \\
  && ok "setup.py" \\
  || miss "missing critical file: setup.py"
test -f "rasa/__init__.py" \\
  && ok "rasa/__init__.py" \\
  || miss "missing critical file: rasa/__init__.py"
test -f "rasa/core/agent.py" \\
  && ok "rasa/core/agent.py" \\
  || miss "missing critical file: rasa/core/agent.py"
test -f "rasa/nlu/classifiers/diet_classifier.py" \\
  && ok "rasa/nlu/classifiers/diet_classifier.py" \\
  || miss "missing critical file: rasa/nlu/classifiers/diet_classifier.py"
test -f "rasa/core/policies/policy.py" \\
  && ok "rasa/core/policies/policy.py" \\
  || miss "missing critical file: rasa/core/policies/policy.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 130 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~100d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/RasaHQ/rasa"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/RasaHQ/rasa"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>