
keephq/keep

The open-source AIOps and alert management platform

Overall: Healthy — healthy across the board.

  • Use as dependency — Concerns (weakest axis): non-standard license (Other)
  • Fork & modify — Healthy: has a license, tests, and CI — clean foundation to fork and modify.
  • Learn from — Healthy: documented and popular — useful reference codebase to read through.
  • Deploy as-is — Healthy: no critical CVEs, sane security posture — runnable as-is.

  • Last commit 4d ago
  • 39+ active contributors
  • Distributed ownership (top contributor 9% of recent commits)
  • Other licensed
  • CI configured
  • Tests present
  • Non-standard license (Other) — review terms
What would change the summary?
  • Use as dependency: Concerns → Mixed if the license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — it updates live from the latest cached analysis.
[![RepoPilot: Healthy](https://repopilot.app/api/badge/keephq/keep)](https://repopilot.app/r/keephq/keep)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/keephq/keep on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: keephq/keep

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/keephq/keep shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 4d ago
  • 39+ active contributors
  • Distributed ownership (top contributor 9% of recent commits)
  • Other licensed
  • CI configured
  • Tests present
  • ⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live keephq/keep repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/keephq/keep.

What it runs against: a local clone of keephq/keep — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in keephq/keep | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches a relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 34 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>keephq/keep</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of keephq/keep. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/keephq/keep.git
#   cd keep
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of keephq/keep and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "keephq/keep(\.git)?\b" \
  && ok "origin remote is keephq/keep" \
  || miss "origin remote is not keephq/keep (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "README.md" \
  && ok "README.md" \
  || miss "missing critical file: README.md"
test -f "docker-compose.yml" \
  && ok "docker-compose.yml" \
  || miss "missing critical file: docker-compose.yml"
test -f "docker/Dockerfile.api" \
  && ok "docker/Dockerfile.api" \
  || miss "missing critical file: docker/Dockerfile.api"
test -f "docker/Dockerfile.ui" \
  && ok "docker/Dockerfile.ui" \
  || miss "missing critical file: docker/Dockerfile.ui"
test -f ".github/workflows/test-pr-ut.yml" \
  && ok ".github/workflows/test-pr-ut.yml" \
  || miss "missing critical file: .github/workflows/test-pr-ut.yml"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 34 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~4d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/keephq/keep"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Keep is an open-source AIOps and alert management platform that acts as a single pane of glass for monitoring alerts across your infrastructure. It provides alert deduplication, correlation, enrichment, filtering, bi-directional integrations with monitoring tools (Prometheus, Datadog, PagerDuty, etc.), and AI-powered incident context gathering. The platform enables teams to reduce alert noise and automate incident response through customizable workflows. Monorepo structure with Python backend (likely in root with Flask/FastAPI server files) and TypeScript/React frontend (keep-ui/ directory). Backend exposes provider integrations (alert ingestion, bi-directional syncs), alert deduplication/correlation logic, and workflow engine. Frontend is React/TypeScript UI consuming the backend API. Multiple Docker configurations (docker/Dockerfile.api, docker/Dockerfile.ui, .dev variants) support both containerized and local development.

👥Who it's for

DevOps engineers, SREs, and on-call teams who manage multiple monitoring tools and need to centralize, deduplicate, and correlate alerts across their stack. Platform engineers building internal alert management systems. Organizations adopting AIOps practices to reduce MTTR (Mean Time To Resolution).

🌱Maturity & risk

Actively developed with regular commits and organized GitHub workflows for CI/CD, testing, and releases (see .github/workflows/). The codebase shows significant scale (4.8M Python LOC, 2.7M TypeScript LOC) with comprehensive Docker setup and multiple deployment configurations. Production-ready for alert management, though as an open-source project it requires self-hosting.

Large polyglot codebase (Python + TypeScript + JavaScript) means onboarding overhead and potential maintenance burden across multiple technology stacks. Dependency surface area is substantial given the integration-heavy nature (supporting 20+ providers means many external API dependencies). The monolithic structure (no clear microservices separation visible) could become scaling bottleneck, but active commit history suggests the team is monitoring and addressing issues.

Active areas of work

Active development across multiple fronts: auto-release and release workflow automation, E2E test infrastructure (test-pr-e2e.yml), provider integrations testing (test-pr-integrations.yml), and UI testing (test-pr-ut-ui.yml). Recent focus includes alert evaluation documentation (docs/alertevaluation/examples/) with concrete monitoring system examples (VictoriaMetrics). Developer onboarding workflow automation suggests scaling the team.

🚀Get running

Clone the repo: git clone https://github.com/keephq/keep.git && cd keep. Check for .python-version and package.json to determine runtime requirements. Use docker-compose for full stack: docker-compose -f docker-compose.dev.yml up (see docker-compose.dev.yml). For local development, install Python dependencies and TypeScript/Node dependencies separately, then run backend and frontend servers independently. See CONTRIBUTING.md for detailed onboarding.

Daily commands: docker-compose -f docker-compose.dev.yml up launches the full dev stack. Backend-only: run the Python server from the repo root (likely python -m app or flask run after installing dependencies). Frontend: cd keep-ui && npm install && npm run dev. Production: docker-compose -f docker-compose.yml up. Auth-enabled variant: docker-compose -f docker-compose-with-auth.yml up. Check .env files and docker-compose.common.yml for required environment variables.

🗺️Map of the codebase

  • README.md — Entry point documenting Keep's core value proposition as an AIOps platform with alert management, deduplication, enrichment, and workflows.
  • docker-compose.yml — Primary orchestration file defining the deployment architecture and service dependencies for the entire platform.
  • docker/Dockerfile.api — Backend API container definition; essential for understanding the production runtime environment and dependencies.
  • docker/Dockerfile.ui — Frontend UI container definition; required for understanding the React/TypeScript build and runtime setup.
  • .github/workflows/test-pr-ut.yml — Primary CI/CD pipeline for unit tests; defines how code quality is validated before merge.
  • CONTRIBUTING.md — Contributor guidelines establishing development standards, PR expectations, and onboarding procedures.
  • .cursor/rules/keep-ui-react-typescript.mdc — Cursor AI rules for React/TypeScript conventions; reflects the team's established coding patterns and style.

🛠️How to make changes

Add a new Provider Integration

  1. Create a new provider directory following Keep's provider structure (most providers are in a separate providers package not fully visible in file list) (docs/cli/commands/provider-connect.mdx)
  2. Implement the provider class with authentication and webhook handling methods (docs/applications/github.mdx)
  3. Register the provider in the available integrations list (docs/cli/commands/provider-list.mdx)
  4. Add integration tests in the test suite (.github/workflows/test-pr-integrations.yml)
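The provider checklist above can be sketched in Python. This is illustrative only — `BaseProvider`, `validate_config`, and the `PROVIDERS` registry are assumed names for this sketch, not Keep's actual classes; consult the providers package in the repo for the real abstraction and registration mechanism.

```python
# Hypothetical shape of a pluggable provider — names are assumptions, not Keep's API.
from abc import ABC, abstractmethod
from typing import Any


class BaseProvider(ABC):
    """Minimal shape of a pluggable alert provider."""

    provider_id: str

    def __init__(self, config: dict[str, Any]):
        self.config = config

    @abstractmethod
    def validate_config(self) -> None:
        """Raise if required credentials or settings are missing."""

    @abstractmethod
    def get_alerts(self) -> list[dict[str, Any]]:
        """Pull alerts from the upstream monitoring tool."""


class PagerDutyProvider(BaseProvider):
    provider_id = "pagerduty"

    def validate_config(self) -> None:
        if "api_key" not in self.config:
            raise ValueError("pagerduty provider requires an api_key")

    def get_alerts(self) -> list[dict[str, Any]]:
        # A real implementation would call the upstream API here.
        return []


# A registry like this is what step 3 ("register the provider") implies.
PROVIDERS = {cls.provider_id: cls for cls in (PagerDutyProvider,)}
```

The registry pattern keeps step 3 a one-line change per new integration, which is why provider-style plugin systems scale to the 20+ integrations this report mentions.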

Add a new Alert Rule or Enrichment Step

  1. Define the enrichment logic in the alert evaluation engine (docs/alertevaluation/overview.mdx)
  2. Add example configurations for common monitoring backends (docs/alertevaluation/examples/victoriametricssingle.mdx)
  3. Expose the new rule via CLI or API endpoint (docs/cli/commands/alert-enrich.mdx)
  4. Update the frontend alert table to display enriched data (docs/alerts/table.mdx)

Add a new Workflow or Automation

  1. Define the workflow YAML schema following Keep's workflow DSL (docs/cli/commands/workflow-apply.mdx)
  2. Implement the workflow execution logic in the backend (docs/cli/commands/cli-workflow.mdx)
  3. Add the workflow template to the UI dashboard for user discovery (.cursor/rules/keep-ui-react-typescript.mdc)
  4. Document the workflow with examples and configuration options (docs/cli/overview.mdx)

Add a new UI Component or Dashboard View

  1. Follow React/TypeScript patterns and component structure (.cursor/rules/keep-ui-react-typescript.mdc)
  2. Implement tests using the established UI testing framework (.cursor/rules/keep-ui-tests.mdc)
  3. Create the component in the UI layer and connect to backend API (docs/alerts/sidebar.mdx)
  4. Add the component to the navigation and routing if needed (docker/Dockerfile.ui)

🔧Why these technologies

  • Docker + docker-compose — Standardizes dev, test, and prod environments; enables multi-service orchestration (API, UI, cache, DB) with reproducible deployments across local and cloud platforms.
  • React + TypeScript (Frontend) — Type-safe UI development with real-time alert dashboard updates; TypeScript catches integration errors early; React enables reactive alert state changes and preset filtering.
  • Python Backend (Flask/FastAPI implied) — Rapid alert processing and enrichment logic; rich ecosystem for monitoring integrations; supports async task queues (arq noted in compose) for background alert correlation.
  • Multi-auth support (Okta, Auth0, Keycloak, OAuth2, DB) — Enterprise flexibility; no single auth vendor lock-in; enables Keep deployment in strict corporate environments with existing identity systems.
  • Bi-directional webhooks/integrations — Enables both alert ingestion from monitoring tools and alert dispatch to incident management; reduces need for polling and achieves near-real-time alert flow.

⚖️Trade-offs already made

  • Single pane of glass for alert management vs. staying agnostic to monitoring backend

    • Why: Users need a unified interface to deduplicate and correlate alerts from multiple sources; a single aggregation point is essential for AIOps correlation.
    • Consequence: Requires connectors for each monitoring/alerting system; adds integration maintenance burden; creates single point of failure risk if Keep itself goes down.
  • CLI + API + UI for alert operations (three interfaces)

    • Why: Supports different user personas: ops engineers (CLI/automation), SREs (API), and incident responders (UI dashboard).
    • Consequence: Higher development and testing burden; must keep all three in sync; risk of feature inconsistency across interfaces.
  • Alert deduplication + enrichment + correlation in backend vs. at ingestion point

    • Why: Centralized dedup ensures consistency; allows rich correlation across time and multiple sources; supports backfilling historical alerts.
    • Consequence: Backend becomes compute-intensive; alert latency increases; requires efficient caching and indexing for high-volume environments.

🪤Traps & gotchas

  • Environment variables: the backend likely requires DATABASE_URL, API keys for provider integrations (OpenAI, Anthropic, Datadog, etc.), and OTEL configuration if observability is enabled.
  • Services: requires PostgreSQL or a similar DB (docker-compose sets this up) and Redis/ARQ for async tasks (docker-compose-with-arq.yml).
  • Frontend build: TypeScript strict mode is likely enforced (common in mature projects).
  • Monorepo: backend changes may require a frontend rebuild due to API changes — keep both in sync.
  • Pre-commit hooks: .pre-commit-config.yaml is present, so commits may be blocked if linting fails locally.
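A fail-fast startup check makes the environment-variable trap concrete. In this sketch every variable name except DATABASE_URL is an assumption for illustration — the authoritative list lives in the .env files and docker-compose.common.yml noted above.

```python
# Illustrative fail-fast env check — variable names beyond DATABASE_URL are
# assumptions, not Keep's documented configuration.
REQUIRED = ["DATABASE_URL"]
OPTIONAL_HINTS = {
    "OPENAI_API_KEY": "AI enrichment features",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "OpenTelemetry traces",
}


def check_env(environ: dict) -> list:
    """Return a list of human-readable problems; empty means OK."""
    problems = [f"missing required env var: {name}"
                for name in REQUIRED if not environ.get(name)]
    for name, feature in OPTIONAL_HINTS.items():
        if not environ.get(name):
            problems.append(f"note: {name} unset — {feature} will be disabled")
    return problems
```

Running a check like this (fed `os.environ`) before the server boots turns a confusing mid-request failure into an explicit error at startup.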

🏗️Architecture

💡Concepts to learn

  • Alert Deduplication and Correlation — Core feature of Keep that reduces alert fatigue by grouping related alerts from multiple sources; critical to understand fingerprinting, grouping keys, and time-window based correlation
  • Provider Pattern (Pluggable Integrations) — Keep's extensibility is built on a provider abstraction layer allowing new monitoring tools to be integrated; understanding this pattern is essential for adding new integrations
  • Bi-directional Sync / Event-driven Architecture — Alerts flow into Keep from providers AND Keep can push actions back (close incident, update status); requires understanding webhooks, polling, eventual consistency, and idempotency
  • Async Task Queuing (ARQ/Celery) — Provider syncs and alert processing likely run asynchronously; docker-compose-with-arq.yml indicates ARQ is used for background job processing
  • Alert Enrichment via AI Backends — Keep integrates OpenAI and Anthropic to add context to alerts; understanding prompt engineering, token limits, and API rate limiting is important for contributing AI features
  • Workflow Automation (Alert Response Pipelines) — Users define workflows to automatically respond to alerts (notify teams, create tickets, run remediation); understanding conditional logic, action chaining, and error handling is key
  • OpenTelemetry Instrumentation — docker-compose-with-otel.yaml present indicates OTEL is used for observability; understanding traces, metrics, and logs helps debug alert processing pipelines
  • prometheus-community/alertmanager — Alerts received and routed by Keep often originate from Prometheus AlertManager; understanding AlertManager routing and grouping is complementary
  • grafana/grafana — Grafana is a common alert source and visualization companion; Keep integrates with Grafana alerts and dashboards
  • opsgenie/atlassian-opsgenie-integration — OpsGenie is a direct competitor/alternative alert management platform; Keep likely integrates with it as a bi-directional provider
  • elastic/kibana — Elastic stack is a common source of alerts and logs for Keep's enrichment and correlation features
  • keephq/keep-workflows — Sister repository likely containing official workflow templates and examples for automating alert response
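The fingerprinting and time-window grouping behind the first concept can be shown with a toy sketch. The field choices and the 5-minute window are assumptions for illustration, not Keep's actual deduplication rules.

```python
# Toy fingerprint-based deduplication with a time window — field names and
# window length are assumptions, not Keep's real implementation.
import hashlib


def fingerprint(alert: dict, keys=("source", "service", "name")) -> str:
    """Stable hash over the fields that identify 'the same' alert."""
    material = "|".join(str(alert.get(k, "")) for k in keys)
    return hashlib.sha256(material.encode()).hexdigest()[:16]


def deduplicate(alerts: list, window_s: int = 300) -> list:
    """Keep the first alert per fingerprint within each time window."""
    seen = {}  # fingerprint -> timestamp of last kept alert
    kept = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        fp = fingerprint(alert)
        if fp not in seen or alert["ts"] - seen[fp] > window_s:
            kept.append(alert)
            seen[fp] = alert["ts"]
    return kept
```

Repeats of the same alert inside the window collapse into one, while a recurrence after the window opens a fresh group — the basic mechanism behind reducing alert fatigue.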

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add E2E tests for alert enrichment and correlation workflows

The repo has comprehensive E2E test workflows (.github/workflows/test-pr-e2e.yml, run-e2e-tests.yml) and extensive alert evaluation documentation (docs/alertevaluation/), but there are no visible E2E test files for the core alert enrichment, deduplication, and correlation features mentioned in the README. This is critical for an alert management platform where data accuracy is paramount.

  • [ ] Create tests/e2e/alert-enrichment.spec.ts to test enrichment pipeline with real provider integrations
  • [ ] Create tests/e2e/alert-correlation.spec.ts to test alert deduplication and correlation logic
  • [ ] Reference the alert evaluation examples in docs/alertevaluation/examples/ as test data sources
  • [ ] Integrate new tests into .github/workflows/test-pr-e2e.yml workflow

Add provider authentication tests and documentation

The repo has multiple Docker Compose configs with different auth modes (docker-compose-with-auth.yml, docker-compose-with-otel.yaml) and authentication docs (docs/authentication/okta.md), but there's no systematic test coverage for bi-directional provider integrations. New contributors need clearer guidance on testing provider connectivity and credential handling.

  • [ ] Create tests/unit/providers/auth-handler.test.ts to validate credential encryption and storage patterns
  • [ ] Create tests/integration/providers-smoke-test.ts to verify each provider's bi-directional integration setup
  • [ ] Add docs/authentication/provider-setup-guide.md documenting testing auth flows for new providers
  • [ ] Ensure test examples reference the existing CONTRIBUTING.md guidelines

Implement missing CLI command tests and documentation

The docs/cli/commands/ directory has many documented commands (alert-enrich, alert-get, alert-list, config, provider), but there's no corresponding test directory structure visible in the file listing. The test-pr-ut.yml workflow exists but likely lacks CLI-specific coverage for these commands.

  • [ ] Create tests/unit/cli/commands/alert-commands.test.ts covering alert-enrich, alert-get, alert-list flows
  • [ ] Create tests/unit/cli/commands/config-commands.test.ts covering cli-config-new, cli-config-show, cli-config
  • [ ] Create tests/unit/cli/commands/provider-commands.test.ts covering cli-provider operations
  • [ ] Update .github/workflows/test-pr-ut.yml to include CLI-specific test reporting and coverage thresholds

🌿Good first issues

  • Add unit test coverage for the alert deduplication logic in the backend; the test-pr-ut.yml workflow exists but likely has uncovered functions in the core alert matching/correlation system.
  • Document the provider integration template by creating a provider-skeleton.md in docs/ with a step-by-step guide; new providers are requested frequently (new_provider_request.md template exists) but there's no clear contributor guide.
  • Implement missing E2E test for the bi-directional sync workflow between Keep and a sample provider (e.g., mock PagerDuty sync); test-pr-e2e.yml infrastructure exists but coverage is likely incomplete for sync scenarios.


📝Recent commits

  • df4e48d — fix: unit tests (#6298) (Walkablenormal)
  • 08b4c77 — feat(pagerduty): add client and client_url support for Events API v2 (#6241) (jyoti369)
  • 256b07e — fix: expose is_visible in IncidentDto for workflow conditions (#6282) (alpar)
  • 0eed307 — fix: handle None alert_results in _notify_alert when no if_condition (#6284) (alpar)
  • b2f4226 — fix: Docker deploy script exit on error (#6286) (TubSticks)
  • 8669af0 — fix: set event in foreach AlertDto path so incident resolution check runs (#6213) (DragonBot00)
  • 9a8cb02 — fix: generate UUID for prometheus alerts when id is not a valid uuid (#6219) (DragonBot00)
  • 42954d8 — docs: Add ALERT_SIDEBAR_FIELDS to configuration options (#6244) (Walkablenormal)
  • 4f7981e — fix: add OKTA and KEYCLOAK to tenant seeding allowlist (#6249) (QuinnClaw)
  • 4411945 — fix(auth): restore get_roles() for Okta — API key creation broken under AUTH_TYPE=OKTA (#6254) (ahbeigi)

🔒Security observations

  • High · Default NO_AUTH Configuration in Docker Compose — docker-compose.yml (keep-frontend and keep-backend services). The docker-compose.yml file configures AUTH_TYPE=NO_AUTH for both frontend and backend services. This disables all authentication mechanisms, allowing unauthenticated access to the entire AIOps platform including alert management, workflows, and integrations. Fix: Replace NO_AUTH with appropriate authentication type (OAuth2, OIDC, JWT, etc.) for production deployments. Use environment-specific configurations and never commit production credentials. Implement proper authentication before exposing the application.
  • High · Unencrypted Inter-service Communication — docker-compose.yml (API_URL environment variable). Services communicate over plain HTTP (http://keep-backend:8080) without TLS/HTTPS encryption. This allows potential man-in-the-middle attacks on internal service communication, especially when alerts and sensitive data are transmitted between frontend and backend. Fix: Configure HTTPS/TLS for all service-to-service communication. Use encrypted connections (https://) and implement certificate validation. Consider service mesh solutions like Istio for automatic encryption.
  • Medium · Shared Volume /state Without Explicit Permissions — docker-compose.yml (volumes section for keep-frontend and keep-backend). Both keep-frontend and keep-backend services mount the same ./state volume without explicit permission controls or encryption. This could lead to unauthorized access to persistent data or configuration files stored in this directory. Fix: Implement proper file permissions on the /state directory. Use encrypted volumes. Consider using Docker secrets for sensitive data instead of volume mounts. Segregate volumes by service where possible.
  • Medium · Exposed Grafana Service on Unprotected Port — docker-compose.yml (grafana service, ports: 3001:3000). Grafana service is exposed on port 3001 without authentication requirements visible in the configuration. The default Grafana setup allows unauthenticated access to monitoring dashboards and metrics. Fix: Enforce Grafana authentication with strong credentials. Restrict port 3001 access using firewall rules. Set GF_SECURITY_ADMIN_PASSWORD and GF_AUTH_ANONYMOUS_ENABLED=false. Remove Grafana from production if not needed.
  • Medium · Missing Prometheus Multiproc Directory Configuration — docker-compose.yml (keep-backend environment variables). PROMETHEUS_MULTIPROC_DIR is set to /tmp/prometheus which is a world-readable temporary directory. Prometheus metrics may contain sensitive information about system behavior and could be accessed by other processes. Fix: Use a secure directory with restricted permissions instead of /tmp. Set appropriate ownership and chmod (700 or stricter). Consider using /var/lib/prometheus or similar application-specific directories.
  • Medium · Missing Security Headers Configuration — docker-compose.yml, keep-ui Dockerfile configuration. No visible configuration for security headers (Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, etc.) in the provided infrastructure setup. This increases XSS and clickjacking risks. Fix: Configure security headers in the reverse proxy/nginx configuration. Implement CSP, X-Frame-Options: DENY, X-Content-Type-Options: nosniff, X-XSS-Protection headers.
  • Medium · Latest Image Tags in Production Configuration — docker-compose.yml (grafana service image). Docker Compose uses latest tags for Grafana (grafana/grafana:latest), which could pull vulnerable versions. Using floating tags makes deployments non-deterministic and harder to audit. Fix: Pin Docker images to specific versions (e.g., grafana/grafana:9.5.3 or grafana/grafana:10.0.0). Use image scanning and implement automated updates with proper testing.
  • Low · Missing Network Isolation — docker-compose.yml (overall service configuration). Docker Compose services use default networking without explicit network segmentation. All services can potentially communicate with each other without restrictions. Fix: Define custom Docker networks and use them to isolate services. Only expose necessary ports. Implement network policies for service-to-service communication.
  • Low · No Container Resource Limits — docker-compose.yml (service definitions). Services set no CPU or memory limits, so a runaway container can starve the host. Fix: add per-service limits (e.g. mem_limit and cpus, or deploy.resources in Compose v3/Swarm).

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
