risingwavelabs/risingwave
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ⚠No test directory detected
- ✓Last commit today
- ✓18 active contributors
- ✓Distributed ownership (top contributor 14% of recent commits)
- ✓Apache-2.0 licensed
- ✓CI configured
Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: risingwavelabs/risingwave
Generated by RepoPilot · 2026-06-24 · Source
🎯Verdict
GO — Healthy across the board
- Last commit today
- 18 active contributors
- Distributed ownership (top contributor 14% of recent commits)
- Apache-2.0 licensed
- CI configured
- ⚠ No test directory detected

Computed from maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
⚡TL;DR
RisingWave is a distributed streaming SQL database that ingests events from Kafka, PostgreSQL CDC, webhooks, and S3, processes them with incremental computation via SQL, and serves fresh results at low latency. It replaces the traditional stack of Debezium + Kafka + Flink + serving database with a single system optimized for agentic AI applications that require always-fresh, queryable data. Monorepo (Cargo workspace with 90+ crates under src/): core streaming engine in src/stream/, batch execution in src/batch/, metadata/orchestration in src/meta/, SQL frontend/planner in src/frontend/, connector layer in src/connector/, and persistent storage via src/storage/ (Hummock LSM). Integration tests are co-located in src/tests/ (simulation, e2e, MySQL, sqlsmith fuzzing).
👥Who it's for
Data engineers and ML platform teams building real-time AI agents and applications that need sub-second freshness of joined streams (e.g., CDC from PostgreSQL + webhook events + historical S3 data). Contributors are primarily Rust systems engineers and SQL query planner specialists working on streaming infrastructure.
🌱Maturity & risk
Actively developed. The current version is v2.9.0-alpha, so this is a pre-release of the 2.x line rather than a pre-1.0 project. The project has a structured monorepo with CI on Buildkite, organized test suites (simulation, e2e, MySQL integration), and Cargo workspace discipline. The last commit landed on the day of analysis, and the .agents/ directory with agent-driven CI tooling points to ongoing engineering investment.
Core risk: Rust-first codebase (27.6M lines) with high compilation times and deep systems knowledge barrier. Secondary risk: large monorepo surface area (90+ workspace members) increases coordination overhead. No obvious single-maintainer bottleneck visible (well-staffed organization), but alpha versioning means API stability is not guaranteed and there may be breaking changes in minor versions.
Active areas of work
Active development on agentic AI streaming (visible in README emphasis), Buildkite CI improvements (.agents/skills/fix-buildkite-ci/), and Rust analyzer tooling (.agents/skills/risingwave-rust-analyzer/). The .agents/ directory with YAML skill definitions suggests AI-assisted development workflows and CI automation are recent additions.
🚀Get running
git clone https://github.com/risingwavelabs/risingwave.git
cd risingwave
curl -L https://risingwave.com/sh | sh # or use Docker
cargo build --release # builds entire workspace
For quick local demo: curl -L https://risingwave.com/sh | sh (installs pre-built binary and spins up a single-node cluster).
Daily commands:
cargo build --release
# Single-node cluster:
risingwave standalone
# Or with Docker:
docker run --name risingwave -p 4566:4566 risingwavelabs/risingwave
Then connect via psql -h localhost -p 4566 -d dev -U root and run standard SQL DDL/DML.
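As a rough illustration of what "standard SQL DDL/DML" could look like in a first session, the sketch below prints a few statements that can be piped into the psql connection above. The table `clicks` and the view `clicks_per_user` are hypothetical names chosen for this example, not objects that exist in the repo, and no output is shown because it depends on a running node.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: table and view names are illustrative only.
demo_sql() {
  cat <<'SQL'
CREATE TABLE clicks (user_id INT, url VARCHAR);
INSERT INTO clicks VALUES (1, '/home'), (1, '/docs'), (2, '/home');
CREATE MATERIALIZED VIEW clicks_per_user AS
  SELECT user_id, COUNT(*) AS n FROM clicks GROUP BY user_id;
SELECT * FROM clicks_per_user ORDER BY user_id;
SQL
}

# Requires a running single-node cluster on the port shown above:
#   demo_sql | psql -h localhost -p 4566 -d dev -U root
```

Keeping the statements in a function makes it easy to inspect them before sending anything to a cluster.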
🗺️Map of the codebase
- Cargo.toml: Root workspace configuration defining all member crates and dependencies for the distributed streaming platform.
- .bk.yaml: Buildkite CI/CD pipeline configuration orchestrating build, test, and deployment workflows.
- README.md: Primary entry point documenting RisingWave's architecture, capabilities, and setup for all contributors.
- CONTRIBUTING.md: Contribution guidelines and development workflow standards that all contributors must follow.
- .github/CODEOWNERS: Ownership mapping defining which teams review which subsystems across the distributed codebase.
- .cargo/config.toml: Cargo configuration controlling build behavior, profiles, and Rust toolchain settings for the entire workspace.
🛠️How to make changes
Add a New Connector Source/Sink
- Define connector properties and configuration in src/connector/with_options/mod.rs (src/connector/with_options)
- Implement the connector handler trait in a new module under src/connector/ (src/connector)
- Register codec for message format (JSON/Avro/Protobuf) in src/connector/codec/ (src/connector/codec)
- Add configuration tests and integration tests in the connector's test directory (src/connector)
Add a New Built-in SQL Function
- Define function signature in src/expr/impl/src/functions/ with derive macros from src/expr/macro (src/expr/impl)
- Implement function logic using the expression evaluation framework in src/expr/core (src/expr/core)
- Add type coercion rules in the function's derive macro attribute (src/expr/macro)
- Register the function in the frontend's function catalog in src/frontend (src/frontend)
Add a New Streaming Operator/Executor
- Create an executor struct implementing the StreamExecutor trait in the streaming engine under src/stream/, not under src/batch/, which holds batch execution (src/stream)
- Register the operator in the compute layer's executor factory in src/compute (src/compute)
- Add operator planning logic in src/frontend to generate the executor plan from the logical plan (src/frontend)
- Write integration tests using SLT (SQL Logic Test) format in the test suite (src/frontend/planner_test)
Add a Cluster Management Feature
- Define gRPC service and message types in the protobuf definitions referenced by src/meta (src/meta)
- Implement RPC handlers in the meta service under src/meta (src/meta)
- Add state persistence logic using the distributed state machine in src/meta (src/meta)
- Expose dashboard endpoints in src/meta/dashboard for observability (src/meta/dashboard)
🔧Why these technologies
- Rust: Type-safe systems programming with zero-cost abstractions and no garbage collector, which suits high-throughput, low-latency streaming free of GC pauses
- gRPC / Protobuf: Efficient serialization and RPC for distributed meta service communication and component coordination across cluster nodes
- PostgreSQL compatible SQL: Lowers adoption friction by leveraging familiar SQL semantics while enabling seamless PostgreSQL CDC source integration
- Actor/Task-based concurrency: Enables efficient handling of millions of concurrent streams through lightweight task scheduling without OS thread overhead
- Kafka/Kinesis/Pulsar connectors: Integrates with existing event streaming infrastructure ubiquitous in modern data architectures
⚖️Trade-offs already made
- Distributed metadata service (Meta) as single source of truth
  - Why: Simplifies consistency guarantees and coordination across compute nodes
  - Consequence: Meta service becomes critical path; requires high availability setup to prevent cluster-wide outage
- Batch execution model with micro-batches instead of row-at-a-time
  - Why: Better cache locality, vectorization opportunities, reduced per-row overhead
  - Consequence: Introduces batching latency (tens of milliseconds); not suitable for ultra-low-latency (<1ms) use cases
- Stateless compute with external state storage (Hummock LSM on object storage such as S3)
  - Why: Enables horizontal scaling and fault tolerance; compute nodes are ephemeral
  - Consequence: State I/O becomes bottleneck for stateful operators; requires careful tuning of checkpoint intervals
- Fragment-based execution topology pushed to compute workers
  - Why: Decouples frontend planning from runtime scheduling; enables independent compute scaling
  - Consequence: Complex distributed state synchronization; debugging topology issues is harder
🚫Non-goals (don't propose these)
- Not a general-purpose transactional (OLTP) database: RisingWave is a streaming SQL database specialized in incremental stream processing, not transactional ACID write workloads
- Not a real-time OLAP engine: Optimized for streaming aggregations, not complex multi-table analytical queries
- Not a log storage system: Events must be ingested from external sources (Kafka, PostgreSQL CDC); RisingWave does not durably retain all historical events
- Not a machine learning platform: Does not provide native ML model training/inference; focuses on feature engineering and stream transformation
🪤Traps & gotchas
1. Compilation time: the monorepo is large (27M+ lines of Rust); clean builds can take 10+ minutes. Use cargo nextest (configured in .config/nextest.toml) for faster test iteration.
2. Workspace hack: src/workspace-hack/ contains dependency graph tricks to speed up builds; do not remove entries without understanding the impact.
3. Hakari config: .config/hakari.toml manages feature resolution across the workspace; changes here can break builds silently.
4. Test isolation: e2e tests in src/tests/ may require running services (PostgreSQL, Kafka); check the CI YAML for setup.
5. Audit rules: .cargo/audit.toml defines allowed/denied dependency versions; adding dependencies may trigger CI failures.
6. Edition 2024: edition = "2024" in Cargo.toml requires a recent toolchain. The 2024 edition was stabilized in Rust 1.85, so older compilers cannot build the workspace.
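The toolchain requirement in the last item can be checked before starting a long build. The following is a minimal sketch, assuming only that `rustc --version` prints a line of the form `rustc X.Y.Z (...)`. The helper names `version_ge` and `rustc_version_from` are made up for this example, and the 1.85.0 threshold reflects the release that stabilized the 2024 edition.

```shell
#!/usr/bin/env bash
# Returns 0 when dotted version $1 is >= dotted version $2 (numeric compare per field).
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Extracts "1.85.0" from a line such as "rustc 1.85.0 (abcdef 2025-02-17)".
rustc_version_from() {
  local line=$1
  line=${line#rustc }
  line=${line%% *}
  echo "${line%%-*}"   # drop suffixes such as "-nightly"
}

# Example (needs rustc on PATH):
#   version_ge "$(rustc_version_from "$(rustc --version)")" 1.85.0 \
#     || echo "toolchain too old for edition 2024"
```

Comparing fields numerically avoids the string-comparison pitfall where "1.9.0" sorts after "1.85.0".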
💡Concepts to learn
- Incremental Computation / Dataflow: RisingWave's core; it processes data incrementally (computing only deltas) rather than batch-recomputing results. Crucial for understanding the src/stream/ architecture and state management.
- Change Data Capture (CDC): Native CDC from PG/MySQL transaction logs is a defining feature (replaces Debezium). Understanding log-based CDC is essential for the src/connector/ CDC implementations.
- LSM Tree (Log-Structured Merge Tree): RisingWave uses Hummock LSM for durable, incremental state storage (src/storage/hummock_sdk/). Understanding compaction and write amplification is critical for storage optimization.
- Exactly-Once Semantics via Checkpointing: Streaming systems must guarantee no data loss and no duplication; RisingWave uses checkpoint-based recovery. Core concept for understanding src/meta/ orchestration and src/stream/ operators.
- SQL Query Planning / Cost-Based Optimization: The src/frontend/ planner transforms SQL into optimized dataflow DAGs. Understanding query rewriting, cardinality estimation, and plan enumeration is essential for SQL feature development.
- Streaming Join / Stateful Operators: RisingWave's ability to join streams and tables at low latency (core to the pitch) requires sophisticated state management and windowing. Essential concept in src/stream/ executor design.
- Deterministic Simulation Testing: RisingWave uses src/tests/simulation/ for testing distributed correctness without real networks; this enables reproducible, fast testing of complex failure scenarios.
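To make the first concept in this list concrete, here is a toy sketch of incremental computation: a per-key count that is updated as insert and delete changes arrive, emitting only the affected key after each change instead of recomputing the whole result. It is an illustration of the idea only and says nothing about how src/stream/ is implemented. The function name and the `+ key` / `- key` input format are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Reads a change stream on stdin, one change per line: "<op> <key>", where
# op is "+" (insert) or "-" (delete). After each change it prints only the
# affected key and its new count, not the full result set.
incremental_count() {
  awk '
    $1 == "+" { count[$2] += 1 }
    $1 == "-" { count[$2] -= 1 }
    $1 == "+" || $1 == "-" { print $2, count[$2] }
  '
}

# Example:
#   printf '+ a\n+ b\n+ a\n- b\n' | incremental_count
```

Each input line costs one array update regardless of how many rows were seen before, which is the property that separates incremental maintenance from periodic batch recomputation.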
🔗Related repos
- apache/flink: Direct predecessor. Flink is a batch+stream processing engine that RisingWave replaces in the traditional stack (Debezium + Kafka + Flink). Understanding Flink's dataflow model helps understand RisingWave's execution layer.
- confluentinc/kafka: Primary upstream data source in RisingWave's connector ecosystem; understanding Kafka partitioning and semantics is essential for building Kafka sources/sinks.
- debezium/debezium: CDC tool that RisingWave competes with and replaces; its connectors for PG/MySQL are an inspiration for RisingWave's native CDC implementation.
- postgres/postgres: RisingWave exposes a PostgreSQL-compatible SQL interface (psql compatible) and implements PG CDC; deep knowledge of PG internals (WAL, logical decoding) is relevant to contributors.
- risingwavelabs/risingwave-operator: Kubernetes operator for RisingWave; companion repo for production deployment automation and cloud-native integration.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive agent skill validation workflow in GitHub Actions
The repo has an agents framework (.agents/skills/) with OpenAI agent configurations, but there's no CI workflow to validate agent skill definitions. This should check SKILL.md formatting, verify OpenAI YAML schema compliance, validate reference markdown links, and ensure scripts are executable. This prevents broken agent configurations from merging.
- [ ] Create .github/workflows/validate-agent-skills.yml that runs on changes to .agents/skills/**
- [ ] Add schema validation for .agents/skills/*/agents/openai.yaml against OpenAI agent spec
- [ ] Verify all markdown links in .agents/skills/*/references/*.md are valid
- [ ] Check that all scripts in .agents/skills/*/scripts/ exist and have executable permissions
- [ ] Add linting rules for SKILL.md format consistency across all skills
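One item from this checklist, the executable-permission check, could be sketched as a small script that a workflow step calls. This is only an assumption about how such a check might look: the function name `check_skill_scripts` is hypothetical, and the directory layout is taken from the checklist rather than verified against the repo.

```shell
#!/usr/bin/env bash
# Prints every non-executable file under <root>/.agents/skills/*/scripts/
# and returns non-zero if any were found.
check_skill_scripts() {
  local root=${1:-.} bad=0 f
  for f in "$root"/.agents/skills/*/scripts/*; do
    [ -f "$f" ] || continue   # also skips the unexpanded glob when nothing matches
    if [ ! -x "$f" ]; then
      echo "not executable: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Example, run from the repository root:
#   check_skill_scripts . || exit 1
```

Returning a non-zero status lets a CI step fail without extra glue, while the printed paths tell the author which files to fix.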
Implement missing unit tests for workspace-config crate
The workspace contains src/utils/workspace-config as a member crate but likely lacks comprehensive tests. Given the critical role of workspace configuration in a monorepo with 90+ workspace members, this crate needs robust test coverage for configuration parsing, validation, and error handling scenarios.
- [ ] Review src/utils/workspace-config/src/lib.rs to identify untested code paths
- [ ] Add unit tests for configuration file parsing (TOML/YAML)
- [ ] Add integration tests for multi-crate workspace resolution
- [ ] Add tests for invalid configuration handling and error messages
- [ ] Add tests for feature flag combinations across workspace members
Add E2E test coverage for OpenAI embedding service integration
The repo contains src/utils/openai_embedding_service but no corresponding e2e test suite. Given that RisingWave targets agentic AI workflows, testing the embedding service's real-world integration with the streaming engine is critical for reliability.
- [ ] Create e2e_test/openai_embedding_service/ directory following existing e2e_test patterns
- [ ] Add test that ingests event streams and applies embedding transformations via CREATE SINK
- [ ] Add tests for API error handling (rate limits, auth failures, timeouts)
- [ ] Add performance tests comparing embedding throughput against src/bench benchmarks
- [ ] Document test setup requirements in README (API key handling, mock server options)
🌿Good first issues
- Add SQL function documentation generator: write a tool that scans src/expr/impl/ function definitions and generates a markdown reference in docs/. Currently no catalog of built-in functions is visible in the file list. This improves user onboarding and doesn't require deep system knowledge; start by understanding the function macro DSL in src/expr/macro/.
- Extend connector tests: src/tests/mysql_test/ exists but src/tests/postgresql_cdc_test/ is absent. Add a PostgreSQL CDC integration test that verifies INSERT/UPDATE/DELETE propagation from a real PG instance. CDC is a core feature and the test gap is obvious from the file structure. Low risk, high value; follow the MySQL test pattern.
- Add missing Agent skill documentation: .agents/skills/ has two skills with SKILL.md files, but they lack examples. Enhance them with runnable example commands and expected output. This improves the agent-assisted development workflow that's clearly being invested in, and is a good introduction to the codebase's CI automation philosophy without touching core logic.
⭐Top contributors
- @dependabot[bot] — 14 commits
- @wenym1 — 14 commits
- @tabVersion — 12 commits
- @wcy-fdu — 10 commits
- @yuhao-su — 8 commits
📝Recent commits
- 2938419 refactor(storage): normalize compact task table ids (#25438) (Li0k)
- 4e35c05 fix(meta): sync source secret deps on alter connection (#25582) (tabVersion)
- 95c2b4d chore(deps): Bump bitflags from 2.10.0 to 2.11.1 (#25353) (dependabot[bot])
- 7dca006 chore(deps): Bump async-openai from 0.33.0 to 0.36.1 (#25513) (dependabot[bot])
- 823536b chore(deps): Bump parse-display from 0.10.0 to 0.11.0 (#25560) (dependabot[bot])
- 41c0f30 feat(session): allow unitless statement_timeout to mean ms (#25557) (xiangjinwu)
- b8ddf40 refactor(source): disable maxwell, canal and citus-cdc (#25534) (xiangjinwu)
- 590e926 chore: add lazy PR label guidance for agents (#25532) (tabVersion)
- 8f53ab3 chore(deps): upgrade deltalake and align arrow (#25479) (Li0k)
- 0853780 chore(deps): Bump org.apache.hive:hive-metastore from 4.1.0 to 4.2.0 in /java (#23934) (dependabot[bot])
🔒Security observations
- High · Potential Secrets in Version Control (ci/.env). The presence of ci/.env file in the repository structure suggests environment variables may be tracked in version control. Environment files (.env) typically contain sensitive credentials like API keys, database passwords, and other secrets that should never be committed to version control. Fix: Ensure .env files are in .gitignore and use environment variable management tools (e.g., vault, AWS Secrets Manager). Audit git history to remove any committed secrets using tools like git-secrets or truffleHog.
- High · LDAP Test Credentials Potentially Exposed (ci/ldap-test). The presence of ci/ldap-test directory suggests LDAP testing infrastructure. Test credentials and configurations in version control can be exploited if the repository is compromised or becomes public. Fix: Store test credentials in a secure secrets management system. Use environment variables for sensitive test data. Ensure test configurations don't contain hardcoded credentials or use randomly generated test credentials.
- Medium · Dependency Vulnerability Audit Configuration (.cargo/audit.toml, .github/workflows/audit.yml). The presence of .cargo/audit.toml indicates cargo-audit is configured, which is good. However, without reviewing the actual audit configuration and CI integration details, there's a risk that vulnerable dependencies may not be actively monitored or that audit findings are not enforced in CI/CD. Fix: Ensure audit.yml workflow is configured to fail CI/CD on vulnerable dependencies. Regularly update dependencies. Consider using dependabot (already present in .github/dependabot.yml) in conjunction with cargo-audit for comprehensive coverage.
- Medium · Limited Version Support for Security Updates (SECURITY.md). SECURITY.md explicitly states that 'RisingWave only provides security support for the latest version.' This leaves users on previous versions vulnerable to known exploits without receiving patches. Fix: Consider extending security support to at least the last 2-3 minor versions. Establish and communicate a clear security release policy. Provide long-term support (LTS) releases if this is a critical infrastructure component.
- Medium · Broad JNI and Java Bindings (src/java_binding, src/jni_core). The workspace includes Java bindings (src/java_binding, src/jni_core) which create FFI boundaries. Java/JNI interfaces are common attack vectors if not properly validated, as they can bypass Rust's safety guarantees. Fix: Implement strict input validation on all JNI boundary crossings. Use jni-safe crates where possible. Conduct security reviews of JNI code. Consider using wrapper types to enforce safety contracts at the FFI boundary.
- Medium · OpenAI Integration and API Key Handling (src/utils/openai_embedding_service, .agents/skills/*/agents/openai.yaml). The codebase includes src/utils/openai_embedding_service and agent skills using OpenAI API. This suggests integration with external LLM services, which could expose API keys if not properly managed. Fix: Store OpenAI API keys in environment variables or secure vaults, never in configuration files. Implement API key rotation policies. Use least-privilege API keys with appropriate scopes. Monitor API usage for anomalies.
- Medium · Docker and Container Security Gaps (.github/aws-config/Dockerfile, ci/Dockerfile). While Dockerfiles exist (.github/aws-config/Dockerfile, ci/Dockerfile), there's no evidence of security scanning (e.g., Trivy, Snyk) in the CI/CD pipeline from the visible workflows. Base image selection and layer optimization for security are not visible. Fix: Add container image scanning to CI/CD pipeline. Use minimal base images (alpine or scratch). Implement multi-stage builds. Regularly scan and update base images. Sign container images. Implement runtime security policies.
- Medium · SQL Injection Risk in Streaming SQL Platform (src/sqlparser, src/frontend). As an event streaming platform with SQL capabilities (src/sqlparser, src/frontend), there's inherent risk of SQL injection vulnerabilities if user input is not properly parameterized. Fix: Enforce parameterized queries throughout the codebase. Implement query validation and sanitization. Use whit
LLM-derived; treat as a starting point, not a security audit.
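Since these observations are inferred from file names, the first one is worth checking directly in a local clone rather than taking on trust. A minimal sketch, assuming git is installed; `tracked_env_files` is a hypothetical helper name, and the script only lists paths without reading file contents.

```shell
#!/usr/bin/env bash
# Lists tracked files whose path ends in ".env" (for example ci/.env)
# in the git repository at <dir>. An empty result means none are tracked.
tracked_env_files() {
  git -C "${1:-.}" ls-files -- '*.env'
}

# Example, run from the repository root:
#   tracked_env_files .
```

A tracked .env file is not automatically a leak, since it may hold only non-secret defaults, so any path this prints still needs a manual look.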
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale; STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/risingwavelabs/risingwave shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live risingwavelabs/risingwave
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/risingwavelabs/risingwave.
What it runs against: a local clone of risingwavelabs/risingwave — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in risingwavelabs/risingwave | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of risingwavelabs/risingwave. If you don't
# have one yet, run these first:
#
# git clone https://github.com/risingwavelabs/risingwave.git
# cd risingwave
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of risingwavelabs/risingwave and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "[:/]risingwavelabs/risingwave(\.git)?/?$" \
  && ok "origin remote is risingwavelabs/risingwave" \
  || miss "origin remote is not risingwavelabs/risingwave (artifact may be from a fork)"
# 2. License matches what RepoPilot saw. Accept an SPDX id at the start of a line,
#    the standard Apache License 2.0 header text, or a package.json license field.
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
  || { grep -qi "Apache License" LICENSE 2>/dev/null && grep -qE "Version 2\.0" LICENSE 2>/dev/null; } \
  || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "Cargo.toml" \
  && ok "Cargo.toml" \
  || miss "missing critical file: Cargo.toml"
test -f ".bk.yaml" \
  && ok ".bk.yaml" \
  || miss "missing critical file: .bk.yaml"
test -f "README.md" \
  && ok "README.md" \
  || miss "missing critical file: README.md"
test -f "CONTRIBUTING.md" \
  && ok "CONTRIBUTING.md" \
  || miss "missing critical file: CONTRIBUTING.md"
test -f ".github/CODEOWNERS" \
  && ok ".github/CODEOWNERS" \
  || miss "missing critical file: .github/CODEOWNERS"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/risingwavelabs/risingwave"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
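The `./verify.sh || regenerate-and-retry` pattern can be expanded into a bounded loop. The sketch below is one possible shape under stated assumptions: `verify_with_retry` is a hypothetical helper, and the regenerate command is a placeholder supplied by the caller, since this document does not define one.

```shell
#!/usr/bin/env bash
# Runs a verification command up to <max> times, invoking a regenerate
# command between failed attempts. Both commands are supplied by the caller.
verify_with_retry() {
  local max=$1 verify_cmd=$2 regenerate_cmd=$3 attempt
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$verify_cmd"; then
      return 0
    fi
    echo "verify failed (attempt $attempt/$max)" >&2
    if (( attempt < max )); then
      "$regenerate_cmd" || return 1
    fi
  done
  return 1
}

# Example (regenerate.sh is a placeholder for however the artifact is refreshed):
#   verify_with_retry 3 ./verify.sh ./regenerate.sh
```

Bounding the attempts keeps an agent from looping forever when the artifact cannot be brought back in sync.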
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.