alibaba/otter
Alibaba's distributed database synchronization system (built for cross-region data centers between China and the US)
Stale — last commit 2y ago
Weakest axis: last commit was 2y ago; no CI workflows detected
Has a license and tests, though no CI workflows were detected: a workable foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
- ✓26+ active contributors
- ✓Apache-2.0 licensed
- ✓Tests present
- ⚠Stale — last commit 2y ago
- ⚠Concentrated ownership — top contributor handles 55% of recent commits
- ⚠No CI workflows detected
What would change the summary?
- →Use as dependency Mixed → Healthy if: 1 commit in the last 365 days
- →Deploy as-is Mixed → Healthy if: 1 commit in the last 180 days
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: alibaba/otter
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/alibaba/otter shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Stale — last commit 2y ago
- 26+ active contributors
- Apache-2.0 licensed
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Concentrated ownership — top contributor handles 55% of recent commits
- ⚠ No CI workflows detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live alibaba/otter
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/alibaba/otter.
What it runs against: a local clone of alibaba/otter — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in alibaba/otter | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 743 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of alibaba/otter. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/alibaba/otter.git
#   cd otter
#
# Then paste this script. Every check is read-only; no mutations.

# Keep going after a failed check so every result is reported.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of alibaba/otter and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "alibaba/otter(\.git)?\b" \
  && ok "origin remote is alibaba/otter" \
  || miss "origin remote is not alibaba/otter (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. Accepts an SPDX-style first line,
#    the standard Apache license header, or a package.json license field.
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
  || grep -qiE "^[[:space:]]*Apache License" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency (threshold is the ~713 days seen at generation plus 30)
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 743 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~713d)"
else
  miss "last commit was $days_since_last days ago: artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures), safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/alibaba/otter"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
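As a minimal sketch of how an agent loop might branch on that exit status, the helper below maps the script's three exit codes (0, 1, 2) to a decision string. The function name `gate_on_verify`, the decision labels, and the `./verify.sh` filename are illustrative assumptions, not part of RepoPilot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a verification command and translate its exit
# status into a decision. Exit codes follow the script above:
#   0 = all checks passed, 1 = stale claim(s), 2 = not inside a git repo.
gate_on_verify() {
  "$@" >/dev/null 2>&1
  case $? in
    0) echo "proceed" ;;
    1) echo "regenerate" ;;
    2) echo "wrong-directory" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (assumes the script was saved as ./verify.sh):
#   decision=$(gate_on_verify ./verify.sh)
```

Returning a label instead of exiting keeps the choice of what to do next (retry, ask the user, abort) with the caller.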
⚡TL;DR
Otter is a distributed database synchronization system that parses MySQL/Oracle binary logs in real-time and replicates data changes across geographic regions (e.g., China-US datacenters). Built on Alibaba's Canal log parser, it handles bidirectional sync with eventual consistency across 200+ database instances syncing 600M+ rows at enterprise scale. Multi-module Maven monorepo: manager/biz/ contains business logic (AutoKeeperStatService, DataSourceCreator), manager/web/ likely holds admin UI, node/ contains worker nodes, shared.common/ and shared.push/ hold cross-module utilities. Zookeeper handles distributed coordination; Canal integration at canal.instance.manager module level.
👥Who it's for
Database engineers and platform teams at companies with multi-datacenter deployments who need sub-second replication of transactional data without custom ETL code. Also relevant for teams managing both MySQL and Oracle heterogeneous database estates.
🌱Maturity & risk
Production-mature: deployed at Alibaba since 2011 (v4 released 2012+), managing 80+ node clusters, 1.5TB file sync, and 200+ database instances. Version 4.2.19-SNAPSHOT points to incremental development, but the maintenance signals above show the last public commit roughly two years ago and no CI workflows. Treat it as mature but slow-moving.
Key risks: (1) The heavy Spring 2.x dependency stack with ibatis-sqlmap dates from the pre-modern Spring era, and the upgrade path is unclear; (2) Oracle JDBC (ojdbc6) is proprietary, which makes setup harder for open-source contributors; (3) No CI configuration was detected; (4) The monolithic manager/node architecture may make horizontal scaling difficult; (5) The last public commit was about two years ago (~713 days at generation time), so the public repository may be stale despite internal use at Alibaba.
Active areas of work
No PR or milestone information is available. The recent commits listed in this document are mostly dependency bumps and small fixes, and the latest is about two years old, so little active development is visible. The version string '4.2.19-SNAPSHOT' suggests incremental development.
🚀Get running
- Clone: `git clone https://github.com/alibaba/otter.git && cd otter`
- Install libs: `cd lib && bash install.sh` (installs jmockit, jtester, ojdbc6)
- Build: `mvn clean install -Dmaven.test.skip -Denv=release` from `$otter_home`
- Packages appear in `$otter_home/target/`
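Before running the steps above, it can help to confirm the command-line tools they rely on are installed. The sketch below is one way to do that; the function name `missing_tools` is hypothetical, and the tool list (git, bash, java, mvn) is inferred from the commands above, not from any official prerequisite list.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: print each required command that is
# not found on PATH, one per line. Prints nothing when all are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage:
#   missing=$(missing_tools git bash java mvn)
#   [ -z "$missing" ] || echo "install first: $missing"
```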
Daily commands:
Build required before run: mvn clean install -Dmaven.test.skip -Denv=release. Admin node startup via docker/image/admin/app.sh. Worker nodes via Docker: docker/run.sh. Requires running Zookeeper, MySQL/Oracle source & target databases, and Canal instance manager.
🗺️Map of the codebase
- manager/biz/src/main/java/com/alibaba/otter/manager/biz/autokeeper/AutoKeeperStatService.java: Core interface for Zookeeper-based state management and distributed scheduling—understand this to grasp node coordination.
- manager/biz/src/main/java/com/alibaba/otter/manager/biz/autokeeper/impl/AutoKeeperCollector.java: Collects runtime metrics from worker nodes; critical for understanding observability and health checking.
- manager/biz/src/main/java/com/alibaba/otter/manager/biz/common/DataSourceCreator.java: Handles pooled MySQL/Oracle connection creation—key to adding new target database support.
- manager/biz/pom.xml: Declares all dependencies including Canal, Spring, iBatis, and JDBC drivers—reference for understanding version constraints.
- lib/install.sh: Custom library installation (jmockit, ojdbc6)—must run before build to avoid missing proprietary JARs.
- docker/image/admin/ddl.sql: Database schema for manager metadata store—shows config storage structure.
🛠️How to make changes
For business logic: edit manager/biz/src/main/java/com/alibaba/otter/manager/biz/ (autokeeper/ServiceImpl.java for scheduling; DataSourceCreator.java for DB connectivity). For UI changes: explore manager/web/ (not shown but expected). For sync node behavior: likely in sibling node/ module. AutoKeeper classes handle state persistence and collection.
🪤Traps & gotchas
- ojdbc6 setup: Oracle JDBC is not in Maven Central; lib/install.sh manually adds it to local repo—skip this step and build fails silently on Oracle sync features.
- Zookeeper required: Nodes cannot start without a running Zookeeper cluster; config location not visible in provided files—check docker-compose or deployment docs.
- Canal dependency version: the `${otter_canal_version}` placeholder suggests the version is pinned in the parent POM; a wrong Canal version causes binary log parsing failures.
- ibatis-sqlmap exclusivity: the code uses iBatis 2.x, not Hibernate or JPA, so migrations are complex; Spring exclusions in canal.instance.manager also suggest classpath conflicts.
- Database-specific DDL: ddl.sql is for manager schema; source/target schema requirements not in provided snippet—check wikis.
- Multi-node coordination: Single-master writes via Zookeeper leader election not obvious in class names—requires deep Zookeeper knowledge.
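To find where the `otter_canal_version` property mentioned above is actually set, a recursive search limited to `pom.xml` files is usually enough. The sketch below assumes it is run from a local clone; the function name `find_maven_property` is hypothetical, and which POM holds the definition is not confirmed by this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every pom.xml line that mentions a Maven
# property name. Both the definition (<name>...</name>) and the uses
# (${name}) match, each printed as path:line-number:content.
find_maven_property() {
  local name=$1 root=${2:-.}
  grep -rn --include=pom.xml -F "$name" "$root"
}

# Usage from the repository root:
#   find_maven_property otter_canal_version
```

The `-F` flag treats the name as a fixed string, so property names containing dots are not interpreted as regular expressions.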
💡Concepts to learn
- Binary Log (Binlog) Parsing — Otter's core: Canal reads MySQL binlogs to extract INSERT/UPDATE/DELETE events; understanding binlog position, GTIDs, and row-based vs statement-based formats is critical for debugging replication lag and ensuring exactly-once semantics.
- Zookeeper Leader Election & Distributed Consensus — AutoKeeperStatService relies on Zookeeper to coordinate multi-node replication: only one leader writes to the target database at a time; understanding ephemeral nodes and watch callbacks is essential for troubleshooting split-brain scenarios.
- CDC (Change Data Capture) — Otter is a CDC system—it captures committed data changes from logs, not snapshots; this enables continuous replication without triggering source database queries; crucial for understanding why Otter can scale to 600M+ rows.
- Heterogeneous Database Replication (Oracle ↔ MySQL) — Otter's differentiator vs. MySQL native replication: DataSourceCreator abstracts Oracle and MySQL; handling dialect differences (NULL semantics, LOBs, sequences) requires careful schema mapping not visible in provided files.
- Distributed Transaction Idempotency & Exactly-Once Semantics — Replicating across WAN with dual-write (bidirectional) requires deduplication of updates—Otter must track applied binlog positions per node to prevent double-apply; critical for data consistency in Alibaba's dual-write architecture.
- Spring Dependency Injection & iBatis ORM — Codebase uses pre-Spring Boot Spring 2.x with iBatis (not Hibernate/JPA); configuration is XML-driven; understanding how manager/biz wires AutoKeeper beans and manages transaction scope is essential for feature additions.
- Connection Pooling & Commons DBCP Tuning — At 200+ database instances and 80+ nodes, connection pool saturation is a real risk; misconfigured DBCP max connections or eviction policies can cause replication stalls—DataSourceCreator's pool settings are a bottleneck.
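For the binlog concept above, a quick sanity check on a source database is whether it logs in row-based format. The sketch below parses the tab-separated output of the `mysql` client and succeeds only for `ROW`. It assumes row-based logging is what the Canal instance expects; the function name `binlog_is_row` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the output of
#   mysql -N -B -e "SHOW VARIABLES LIKE 'binlog_format'"
# on stdin (one tab-separated line: variable name, value) and return
# success only when the value is ROW. Empty input counts as failure.
binlog_is_row() {
  awk -F'\t' '$1 == "binlog_format" { v = toupper($2) }
              END { exit (v == "ROW" ? 0 : 1) }'
}

# Usage (connection options omitted; supply your own host and credentials):
#   mysql -N -B -e "SHOW VARIABLES LIKE 'binlog_format'" | binlog_is_row \
#     && echo "row-based binlog" || echo "binlog_format is not ROW"
```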
🔗Related repos
- `alibaba/canal`: Direct dependency of Otter. Canal parses MySQL binlogs; understanding Canal's EventParser and LogEventFilter is essential to debugging replication lag.
- `alibaba/druid`: Alibaba's JDBC pool alternative to the Commons DBCP used here; relevant if considering performance optimization or migration off deprecated pooling.
- `alibaba/nacos`: Modern Alibaba config/service discovery system that could replace Zookeeper for a future Otter 5.x; useful for understanding next-generation architecture.
- `debezium/debezium`: Open-source competitor solving the same log-parsing and CDC problem for Kafka; a reference design for a modern approach if Otter maintainers consider alternatives.
- `mysql/mysql-server`: Source database for replication; understanding binlog format (STATEMENT, ROW, MIXED) is essential. Check the MySQL docs to tune Otter's Canal instance config.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add unit tests for AutoKeeperStatService and related components
The manager/biz/src/main/java/com/alibaba/otter/manager/biz/autokeeper/ directory contains critical distributed coordination logic (AutoKeeperStatServiceImpl, AutoKeeperCollector, AutoKeeperPersist) that lacks visible test coverage. Given this is a distributed system handling 200+ database instances and 80+ machine clusters, robust unit tests for keeper state management, data collection, and persistence are essential to prevent sync failures.
- [ ] Create manager/biz/src/test/java/com/alibaba/otter/manager/biz/autokeeper/ directory structure
- [ ] Add AutoKeeperStatServiceImplTest.java covering state transitions and error handling
- [ ] Add AutoKeeperCollectorTest.java to verify metric collection from multiple nodes
- [ ] Add AutoKeeperPersistTest.java to validate data persistence logic and recovery scenarios
- [ ] Use existing jmockit and jtester dependencies (already in lib/) for mocking
Create integration tests for DataSourceCreator with MySQL and Oracle connections
The manager/biz/src/main/java/com/alibaba/otter/manager/biz/common/DataSourceCreator.java is a critical component for managing connections to 200+ database instances across MySQL and Oracle. No visible test coverage exists. Integration tests would catch configuration issues, connection pool problems, and failover scenarios before they cause production sync failures.
- [ ] Create manager/biz/src/test/java/com/alibaba/otter/manager/biz/common/DataSourceCreatorTest.java
- [ ] Add test cases for MySQL connection creation with various charset and timeout configurations
- [ ] Add test cases for Oracle connection creation (use ojdbc6.jar already in lib/)
- [ ] Add test cases for connection pool validation and recovery after connection loss
- [ ] Add test cases verifying proper cleanup of stale datasource connections
Add GitHub Actions workflow for Docker builds and automated testing
The repo contains sophisticated Docker infrastructure (docker/Dockerfile, docker/image/admin/, docker/image/alidata/) and a build system, but no CI/CD pipeline is visible. Given the complexity of distributed deployment across machines and the README's build instructions (mvn clean install), an automated GitHub Actions workflow would catch build failures, Docker image issues, and test failures before contributors submit PRs.
- [ ] Create .github/workflows/docker-build.yml to build Docker images on every PR and push to master (the default branch)
- [ ] Create .github/workflows/maven-test.yml to run 'mvn clean install -Dmaven.test.skip=false' on PRs
- [ ] Add step to run docker/build.sh and verify image creation succeeds
- [ ] Add step to execute lib/install.sh and verify environment setup for new contributors
- [ ] Configure workflow to report test results and Docker build failures as PR status checks
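One way to keep such a workflow thin is to put the build steps in a script that both contributors and the CI job can call. The sketch below strings together the two commands named in this document (`lib/install.sh`, then the Maven build with tests enabled). The script layout, the function names `run` and `ci_build`, and the `DRY_RUN` switch are assumptions for illustration; combining `-Dmaven.test.skip=false` with `-Denv=release` has not been checked against the real build.

```bash
#!/usr/bin/env bash
# Hypothetical ci.sh: the commands a workflow job could call, in order.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ci_build() {
  # Install the local-only JARs first, then build with tests enabled.
  run bash -c 'cd lib && bash install.sh' &&
  run mvn clean install -Dmaven.test.skip=false -Denv=release
}

# Usage:
#   DRY_RUN=1 ci_build   # print the plan without running Maven
#   ci_build             # run it for real from the repository root
```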
🌿Good first issues
- Add unit tests for `DataSourceCreator.java`. No test fixtures are currently visible; start with mock MySQL/Oracle connections using embedded H2 or TestContainers to verify connection pool behavior under failure modes.
- Document required environment variables and the Zookeeper config location. Create `DEPLOYMENT.md` showing ZOOKEEPER_SERVERS, CANAL_HOST, DB credentials, and how to validate them via the admin health check endpoint (see `docker/image/admin/health.sh`).
- Extract Spring bean initialization from manager/biz into a standalone Spring Boot starter module (e.g., `otter-spring-boot-starter/`). This would reduce onboarding friction and enable easier testing; the current XML config-heavy approach is pre-Boot era.
⭐Top contributors
- @agapple — 55 commits
- @Wu-Jianqiang — 9 commits
- @whhe — 5 commits
- @dependabot[bot] — 5 commits
- @wang.xiukai — 3 commits
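The ownership concentration figure can be approximated locally from commit author counts. The sketch below reads "count name" lines and prints the top contributor's share as a whole percentage. The function name `top_share` and the 100-commit window in the usage line are assumptions; the exact window RepoPilot uses is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lines of the form "<count> <name>" on stdin
# and print the largest count as a whole-number percentage of the total.
# Prints 0 when there is no input.
top_share() {
  awk '{ total += $1; if ($1 > max) max = $1 }
       END { if (total > 0) printf "%d\n", (100 * max) / total; else print 0 }'
}

# Usage inside a clone (window of 100 commits is an assumption):
#   git log -100 --format=%an | sort | uniq -c | sort -rn | top_share
```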
📝Recent commits
- `7544d05` fix: discard unsupported event to avoid exception (#1129) (whhe)
- `4732738` Fix java.lang.NoSuchFieldError: MYSQL during MySQL sync caused by the Druid version (#1113) (quanllong)
- `7af72a3` chore: remove duplicated dependency (yuchumian)
- `846a265` update README.md (whhe)
- `4960e11` Bump junit from 4.5 to 4.13.1 (#960) (dependabot[bot])
- `3cae49d` remove redundant dependencies in dependencyManagement (#1058) (whhe)
- `a85c733` update common dependencies and revert spring version for ibatis issue (#1057) (whhe)
- `04cfb66` Bump logback-core from 1.1.3 to 1.2.9 (#1053) (dependabot[bot])
- `490e2d0` Bump dubbo from 2.5.3 to 2.6.9 in /shared (#1049) (dependabot[bot])
- `cea5728` Bump protobuf-java from 2.6.1 to 3.16.1 (#1038) (dependabot[bot])
🔒Security observations
- Critical · Outdated and Vulnerable JDBC Drivers (manager/biz/pom.xml: mysql-connector-java and ojdbc6 dependencies). The project uses mysql-connector-java and ojdbc6 (Oracle JDBC) without specified versions. These drivers are likely outdated and may contain known security vulnerabilities. The ojdbc6 driver is particularly old (from 2013 era) and has multiple CVEs. Fix: Explicitly specify recent versions of database drivers (e.g., mysql-connector-java 8.0.33+, ojdbc11 or newer). Regularly update dependencies and perform vulnerability scans using OWASP Dependency-Check or similar tools.
- High · iBatis SQL Framework with Injection Risk (manager/biz/pom.xml: ibatis-sqlmap dependency, and manager/biz/src/main/java/com/alibaba/otter/manager/biz/config/alarm/dal/ibatis/). The project uses Apache iBatis (ibatis-sqlmap), an older ORM framework that predates modern parameterized query protections. Combined with dynamic SQL generation patterns common in iBatis, this increases SQL injection risk if queries are not carefully constructed. Fix: Migrate to modern ORM frameworks like MyBatis-Plus or JPA/Hibernate. If iBatis must be retained, ensure all SQL queries use parameterized queries with proper type handlers. Implement input validation and use prepared statements exclusively.
- High · Insecure Commons-DBCP Database Connection Pool (manager/biz/pom.xml: commons-dbcp dependency). commons-dbcp is deprecated and has known security vulnerabilities. It lacks modern security features and connection pool protections present in newer frameworks like HikariCP or Apache Commons DBCP2. Fix: Upgrade to commons-dbcp2 or migrate to HikariCP, which provides superior performance, security, and is actively maintained. Apply connection pool timeout configurations and authentication protections.
- High · Outdated Spring Framework Version (manager/biz/pom.xml: spring-core and other spring-* dependencies). The pom.xml references 'org.springframework:spring-core' and 'org.springframework:spring-*' without explicit version constraints. Given the project's age (last major update appears to be around 2015-2016), the Spring version is likely vulnerable to multiple CVEs including authentication bypass, DoS, and injection attacks. Fix: Update to Spring Framework 5.3.x or 6.x LTS versions with security patches. Review and update all transitive Spring dependencies. Test compatibility thoroughly after upgrade.
- High · Missing Security Configuration in Docker Setup (docker/image/alidata/init/02init-sshd.sh, docker/Dockerfile). Docker configuration files (docker/image/alidata/init/02init-sshd.sh, docker/image/alidata/bin/main.sh) suggest SSH and daemon initialization without visible security hardening. No evidence of security scanning, minimal base images, or non-root user execution. Fix: Implement Docker security best practices: (1) Use minimal base images like alpine or distroless, (2) Run containers as non-root users, (3) Apply security scanning in CI/CD pipeline (Trivy, Snyk), (4) Implement proper SSH hardening if SSH is required, (5) Limit exposed ports, (6) Use read-only filesystems where possible.
- Medium · CGLib Dependency Without Version Pinning (manager/biz/pom.xml: cglib-nodep dependency). cglib-nodep is included without version constraints. CGLib has had security issues in past versions related to deserialization and bytecode manipulation. Without pinning, the build is non-deterministic. Fix: Specify explicit version for cglib-nodep (prefer 3.3.0+). Evaluate if CGLib is necessary or can be replaced with Java's built-in proxy mechanisms or more modern alternatives.
- Medium · Potential SQL Injection in DataSourceCreator (manager/biz/src/main/java/com/alibaba/otter/manager/). DataSourceCreator.java likely contains database configuration logic. Without seeing the source, the use of iBatis and JDBC suggests potential SQL injection risks if dynamic queries are constructed from user input without proper parameterization. No fix was suggested for this item.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.