RepoPilot

openark/orchestrator

MySQL replication topology management and HA

Healthy

Healthy across all four use cases

Use as dependency: Healthy (weakest axis)

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • 12 active contributors
  • Distributed ownership (top contributor 49% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present
  • Stale — last commit 1y ago

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — the badge live-updates from the latest cached analysis.

Variant:
RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/openark/orchestrator)](https://repopilot.app/r/openark/orchestrator)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/openark/orchestrator on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: openark/orchestrator

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check prints FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/openark/orchestrator shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • 12 active contributors
  • Distributed ownership (top contributor 49% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present
  • ⚠ Stale — last commit 1y ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
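The contributor-breadth signal above can be spot-checked from a local clone. A minimal sketch; the one-year window is an assumption (RepoPilot's exact window is undocumented), so your percentage may differ from the 49% shown:

```shell
# top_share: print the top contributor's share of commits in the last year.
# The one-year window is an assumption; RepoPilot's exact window is not
# documented, so the number may differ from the 49% quoted above.
top_share() {
  local total top
  total=$(git rev-list --count --since="1 year ago" HEAD 2>/dev/null) || return 1
  if [ "${total:-0}" -eq 0 ]; then
    echo "no commits in window"
    return 0
  fi
  # shortlog prints "<count>\t<author>" sorted by count; take the top line.
  top=$(git shortlog -s -n --since="1 year ago" HEAD | head -n 1 | awk '{print $1}')
  echo "top contributor: $top/$total commits ($(( 100 * top / total ))%)"
}
```

Run `top_share` from inside your clone; read-only, like the verification script below it composes with.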

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live openark/orchestrator repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/openark/orchestrator.

What it runs against: a local clone of openark/orchestrator — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in openark/orchestrator | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches a relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 473 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>openark/orchestrator</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of openark/orchestrator. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/openark/orchestrator.git
#   cd orchestrator
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of openark/orchestrator and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "openark/orchestrator(\.git)?\b" \
  && ok "origin remote is openark/orchestrator" \
  || miss "origin remote is not openark/orchestrator (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 473 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~443d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/openark/orchestrator"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Orchestrator is a MySQL replication topology management and high-availability (HA) system written in Go that automatically discovers, visualizes, refactors, and recovers MySQL replication topologies. It detects master/replica failures, performs automated failover, and allows safe replication topology reorganization through a CLI, HTTP API, and web UI, and it understands multiple replication strategies including GTID, Pseudo-GTID, and Binlog Servers. The codebase is a monolithic Go service: cmd/ holds the entry points, go/ contains the topology logic (discovery, refactoring, recovery), http/ holds the REST API handlers, and shell/JavaScript/Python support scripts sit alongside. Configuration lives in conf/ (JSON files such as orchestrator-sample.conf.json), the web UI in html/, and tests are distributed alongside the source. Raft-based clustering is supported via a separate docker/Dockerfile.raft and the orchestrator-raft-env.conf.json config.

👥Who it's for

MySQL database operators and SREs managing complex replication topologies at scale who need automated failure detection, topology visualization, and safe replica refactoring without manual binary log position tracking; teams running MySQL 5.6+ with multiple masters and replicas across datacenters.

🌱Maturity & risk

Mature but archived (January 2024). The codebase shows maturity with comprehensive test coverage via GitHub Actions (main.yml, system.yml, upgrade.yml workflows), extensive documentation in docs/, and production deployments evident from the Raft HA setup and upgrade configurations. However, the archived status means the maintainer has stepped back; active development has moved to the Percona fork at percona/orchestrator.

The repository is archived and no longer maintained by the original author, posing a long-term support risk. It targets Go 1.16 (outdated; modern Go is 1.21+) with unpinned indirect dependencies (hashicorp/raft uses a replace directive with a zeroed-out version), increasing supply-chain risk. Prefer the percona/orchestrator fork for new projects; this original repo is suitable only for legacy system maintenance.

Active areas of work

The repository has been archived since early 2024, with no active development. The README explicitly directs users to the percona/orchestrator fork. The last activity consolidated the project into the Percona-maintained version; no open pull requests or issues are being addressed.

🚀Get running

Clone and build with Go 1.16+: git clone https://github.com/openark/orchestrator.git && cd orchestrator && go build -o orchestrator ./cmd/orchestrator. Configuration via JSON in conf/ (start with conf/orchestrator-sample.conf.json). Run with ./orchestrator and access web UI on default port 3000. See docs/build.md for detailed build instructions.

Daily commands. Development: go build -o orchestrator ./cmd/orchestrator && ./orchestrator --config=conf/orchestrator-sample.conf.json. Docker: docker build -f docker/Dockerfile -t orchestrator . && docker run -p 3000:3000 orchestrator. CI/test: the GitHub Actions workflows run on push (main.yml runs Go tests; system.yml spins up MySQL replicas for integration tests); bash build.sh builds locally.

🗺️Map of the codebase

  • go/instance/instance.go: Core data structure representing a MySQL instance; defines how Orchestrator models replicas, masters, and their state
  • go/logic/recovery.go: Automated failover and recovery execution logic; determines which recovery method (master takeover, replica promotion, etc.) applies to a failure scenario
  • go/logic/topology.go: Topology refactoring engine; validates and executes safe reparenting operations like moving a replica to a new master
  • http/api.go: REST API endpoint definitions; exposes topology queries, recovery actions, and refactoring commands to CLI and web UI
  • conf/orchestrator-sample.conf.json: Reference configuration with all tunable parameters; essential for understanding discovery intervals, recovery policies, and backend setup
  • go/config/config.go: Configuration parser and validation; defines how JSON config maps to in-memory settings and environment variable overrides
  • docs/configuration-recovery.md: Documents recovery scenarios and orchestrator's decision tree; required reading to understand when and how automatic failover triggers

🛠️How to make changes

Add topology discovery logic in go/instance/ and go/topology/. Refactoring/recovery rules in go/logic/recover*.go and go/logic/topology.go. HTTP endpoints in http/handler*.go. Frontend changes in resources/public/js/. Configuration options in go/config/config.go. New failure detection strategies belong in go/logic/analysis.go. Tests: create *_test.go files alongside the package. See docs/configuration.md for all tuneable behaviors.

🪤Traps & gotchas

  • Config JSON is strict—missing required fields like MySQLTopologyUser and MySQLTopologyPassword will cause silent failures or panics; use docs/configuration-sample.md as a template.
  • MySQL connectivity requires network access to port 3306 on all topology instances; firewalls silently drop discovery.
  • Pseudo-GTID recovery requires a special marker in binlogs (see docs/configuration-discovery-pseudo-gtid.md)—without it, recovery falls back to unsafe methods.
  • Raft clustering (orchestrator-raft-env.conf.json) requires all nodes to be reachable and synchronized; split-brain scenarios are not auto-healed.
  • The backend database (MySQL or SQLite) must be initialized; there is no auto-migration on schema changes between versions—check the upgrade documentation before deploying a new release.
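As a concrete illustration of the strict-config trap, here is a minimal, hypothetical config fragment validated before use. Field names follow the conventions of conf/orchestrator-sample.conf.json, but treat them as assumptions and diff against the sample file in your checked-out version:

```shell
# Write a minimal (hypothetical) config and confirm it is valid JSON
# before handing it to orchestrator, which can fail silently on bad
# config. Field names follow conf/orchestrator-sample.conf.json; verify
# against your checked-out version.
cat > /tmp/orchestrator-minimal.conf.json <<'EOF'
{
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orc_topology",
  "MySQLTopologyPassword": "CHANGE_ME",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orc_backend",
  "MySQLOrchestratorPassword": "CHANGE_ME"
}
EOF
if python3 -m json.tool /tmp/orchestrator-minimal.conf.json >/dev/null; then
  echo "config parses"
else
  echo "config is invalid JSON"
fi
```

A real deployment still needs the backend and topology credentials filled in; start orchestrator against it with ./orchestrator --config=/tmp/orchestrator-minimal.conf.json as in the Get running section.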

💡Concepts to learn

  • Pseudo-GTID — Orchestrator's secret weapon for safe recovery when true GTIDs aren't available; allows replica rebinding by injecting identifiable markers into the binlog stream, enabling recovery across different binary log positions
  • Topology Discovery via Crawler — Orchestrator actively crawls your MySQL instances to map replication relationships; understanding the discovery interval and failure detection thresholds is critical for production reliability
  • Binlog Server Pattern — Orchestrator supports pseudo-replicas that serve as binlog relay points (e.g., Percona Replication Manager); understanding this reduces topology constraints in large deployments
  • Raft Consensus for HA Clustering — Multiple Orchestrator instances coordinate via Raft to elect a leader that performs topology operations, preventing split-brain recovery decisions; critical for multi-datacenter deployments
  • Automated Failure Detection & Holistic Analysis — Orchestrator detects failures by analyzing topology state (replication lag, connectivity, binlog positions) rather than simple heartbeats, reducing false positives and enabling context-aware recovery
  • Safe Reparenting (Topology Refactoring) — Orchestrator validates and executes replica moves to new masters with binlog position tracking; prevents data loss and replication breaks that manual reparenting causes
  • Hook-based Recovery Execution — Before and after recovery, Orchestrator invokes user-defined scripts (pre-recovery/post-recovery hooks) for DNS updates, monitoring alerts, or custom logic; understanding hook sequencing is essential for production integration
  • percona/orchestrator — Official actively-maintained fork after openark archived this repo; use this for new deployments and bug fixes
  • github/gh-ost — Companion tool by same ecosystem authors for safe online schema migrations on MySQL replicas; integrates with Orchestrator topologies
  • vitessio/vitess — Alternative MySQL middleware providing automatic sharding, resharding, and HA; serves similar use case but via proxying rather than topology management
  • mysql/mysql-shell — MySQL's native shell with InnoDB Cluster HA features; competes with Orchestrator for modern MySQL 8.0+ setups but requires MySQL Group Replication
  • mariadb-corporation/MaxScale — MySQL/MariaDB proxy with automatic failover and read-write splitting; complements Orchestrator by handling client routing after topology changes
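The "Hook-based Recovery Execution" concept above can be made concrete with a minimal hook script. This is a sketch under assumptions: ORC_FAILED_HOST and ORC_SUCCESSOR_HOST are the environment-variable names the sketch expects orchestrator to export to hooks, and PostFailoverProcesses is the assumed config key; confirm both in docs/configuration-recovery.md before relying on it:

```shell
#!/usr/bin/env bash
# Minimal recovery-hook sketch: append whatever failure context the
# caller provided to a log file. ORC_FAILED_HOST / ORC_SUCCESSOR_HOST
# are assumed names for the env vars orchestrator exports to hooks;
# verify the exact names for your version.
LOG="${HOOK_LOG:-/tmp/orchestrator-hook.log}"
{
  echo "$(date -u +%FT%TZ) recovery hook fired"
  echo "  failed host:    ${ORC_FAILED_HOST:-unknown}"
  echo "  successor host: ${ORC_SUCCESSOR_HOST:-unknown}"
} >> "$LOG"
```

A hook like this would be wired in via the recovery-hook config (e.g. "PostFailoverProcesses": ["/path/to/hook.sh"], name per your version's docs) and extended to do real work such as DNS updates or paging.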

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add unit tests for MySQL replication topology detection logic

The repo contains extensive configuration files for discovery (docs/configuration-discovery-*.md) and agent functionality (docs/agents.md), but the repository structure suggests core topology detection and validation code likely lacks comprehensive unit tests. Adding tests for the discovery and topology parsing logic would improve reliability and make contributions safer. This is critical for a HA tool where correctness is paramount.

  • [ ] Locate the main topology detection code in the go/ package tree (likely a discovery or topology package)
  • [ ] Identify currently untested functions for parsing SHOW SLAVE STATUS, topology graphs, and pseudo-GTID detection
  • [ ] Create *_test.go files alongside the relevant package (per Go convention) with test cases covering: normal replication chains, multi-source replication, circular topology detection, and master election scenarios
  • [ ] Reference existing test patterns in the codebase and add to CI pipeline via .github/workflows/main.yml

Add GitHub Actions workflow for validating all configuration samples

The repo contains 5+ configuration file examples (orchestrator-ci-env.conf.json, orchestrator-raft-env.conf.json, orchestrator-sample.conf.json, etc.) but no automated validation that these configs parse correctly. A new CI workflow could validate JSON syntax and schema compliance, preventing config drift and documentation inconsistencies. This would be a low-effort, high-value addition.

  • [ ] Create .github/workflows/config-validation.yml workflow file
  • [ ] Add JSON schema validation using jq or a Go-based validator for all conf/*.json files
  • [ ] Include schema checks for required fields (based on docs/configuration.md and configuration-*.md docs)
  • [ ] Document the validation rules in docs/configuration.md and trigger validation on PRs touching conf/ directory
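The JSON-validation step in the checklist above can be sketched with stdlib tooling only (python3 -m json.tool instead of jq, to avoid an extra CI dependency):

```shell
# validate_confs DIR: parse every *.json in DIR with Python's stdlib
# JSON parser; prints ok:/FAIL: per file and returns the number of
# invalid files, so it gates a CI step cleanly.
validate_confs() {
  local dir="${1:-conf}" bad=0 f
  for f in "$dir"/*.json; do
    # Unmatched glob: nothing to validate.
    [ -e "$f" ] || { echo "no JSON files in $dir/"; return 0; }
    if python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "ok:   $f"
    else
      echo "FAIL: $f"
      bad=$((bad+1))
    fi
  done
  return "$bad"
}
```

Run `validate_confs conf` from the repo root; in a workflow, the non-zero return fails the job. Schema-level checks for required fields would layer on top of this syntax pass.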

Add integration tests for Raft-based deployment mode

The repo has dedicated Raft configuration (docs/deployment-raft.md, conf/orchestrator-raft-env.conf.json, docker/Dockerfile.raft) and docker-entry-raft entry point, but the existing .github/workflows (main.yml, system.yml, upgrade.yml) don't show explicit Raft cluster testing. Adding an integration test workflow for Raft consensus and failover scenarios would catch regressions in the HA cluster mode.

  • [ ] Create .github/workflows/raft-integration.yml workflow file
  • [ ] Build Docker Raft cluster (3+ nodes) using docker/Dockerfile.raft and docker-entry-raft script
  • [ ] Add test scenarios: leader election, split-brain recovery, topology metadata consistency across Raft nodes
  • [ ] Reference existing system.yml pattern and document Raft testing approach in docs/ci.md

🌿Good first issues

  • Add integration test coverage for Pseudo-GTID recovery scenarios in docs/configuration-discovery-pseudo-gtid.md—currently no standalone test files for this feature in the repo, only documentation.
  • Improve error handling and user-facing messages in go/topology/refactoring.go—many rejection scenarios log only to stderr without exposing clear API responses, making CLI debugging harder.
  • Extend docs/configuration-large.md with a concrete worked example of a 100+ instance topology; the file exists but contains only abstract principles, not a sample config or walk-through.


📝Recent commits

  • 730db91 — Add an archival notice with link to Percona's fork (ottok)
  • f0d685e — Merge pull request #1478 from openark/pr-template-unmaintained (shlomi-noach)
  • 605adc2 — Merge pull request #1477 from openark/issue-template-unmaintained (shlomi-noach)
  • 2f032d8 — Update PULL_REQUEST_TEMPLATE.md (shlomi-noach)
  • e9e2d25 — Update ISSUE_TEMPLATE.md (shlomi-noach)
  • f3ed383 — Merge pull request #1413 from openark/tomkrouper-patch-1 (shlomi-noach)
  • 4cf493e — Update orchestrator-client to append headers_auth (tomkrouper)
  • dabfb16 — Merge pull request #1392 from binwiederhier/docs-update (shlomi-noach)
  • 2582f1f — Minor docs updates (Philipp Heckel)
  • c846d43 — v3.2.6 (shlomi-noach)

🔒Security observations

  • High · Archived Repository with No Active Maintenance — Repository root / README.md. The repository has been officially archived and is no longer actively maintained by the original authors. This means security vulnerabilities will not be patched, and the codebase may contain unaddressed security issues. Users are directed to fork the Percona maintained version instead. Fix: Migrate to the actively maintained fork at https://github.com/percona/orchestrator. Do not deploy archived versions in production environments without thorough security audits.
  • High · Outdated Go Version and Dependencies — go.mod file. The project targets Go 1.16 (released February 2021), which is now outdated and no longer receives security updates. Multiple dependencies have pinned versions from 2020-2021 and likely contain known vulnerabilities. Examples include old versions of hashicorp/consul, github.com/go-sql-driver/mysql v1.6.0, and others. Fix: Update to the latest stable Go version (1.21+), audit all dependencies using 'go mod graph' and 'go list -u -m all', and update dependencies to their latest secure versions. Run 'go mod tidy' and use 'govulncheck' to identify known vulnerabilities.
  • High · Replace Deprecated Martini Framework — go.mod (github.com/go-martini/martini, martini-contrib/* packages). The codebase uses go-martini/martini and related martini-contrib packages for HTTP handling. Martini is a legacy framework that is no longer actively maintained and has known security concerns. Using unmaintained web frameworks increases risk of unpatched vulnerabilities. Fix: Migrate to a modern, actively maintained web framework such as Chi, Gin, Echo, or the standard library's net/http with proper middleware. This is a significant refactoring but necessary for security.
  • High · Unresolved Raft Dependency — go.mod (hashicorp/raft dependency). The go.mod file contains 'github.com/hashicorp/raft v0.0.0-00010101000000-000000000000' which is a placeholder version. This indicates an unresolved or local dependency that may not be properly versioned, potentially introducing supply chain risks and making builds unreproducible. Fix: Resolve the raft dependency to a concrete released version. Use 'go get github.com/hashicorp/raft@<version>' to replace the placeholder with a proper version constraint.
  • Medium · Cleartext Password Input via gopass — go.mod (github.com/howeyc/gopass package). The dependency 'github.com/howeyc/gopass' is included for password input handling. While gopass itself provides terminal input without echo, the broader application architecture should be reviewed to ensure passwords are not logged, stored in plaintext, or transmitted insecurely. Fix: Audit all password handling code paths. Ensure passwords are only used for authentication, never logged, stored with proper hashing algorithms (bcrypt/scrypt), and transmitted only over TLS. Consider implementing a secrets management system.
  • Medium · SQL Injection Risk - MySQL Driver Usage — go.mod (github.com/go-sql-driver/mysql). The codebase uses github.com/go-sql-driver/mysql for database connectivity. While the driver itself is secure, orchestrator as a replication management tool likely constructs SQL queries dynamically. Without careful parameterization, SQL injection vulnerabilities could exist in query construction code. Fix: Perform a code audit of all database query construction. Ensure all queries use prepared statements and parameterized queries exclusively. Never concatenate user input directly into SQL strings. Use an ORM or query builder if available.
  • Medium · Web Interface Security - Missing Security Headers — Configuration files, Docker setup, Web interface components. The codebase includes a web interface (martini-contrib/render). Configuration files and Docker setup don't show evidence of security headers (CSP, X-Frame-Options, X-Content-Type-Options, HSTS) being set. Fix: Implement security headers in all HTTP responses: Content-Security-Policy, X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Strict-Transport-Security, X-XSS-Protection. Ensure HTTPS is enforced.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
