XiaoMi/soar
SQL Optimizer And Rewriter
Healthy across all four use cases
weakest axisPermissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓10 active contributors
- ✓Apache-2.0 licensed
- ✓CI configured
Show all 6 evidence items →Show less
- ✓Tests present
- ⚠Stale — last commit 2y ago
- ⚠Single-maintainer risk — top contributor 87% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/xiaomi/soar)Paste at the top of your README.md — renders inline like a shields.io badge.
▸Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/xiaomi/soar on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: XiaoMi/soar
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting
below. If any check returns
FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding. - Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/XiaoMi/soar shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- 10 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Single-maintainer risk — top contributor 87% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live XiaoMi/soar
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/XiaoMi/soar.
What it runs against: a local clone of XiaoMi/soar — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in XiaoMi/soar | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch dev exists | Catches branch renames |
| 4 | Last commit ≤ 904 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of XiaoMi/soar. If you don't
# have one yet, run these first:
#
# git clone https://github.com/XiaoMi/soar.git
# cd soar
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of XiaoMi/soar and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "XiaoMi/soar(\\.git)?\\b" \\
&& ok "origin remote is XiaoMi/soar" \\
|| miss "origin remote is not XiaoMi/soar (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
|| grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
&& ok "license is Apache-2.0" \\
|| miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify dev >/dev/null 2>&1 \\
&& ok "default branch dev exists" \\
|| miss "default branch dev no longer exists"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 904 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~874d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/XiaoMi/soar"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
SOAR is an automated SQL optimizer and rewriter for MySQL-compatible SQL queries that analyzes and transforms SQL statements through heuristic-based optimization, multi-column index suggestions, EXPLAIN plan interpretation, and custom rewrite rules. It's a Go-based tool built by Xiaomi's database team that provides SQL fingerprinting, compression, formatting, and batch ALTER table consolidation capabilities. Monolithic Go project with two main packages: ast/ (SQL parsing, AST manipulation, pretty-printing, tokenization) and advisor/ (heuristic rule engine, EXPLAIN interpretation, index optimization logic). Entry point likely in a root main.go or cmd/ folder not listed; configuration via config files; golden-file based testing throughout.
👥Who it's for
Database administrators, SQL optimization engineers, and performance teams at organizations using MySQL who need automated SQL audit, optimization recommendations, and transformation without manual query analysis. Developers building database tooling or SQL quality gates in CI/CD pipelines also benefit.
🌱Maturity & risk
Production-ready and actively maintained. The repo shows clear structure with comprehensive test coverage (golden files in ast/testdata/, advisor/testdata/), CI setup via Travis CI, semantic versioning (VERSION file), and a detailed CHANGES.md. However, last visible activity and commit frequency cannot be determined from file listing alone, so assume stable but not aggressively updated.
Moderate risk: The project has a large dependency tree (Beego web framework, Azure storage, Consul, Vault indirect dependencies) that may introduce supply chain vulnerabilities; single primary maintainer (Xiaomi) suggests organizational continuity risk. The codebase is Go-heavy (952K lines) with limited recent context visible, and MySQL-only support limits addressability of polyglot environments.
Active areas of work
No specific PR or milestone data visible in file listing. Based on structure, active areas include SQL rewrite rules expansion (ast/rewrite.go), heuristic rule refinement (advisor/heuristic.go), and index advisor improvements (advisor/index.go). Check CHANGES.md and .travis.yml for recent activity patterns.
🚀Get running
git clone https://github.com/XiaoMi/soar.git
cd soar
make # builds binary (see Makefile)
./soar -help # verify installation
Daily commands:
make build # compiles soar binary to current directory
make test # runs all test suites with golden-file validation
make install # installs binary to $GOPATH/bin
./soar -config config.yaml -sql 'SELECT ...' # analyze SQL via CLI
🗺️Map of the codebase
- ast/meta.go: Core AST node definitions and schema metadata structures; foundational for all SQL parsing and rewrite operations
- ast/rewrite.go: SQL rewrite rule implementations and transformations; directly used by optimization pipeline
- advisor/heuristic.go: Heuristic rule engine logic for SQL optimization suggestions; core differentiation of SOAR
- advisor/index.go: Multi-column index optimization logic for SELECT, INSERT, UPDATE, DELETE; critical advisor feature
- ast/pretty.go: SQL formatting and pretty-printing; used for readability and fingerprinting
- Makefile: Build orchestration; defines test targets, binary compilation, and install procedure
🛠️How to make changes
New heuristic rules: Edit advisor/heuristic.go and advisor/rules.go, add test cases to advisor/heuristic_test.go, update golden file advisor/testdata/TestListHeuristicRules.golden. New rewrite patterns: Modify ast/rewrite.go, test in ast/rewrite_test.go, update ast/testdata/TestListRewriteRules.golden. SQL parsing enhancements: Enhance AST node structures in ast/meta.go and ast/node_array.go. Pretty-print logic: Change ast/pretty.go and regenerate ast/testdata/TestPretty.golden.
🪤Traps & gotchas
MySQL-only SQL dialect (no PostgreSQL, T-SQL, etc.); EXPLAIN output parsing assumes MySQL format (5.7+); golden-file tests are strict snapshot tests that must be manually regenerated with make test after intentional output changes (regeneration command not always obvious); configuration likely expected in config.yaml or environment variables (check doc/config.md); beware of Vitess parser version mismatches if vendored dependencies are updated.
💡Concepts to learn
- SQL Abstract Syntax Tree (AST) — SOAR's entire optimization pipeline depends on parsing SQL into an AST representation (ast/meta.go, ast/node_array.go); understanding node traversal and transformation is essential for any rewrite rule or optimization logic
- Heuristic Rule Engine — SOAR's core differentiation is its non-ML heuristic-based rules (advisor/heuristic.go) for query optimization; understanding condition matching and recommendation scoring is crucial for adding new rules
- SQL EXPLAIN Output Interpretation — advisor/explainer.go analyzes MySQL EXPLAIN plans to suggest optimizations; understanding key metrics like rows examined, join type, and key efficiency is essential for advisor accuracy
- Multi-column Index Optimization — advisor/index.go performs complex multi-column index recommendation across INSERT/UPDATE/DELETE/SELECT; understanding column selectivity, cardinality, and leading-column strategy is prerequisite knowledge
- SQL Fingerprinting — SOAR normalizes and compresses SQL queries into fingerprints for deduplication and aggregate analysis (ast/pretty.go); critical for identifying slow query patterns in production
- Golden File Testing — SOAR uses snapshot/golden-file testing extensively (ast/testdata/.golden, advisor/testdata/.golden); any output format change requires careful golden file regeneration and review
- Visitor Pattern for AST Traversal — SOAR likely uses visitor/traversal patterns (implied by ast/pretty.go, ast/rewrite.go) to walk AST nodes; understanding stateful traversal is needed for implementing new transformations
🔗Related repos
vitessio/vitess— Vitess is the upstream SQL parser library SOAR integrates for AST generation; understanding Vitess parser behavior is essential for debugging parse failurespingcap/tidb— TiDB is a distributed MySQL-compatible database with its own SQL optimizer; comparison/learning reference for heuristic optimization strategiespercona/percona-toolkit— Industry-standard Perl-based MySQL toolkit with pt-query-digest; SOAR is the Go-based modernized alternative for SQL analysis and fingerprintinggithub/gh-ost— GitHub's online schema change tool for MySQL; SOAR's ALTER consolidation feature complements this by pre-optimizing DDL statementsXiaoMi/DBProxy— Xiaomi's MySQL proxy; SOAR is often used upstream in the proxy pipeline for real-time SQL rewriting and optimization
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add GitHub Actions CI workflow to replace Travis CI
The repo uses an outdated .travis.yml for CI/CD. GitHub Actions is now the standard for GitHub-hosted projects, offering better integration, faster feedback, and native support for Go 1.15+. This will modernize the CI pipeline and improve contributor experience.
- [ ] Create .github/workflows/test.yml with steps for: go test ./..., go vet ./..., and golangci-lint
- [ ] Create .github/workflows/build.yml to build the soar binary for multiple platforms
- [ ] Add status badge updates to README.md pointing to GitHub Actions instead of Travis CI
- [ ] Test the workflows pass with the existing test suite in advisor/, ast/, and cmd/soar/
- [ ] Document in CONTRIBUTING.md that Travis CI can be deprecated once workflows are stable
Add comprehensive integration tests for ast/rewrite.go with golden files
The ast/testdata directory has golden test files, but ast/rewrite.go (which handles SQL rewriting rules) appears to lack thorough test coverage. Given this is a core feature of SOAR, adding more test cases will catch regressions and document expected behavior.
- [ ] Review ast/rewrite_test.go and identify missing test cases by comparing to rules defined in ast/rewrite.go
- [ ] Create new golden test file ast/testdata/TestRewriteRules_Advanced.golden with complex SQL rewrite scenarios
- [ ] Add test cases covering: nested subqueries, CTEs, window functions, and edge cases from advisor/rules.go
- [ ] Ensure tests validate both the AST transformation and the output formatting
- [ ] Update ast/rewrite_test.go to include these cases and verify they pass
Add missing unit tests for ast/meta.go schema introspection logic
The ast/meta.go file handles database schema metadata (crucial for optimization), but ast/meta_test.go exists alongside a TestSchemaMetaInfo.golden file. This suggests incomplete test coverage for complex schema scenarios. Better tests will ensure schema handling is robust.
- [ ] Examine ast/meta.go to identify untested functions (likely table/column resolution, type inference)
- [ ] Add test cases for: multi-database schemas, tables with reserved keywords, columns with special characters
- [ ] Create additional golden test file ast/testdata/TestSchemaMeta_EdgeCases.golden with corner cases
- [ ] Add tests validating interaction between meta.go and advisor/index.go for index recommendations
- [ ] Verify all public functions in ast/meta.go have corresponding test coverage
🌿Good first issues
- Add test coverage for
ast/node_array.go— it has no corresponding*_test.gofile visible in the listing, likely missing unit tests for array manipulation methods. - Expand golden file
advisor/testdata/TestIndexAdviseNoEnv.goldenwith edge cases for composite keys on high-cardinality columns — comment in code likely hints at incomplete coverage. - Document rewrite rules in
doc/folder with before/after SQL examples for each rule inadvisor/rules.go— rules are implemented but not user-facing documentation visible.
⭐Top contributors
Click to expand
Top contributors
- @martianzhang — 87 commits
- @wangqifan — 3 commits
- @LonelyChick — 3 commits
- @xiyangxixian — 1 commits
- @lipincheng — 1 commits
📝Recent commits
Click to expand
Recent commits
5ed8574— Merge pull request #318 from Tianion/dev (xiyangxixian)e5cd36f— fix RuleStringConcatenation FUN.003 (lipincheng)fab0463— Merge branch 'dev' of github.com:XiaoMi/soar into dev (martianzhang)edac95c— index advisor skip columns in if clause (martianzhang)15f9e34— fix disable-version-check test case (martianzhang)08ca16c— 新增配置项用于控制是否允许测试数据库版本小于线上数据库 (#297) (Keima-lpj)80439cf— Remove duplicate heuristict rule CLA.007 (martianzhang)4b1eea8— fix RuleDateNotQuote spell error (martianzhang)f720930— fix some test cases after pr merged (martianzhang)d431b19— fix typo of DSN config in config.go (#290) (Aurora)
🔒Security observations
This SQL optimizer/rewriter codebase has significant security concerns. The primary issues are: (1) Go 1.15 is severely outdated with unpatched CVEs, (2) Multiple
- High · Outdated Go Version —
go.mod (go 1.15). The project targets Go 1.15 which was released in August 2020 and is significantly outdated. This version lacks security patches for numerous vulnerabilities discovered in subsequent Go releases. Fix: Update to Go 1.21 or later (current stable versions). Review and update all dependencies after upgrading. - High · Multiple Outdated and Vulnerable Dependencies —
go.mod - multiple entries. Several dependencies have known vulnerabilities and are significantly outdated: astaxie/beego (v1.12.3 from 2020), go-sql-driver/mysql (v1.6.0), and numerous indirect dependencies marked as 'indirect' without version pinning. These packages contain known CVEs. Fix: Conduct a comprehensive dependency audit using 'go list -json -m all' and 'nancy' or 'trivy'. Update all dependencies to latest secure versions. Consider migrating from beego (unmaintained) to actively maintained alternatives. - Medium · Incomplete Dependency List —
go.mod. The go.mod file appears truncated (ends with 'github.com/') and does not show the complete list of dependencies. This makes it impossible to perform a full security audit of the dependency chain. Fix: Provide the complete, non-truncated go.mod file. Run 'go mod tidy' and verify all dependencies are explicitly listed. - Medium · SQL Query Processing Without Visible Input Validation —
ast/rewrite.go, advisor/index.go, cmd/soar/soar.go. The codebase is a SQL optimizer and rewriter. Modules like 'ast/rewrite.go' suggest SQL query manipulation. Without visible input sanitization patterns in the file structure, there's risk of SQL injection if user inputs are processed unsafely. Fix: Implement strict input validation and parameterized query handling. Use a properly tested SQL parser (verify the chosen parser is secure). Add comprehensive input validation tests covering edge cases and malicious payloads. - Medium · Potential Configuration Exposure —
common/config.go, common/config_test.go, cmd/soar/soar.go. The codebase includes 'common/config.go' and configuration handling. If configuration files contain database credentials or API keys, they could be exposed through logs or error messages. Fix: Never store credentials in code or config files. Use environment variables or secure vaults (HashiCorp Vault, AWS Secrets Manager). Implement credential masking in logs and error outputs. - Low · Unmaintained Web Framework Dependency —
go.mod - astaxie/beego v1.12.3, gorilla/* dependencies. astaxie/beego v1.12.3 is an unmaintained web framework. The project may include web server functionality (indicated by gorilla/mux, gorilla/handlers dependencies) using potentially outdated security patterns. Fix: Migrate to actively maintained frameworks like Gin, Echo, or standard library net/http with well-maintained middleware. Review and update all web-related dependencies. - Low · Missing Security Headers Configuration —
cmd/soar/soar.go, common/config.go. If the project exposes a web API (suggested by gorilla/mux and render dependencies), there's no visible configuration for security headers (CSP, X-Frame-Options, X-Content-Type-Options, HSTS). Fix: Implement security headers middleware. Add CORS validation, rate limiting, and request size limits. Document security configuration requirements. - Low · Indirect Dependencies Not Pinned —
go.mod - all 'indirect' entries. Many indirect dependencies are marked without explicit version constraints, allowing transitive dependency resolution to pull potentially incompatible or vulnerable versions. Fix: Run 'go mod tidy' and consider using 'go mod vendor' to lock in all transitive dependencies. Review the go.sum file to ensure integrity.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.