RepoPilot

programthink/zhao

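The "Princeling Relationship Network" compiled by 编程随想 (programthink), dedicated to exposing the elites of the "State of Zhao" (赵国)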

Concerns

Looks unmaintained — solo project with stale commits

Concerns · Dependency

copyleft license (GPL-3.0) — review compatibility; last commit was 5y ago…

Mixed · Fork & modify

no tests detected; no CI workflows detected…

Healthy · Learn from

Documented and popular — useful reference codebase to read through.

Mixed · Deploy as-is

last commit was 5y ago; Scorecard "Branch-Protection" is 0/10…

  • Stale — last commit 5y ago
  • Solo or near-solo (1 contributor active in recent commits)
  • GPL-3.0 is copyleft — check downstream compatibility
  • No CI workflows detected
  • No test directory detected
  • Scorecard: marked unmaintained (0/10)
  • Scorecard: default branch unprotected (0/10)
  • GPL-3.0 licensed

What would improve this?

  • Use as dependency (Concerns → Mixed) if: relicense under MIT/Apache-2.0 (rare for established libs); 1 commit in the last 365 days
  • Fork & modify (Mixed → Healthy) if: add a test suite
  • Deploy as-is (Mixed → Healthy) if: 1 commit in the last 180 days; bring "Branch-Protection" to ≥3/10 (see scorecard report)

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests + OpenSSF Scorecard

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Great to learn from" badge

Paste into your README — live-updates from the latest cached analysis.

RepoPilot: Great to learn from
[![RepoPilot: Great to learn from](https://repopilot.app/api/badge/programthink/zhao?axis=learn)](https://repopilot.app/r/programthink/zhao)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: programthink/zhao

Generated by RepoPilot · 2026-06-19 · Source

🎯Verdict

AVOID — Looks unmaintained — solo project with stale commits

  • GPL-3.0 licensed
  • ⚠ Stale — last commit 5y ago
  • ⚠ Solo or near-solo (1 contributor active in recent commits)
  • ⚠ GPL-3.0 is copyleft — check downstream compatibility
  • ⚠ No CI workflows detected
  • ⚠ No test directory detected
  • ⚠ Scorecard: marked unmaintained (0/10)
  • ⚠ Scorecard: default branch unprotected (0/10)

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests + OpenSSF Scorecard</sub>

TL;DR

A crowdsourced knowledge graph documenting political connections and family relationships among Chinese elites and "princeling" families. The project stores 700+ individual profiles and 130+ family trees in YAML format, then uses Python scripts with Graphviz to render these relationships as network diagrams in PDF and JPG formats. Simple flat structure: data/ contains three subdirectories (person/, company/, family/) where each entity is a folder with a brief.yaml file and optional portrait.png. bin/make.py is a single Python entry point that reads all YAML, generates Graphviz DOT syntax, then invokes graphviz CLI to produce output. download/ stores pre-built PDF/JPG artifacts.
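
The YAML-to-DOT step described above can be sketched in a few lines. This is a minimal illustration, not the code in bin/make.py: the function names `dot_quote` and `build_dot`, the sample people, and the edge label are all assumptions, and the input is a plain Python structure standing in for parsed YAML.

```python
def dot_quote(text):
    """Wrap text in double quotes for DOT, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def build_dot(people, relations):
    """Return DOT source for a directed graph.

    people:    iterable of display names (one node each)
    relations: iterable of (source, target, label) tuples (one edge each)
    """
    lines = ["digraph network {", "  node [shape=box];"]
    for name in people:
        lines.append("  " + dot_quote(name) + ";")
    for source, target, label in relations:
        lines.append(
            "  " + dot_quote(source) + " -> " + dot_quote(target)
            + " [label=" + dot_quote(label) + "];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data; the real records live in the YAML files.
sample_dot = build_dot(
    ["Person A", "Person B"],
    [("Person A", "Person B", "parent")],
)
```

The resulting text can be written to a file and passed to the Graphviz `dot` command to produce a PDF or JPG.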

👥Who it's for

Political researchers, journalists, and transparency advocates who need to analyze power structures in the Chinese government and connected business interests. Contributors are primarily diaspora Chinese programmers and non-technical volunteers who understand Mandarin and want to expose elite networks.

🌱Maturity & risk

Minimalist and no longer active: the project launched in February 2016 and had 363 stars and 88 forks by day two (per README), indicating strong initial interest, but the last commit is about 5 years old and there is no evidence of continuous integration or automated tests. The project is mature enough for its intended use (static reference documentation) but shows the traits of a cause-driven effort rather than a sustainably maintained codebase.

Single-point-of-failure on the original maintainer (programthink) with no visible CI/CD pipeline or test suite to prevent data corruption during contributions. The YAML data structure is human-editable (a feature, not a bug) but lacks schema validation or linting. Primary risk is geopolitical: the GitHub repo itself may face institutional pressure or blocking in certain regions, making forking/mirroring critical for preservation.

Active areas of work

No explicit activity data provided. Based on README, the stated plan is continuous updates around government transitions (e.g., adding new elite members, companies, family connections after Party Congress meetings). The README explicitly commits to ongoing maintenance but does not specify recent work or active PRs.

🚀Get running

git clone https://github.com/programthink/zhao.git && cd zhao
pip install PyYAML
apt-get install graphviz    # or: brew install graphviz on macOS
python bin/make.py pdf

Daily commands: cd bin && python make.py pdf (outputs PDF to download/pdf/) or python make.py jpg (outputs JPG to download/jpg/). Requires graphviz system package and PyYAML Python package installed.

🗺️Map of the codebase

  • bin/make.py — Build script that generates visualizations and documents from the YAML data files; entry point for creating deliverables
  • data/family — Family-tree records, one folder per family (130+ per the TL;DR); together with the 700+ individual profiles under data/person, they form the backbone of the entire relationship network
  • data/company — Supporting data layer documenting corporate entities and their connections to powerful families
  • README.wiki — Project overview, data format specification (YAML), and contribution guidelines that all collaborators must understand
  • LICENSE — Legal terms governing usage, modification, and redistribution of the dataset

🧩Components & responsibilities

  • YAML data files (family + company) (YAML) — Store structured records of individuals, their positions, family ties, corporate affiliations, and source citations
    • Failure mode: Malformed YAML blocks build; missing citations undermine credibility; incorrect relationships create invalid network topology
  • bin/make.py (Python, PyYAML, Graphviz) — Orchestrates reading all YAML files, constructing the relationship graph as Graphviz DOT, and generating output PDFs/JPGs; no schema validation step is evident
    • Failure mode: Parse errors halt build; missing output files leave users with outdated visualizations; logic bugs create incorrect family trees
  • GitHub repository & Pull Request workflow (Git, GitHub) — Provides version control, collaboration, and editorial review mechanism for data contributions
    • Failure mode: Loss of commit history if repository deleted; merge conflicts on simultaneous edits; spam pull requests if not moderated
  • /download directory & release artifacts (GitHub static file hosting) — Serves final PDF and JPG outputs to public consumers
    • Failure mode: Stale outputs if builds fail; large file sizes can be slow to download; no versioning if only latest is kept

🔀Data flow

  • Contributors (GitHub users) → YAML files in data/family/ and data/company/ — Humans edit YAML via Pull Request; each entry cites Wikipedia or tier-1 media sources
  • YAML files → bin/make.py — Build script parses all YAML and extracts relationships (parent→child, spouse, siblings, corporate roles); no schema validation is evident
  • bin/make.py → /download (PDF & JPG) — Script generates final visualizations: relationship diagrams, family trees, network graphs
  • /download artifacts → Public consumers (researchers, journalists) — End-users download and view pre-rendered PDFs/images via GitHub's file browser or direct links
  • Researchers (feedback) → GitHub issues or blog comments — Users report errors or propose additions; maintainer evaluates against citation standards

🛠️How to make changes

Add a new person's family record

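  1. Create a new YAML file in data/family/ named after the person's Chinese name (e.g., 新人名.yaml, where 新人名 stands for the new person's name). The TL;DR describes a folder-per-entity layout with brief.yaml, so mirror whichever layout the existing entries use (data/family/[新人名].yaml)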
  2. Follow the YAML schema shown in README.wiki with fields: birth/death dates, official positions, family relationships, and references to Wikipedia or international media sources (README.wiki)
  3. Include documented relationships to other family members and corporate board positions (data/family/[新人名].yaml)
  4. Run bin/make.py to regenerate network visualizations and relationships (bin/make.py)
  5. Submit a Pull Request or GitHub issue with the new person's data and source citations (README.wiki)

Document a new corporation or entity

  1. Create a new subdirectory under data/company/ with the entity name (e.g., data/company/新企业名/, where 新企业名 stands for the new company's name) (data/company)
  2. Create brief.yaml inside the new directory documenting the organization, leadership, and family connections (data/company/[新企业名]/brief.yaml)
  3. Cross-reference related family records in data/family/ to link individuals to organizations (data/family)
  4. Regenerate visualizations via bin/make.py (bin/make.py)

Correct or update existing person's information

  1. Locate the person's YAML file in data/family/[人名].yaml, where 人名 stands for the person's name (data/family)
  2. Edit the YAML with corrections, ensuring comments explain changes and cite reliable sources (Wikipedia, Reuters, FT, WSJ, NYT) (data/family/[人名].yaml)
  3. Update family relationships if positions or connections changed (data/family/[人名].yaml)
  4. Regenerate network to validate consistency (bin/make.py)
  5. Submit Pull Request or GitHub issue referencing the source evidence (README.wiki)
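
Before regenerating the network, a quick consistency pass can catch relationships that point at people who have no record of their own. The sketch below is only an illustration: the function name `find_dangling_references` and the `relatives` field are hypothetical, and the input is a plain dictionary standing in for the parsed YAML.

```python
def find_dangling_references(records):
    """Return sorted (owner, missing_name) pairs for relatives with no record.

    records maps a person's name to a dict; the "relatives" key is a
    hypothetical field holding a list of other people's names.
    """
    known = set(records)
    dangling = []
    for owner, record in records.items():
        for relative in record.get("relatives", []):
            if relative not in known:
                dangling.append((owner, relative))
    return sorted(dangling)
```

An empty result means every referenced name resolves to an existing entry; anything else is worth fixing before opening a Pull Request.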

🔧Why these technologies

  • YAML format for data — Simple, human-readable schema that enables non-technical contributors to edit family trees and corporate ties without programming knowledge; comments provide inline documentation
  • GitHub (git + Pull Requests) — Decentralized collaboration model allows crowd-sourced fact-checking and verification; audit trail preserves attribution and change history
  • Python build scripts (bin/make.py) — Programmatic generation of PDF and visual outputs from a single YAML source-of-truth prevents inconsistency and enables reproducible builds
  • Wikipedia + international media sources as references — Establishes data credibility and legal defensibility by grounding claims in public, verifiable sources rather than speculation

⚖️Trade-offs already made

  • Centralized YAML data + decentralized GitHub contribution model

    • Why: Allows both transparency (open source) and editorial control (single source-of-truth prevents vandalism)
    • Consequence: Requires maintainer review of pull requests, introducing latency but ensuring data quality
  • Static generated PDFs/JPGs rather than dynamic web interface

    • Why: Avoids hosting infrastructure, reduces attack surface, and works across devices offline
    • Consequence: Updates require full rebuild and re-publication; no real-time querying capability
  • Manual citation requirements (Wikipedia, Reuters, FT, WSJ, NYT only)

    • Why: Ensures legal defensibility and prevents original research / unverified claims
    • Consequence: Slows data entry and excludes some true information not yet published in tier-1 sources
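
The citation rule above lends itself to a mechanical pre-check. The following sketch assumes citations are stored as URLs and that the named outlets map to the domains listed in `ALLOWED_DOMAINS`; both points are assumptions, as is the helper name `is_allowed_source`.

```python
from urllib.parse import urlparse

# Assumed domains for the outlets named above; adjust to the project's real policy.
ALLOWED_DOMAINS = ("wikipedia.org", "reuters.com", "ft.com", "wsj.com", "nytimes.com")


def is_allowed_source(url):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting links that merely mention an allowed domain in their path.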

🚫Non-goals (don't propose these)

  • Does not provide real-time tracking of political changes or breaking news
  • Does not include non-public (leaked/classified) intelligence sources
  • Does not host a dynamic web server or database—purely static dataset
  • Does not perform automated scraping; all data is manually curated
  • Does not handle authentication or access control—all data is public

⚠️Anti-patterns to avoid

  • Unverified or single-source claims in YAML (High), in data/family/*.yaml: Fields lacking citations to Wikipedia, Reuters, FT, WSJ, or NYT reduce credibility; China-specific claims must be cross-validated with international sources to avoid propaganda
  • Circular family relationships (A→B→C→A) (Medium), in data/family/*.yaml relationships: Parent/child cycles indicate data entry errors; a validation step in bin/make.py should detect and flag them
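
Cycle detection on parent → child edges is a standard depth-first search. The sketch below is one possible check, not something bin/make.py is known to contain; `find_cycle` and the shape of the `children` mapping are assumptions. It only makes sense for directed edges such as parent → child, since spouse links are symmetric by nature.

```python
def find_cycle(children):
    """Return one cycle as a list of names (first == last), or None if acyclic.

    children maps each person to the list of people recorded as their children.
    Recursion depth equals the longest descent chain, which is small for family trees.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for child in children.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:
                # child is already on the current path: report the loop
                return stack[stack.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(children):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

Returning the offending path, rather than a bare boolean, tells the contributor exactly which entries to inspect.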

🪤Traps & gotchas

No explicit dependency pinning: PyYAML major versions (3.x vs 5.x vs 6.x) differ in loading behavior; for example, PyYAML 6.0 requires an explicit Loader argument to yaml.load. Graphviz must be installed as a system package (the PyPI graphviz package is only a Python interface and does not include the dot executable); if it is missing, bin/make.py may fail silently or crash. The script assumes a UTF-8 locale and may corrupt output on Windows with CP-1252 encoding. No validation: malformed YAML in any single file can crash the entire pipeline without error recovery.
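
Several of these traps can be softened with a small defensive wrapper. The sketch below is an assumption about how one might do it, not a description of bin/make.py: `graphviz_available` and `load_all` are hypothetical names, and the parser is passed in as a callable so the example stays free of third-party imports (in this project it would plausibly be `yaml.safe_load`).

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the Graphviz executable can be found on PATH."""
    return shutil.which(executable) is not None


def load_all(paths, parse):
    """Parse every file, collecting failures instead of stopping at the first one.

    parse is any callable that turns text into data (assumed: yaml.safe_load).
    Returns (loaded, errors), two dicts keyed by path.
    """
    loaded, errors = {}, {}
    for path in paths:
        try:
            # Explicit encoding avoids depending on the platform's locale default.
            with open(path, encoding="utf-8") as handle:
                loaded[path] = parse(handle.read())
        except Exception as exc:  # broad on purpose: report every bad file
            errors[path] = str(exc)
    return loaded, errors
```

Checking `graphviz_available()` up front turns a confusing mid-build failure into a clear message, and the `errors` dictionary lists every malformed file in one run.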

🏗️Architecture

💡Concepts to learn

  • DOT language (Graphviz syntax) — bin/make.py generates DOT code that Graphviz renders; understanding DOT is essential to modify graph layout, styling, or node/edge representation
  • YAML format and schema design — All 700+ data files are YAML; contributors must understand nested YAML structure, indentation rules, and the implicit schema (field names, types) to add/edit entries correctly
  • Directed acyclic graphs (DAGs) and kinship networks — Family relationships form DAGs (parent → child edges); understanding DAG traversal and cycle detection helps explain how the graph models inheritance and political succession
  • Knowledge graph / RDF-style entity linking — The YAML structure encodes entities (people, companies) and relationships (family, corporate board seats); this is foundational to semantic graph databases and linked data
  • Graph rendering and force-directed layout — Graphviz offers several layout engines, such as dot (hierarchical), neato (spring model, i.e. force-directed), and circo (circular); understanding layout is key to producing readable relationship diagrams from large networks
  • Python subprocess and CLI tool integration — bin/make.py spawns graphviz as a subprocess; understanding Python's subprocess module is needed to debug tool invocation, handle errors, and extend the pipeline
  • Data provenance and source attribution — The README emphasizes Wikipedia and international media as sources; the project implicitly models data provenance; contributors should understand how to cite and verify facts in authoritarian contexts

Related repos

  • gephi/gephi — Visualization and analysis tool that can import the network graphs this project generates; commonly used downstream for interactive exploration of relationship networks
  • cytoscape/cytoscape.js — JavaScript library for rendering interactive network graphs; could be used to create a web dashboard for exploring the zhao dataset dynamically
  • alephdata/aleph — Open-source tool for investigators to manage and link large datasets of people, companies, and relationships; ideological cousin designed for transparency research
  • git-big-picture/git-big-picture — Graphviz-based Git history visualizer; demonstrates similar pattern of using Graphviz to render complex relationship data from a simple source format
  • networkx/networkx — Python library for constructing, analyzing, and visualizing graphs; could replace or augment the custom graph-building logic in make.py for advanced network metrics

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Create YAML schema validation script for data quality assurance

The repo contains 700+ YAML files in data/family/ and data/company/ directories with no apparent validation. A Python script in bin/ could validate all brief.yaml files against a defined schema, ensuring consistency in family relationships, company affiliations, and data fields. This would catch malformed entries, missing required fields, and inconsistent formatting before they're committed.

  • [ ] Create bin/validate.py with YAML schema definition for family records (name, birth, death, relatives, positions, etc.)
  • [ ] Create YAML schema definition for company records (name, industry, key personnel, relationships, etc.)
  • [ ] Implement validation logic to check all data/family/*.yaml and data/company/*/*.yaml files
  • [ ] Add error reporting that lists file names and specific validation failures
  • [ ] Test against sample files to ensure it accepts valid entries and rejects invalid ones
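
A first cut of the validation logic in this checklist could look like the sketch below. The schema is entirely hypothetical (`PERSON_SCHEMA`, `REQUIRED_FIELDS`, and the field names are placeholders, since the real field set is defined by README.wiki and the existing data), and the function works on already-parsed records so it does not depend on any YAML library.

```python
# Hypothetical schema: field name -> expected Python type after YAML parsing.
PERSON_SCHEMA = {"name": str, "birth": str, "relatives": list}
REQUIRED_FIELDS = {"name"}


def validate_record(record, schema=PERSON_SCHEMA, required=REQUIRED_FIELDS):
    """Return a list of human-readable problems; an empty list means it passed."""
    if not isinstance(record, dict):
        return ["top level must be a mapping"]
    problems = []
    for field in sorted(required - set(record)):
        problems.append("missing required field: " + field)
    for field, value in record.items():
        if field not in schema:
            problems.append("unknown field: " + field)
        elif not isinstance(value, schema[field]):
            problems.append(
                "field " + field + " should be " + schema[field].__name__
            )
    return problems
```

Flagging unknown fields is a deliberate choice here: it catches misspelled keys, which are easy to introduce in hand-edited YAML.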

Build relationship graph export functionality (JSON/CSV formats)

Currently bin/make.py generates PDF/JPG outputs, but there's no way to export the structured relationship data in machine-readable formats. Add functionality to export family trees and company relationships as JSON and CSV, enabling data analysis, network visualization tools, and easier cross-referencing.

  • [ ] Extend bin/make.py to add --format json and --format csv export options
  • [ ] Create export logic that parses all data/family/*.yaml files and outputs family relationships as JSON (id, name, relatives[], positions[], companies[])
  • [ ] Create CSV export that flattens relationships into rows (person_name, relation_type, related_person_name, company_affiliation)
  • [ ] Test exports on a subset of data files to verify structure and completeness
  • [ ] Document the JSON/CSV schema in README.wiki with examples
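
The CSV flattening step in this checklist might be sketched as follows. The input shape (person → relation type → list of names) and the name `relations_to_csv` are assumptions for illustration, and the `company_affiliation` column from the checklist is left out to keep the example short.

```python
import csv
import io


def relations_to_csv(records):
    """Flatten {person: {relation_type: [names]}} into CSV text, one edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["person_name", "relation_type", "related_person_name"])
    for person in sorted(records):
        for relation_type in sorted(records[person]):
            for related in records[person][relation_type]:
                writer.writerow([person, relation_type, related])
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand means names containing commas or quotes are escaped correctly.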

Add automated duplicate detection and merging utility

With 700+ family records, there's high risk of duplicate or near-duplicate entries (same person under different names, simplified vs. traditional characters, etc.). Create a bin/dedup.py script that identifies potential duplicates across data/family/ using fuzzy matching and interactive merge functionality.

  • [ ] Implement fuzzy string matching (e.g., Levenshtein distance) in bin/dedup.py to find similar person names across data/family/*.yaml
  • [ ] Add alias detection logic to cross-reference common name variations in Chinese (simplified/traditional, nicknames)
  • [ ] Create interactive CLI prompt to review candidates and merge records safely (combining relationship data, updating references)
  • [ ] Generate a dedup report showing before/after record counts and merge operations performed
  • [ ] Test against data/family/ directory to identify actual duplicates and validate merge correctness
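
As a starting point for the fuzzy-matching step, the sketch below uses the standard library's `difflib.SequenceMatcher` ratio in place of Levenshtein distance, which avoids a third-party dependency. The function name `similar_names` and the default threshold are assumptions; Chinese names are often only two or three characters long, so any ratio threshold will be coarse and candidates still need human review.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names, threshold=0.8):
    """Return (name_a, name_b, ratio) for every pair at or above the threshold."""
    candidates = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            candidates.append((a, b, round(ratio, 2)))
    return candidates
```

Comparing every pair is quadratic in the number of names, which is acceptable for a few hundred records but would need blocking or indexing for much larger sets.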

🌿Good first issues

  • Add JSON schema validation to data/ files: write a Python script that validates all brief.yaml files against a formal schema to catch typos and inconsistent field names before rendering
  • Extend bin/make.py to generate a CSV export: add a --csv flag that flattens family relationships into edge-list format for import into network analysis tools like Gephi or Cytoscape
  • Create a data/ README template: write a standard brief.yaml example file with all supported fields documented, then auto-validate that all person/family/company entries follow it

📝Recent commits

  • e8eea46 — Add personal descriptions (programthink)
  • bec7a49 — Initial commit (programthink)
  • 80d68a0 — Initial commit (programthink)
  • fab5071 — Initial commit (programthink)
  • 09b8155 — Initial commit (programthink)
  • 0c2e447 — Add "Han Zheng family" (programthink)
  • 3409d7a — Initial commit (programthink)
  • 9a3e8bb — Initial commit (programthink)
  • bc122e8 — Initial commit (programthink)
  • f3723d0 — Initial commit (programthink)

🔒Security observations

This is a data collection project with relatively low security risk profile due to its nature (static YAML/documentation files with no web server, database, or user input processing). Primary concerns are: (1) unsafe YAML parsing in build scripts, (2) lack of visible dependency management, (3) no contribution validation procedures. The project should implement basic safeguards around data file handling and contributor validation processes. No critical vulnerabilities detected in the available codebase structure.

  • Low · No Dependency Management File Detected — Project root / bin/make.py. The codebase appears to be a data collection project without a visible package manager file (requirements.txt, package.json, Gemfile, etc.). While this reduces supply chain risk, it makes dependency tracking impossible if external libraries are used in bin/make.py. Fix: If external dependencies are used, implement proper dependency management with version pinning (requirements.txt for Python, package-lock.json for Node.js, etc.)
  • Low · Incomplete README Security Information — README.wiki. The README file is truncated and doesn't include information about secure contribution guidelines, data validation procedures, or vulnerability disclosure policy. Fix: Complete the README with security guidelines, including how contributors should validate data before submission and a responsible disclosure process for security issues.
  • Low · Build Script Not Reviewed — bin/make.py. The bin/make.py script's implementation cannot be fully analyzed from the file structure alone. Python build scripts can pose risks if they execute untrusted input or perform unsafe operations. Fix: Review the make.py script to ensure: (1) no unsafe deserialization of YAML/JSON, (2) no shell command injection from file paths, (3) proper input validation of family and company data files.
  • Low · YAML File Parsing Risk — data/family/*.yaml, data/company/*/brief.yaml. The project uses .yaml files extensively (brief.yaml files in company and family directories). Unsafe YAML parsing can lead to arbitrary code execution if files contain malicious content. Fix: Ensure YAML parsing uses safe loaders (PyYAML's safe_load, not load). Implement schema validation for all YAML files to prevent unexpected fields.
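
For the shell-injection concern raised above, the usual remedy is to pass arguments to `subprocess` as a list instead of building a shell string. The sketch below assumes the build calls Graphviz's `dot` with `-T` and `-o`; the helper names `dot_command` and `render` are hypothetical and not taken from bin/make.py.

```python
import subprocess


def dot_command(dot_path, output_path, fmt="pdf"):
    """Build the argument list for Graphviz's dot; no shell is involved."""
    if fmt not in {"pdf", "jpg"}:
        raise ValueError("unsupported format: " + fmt)
    return ["dot", "-T" + fmt, "-o", output_path, dot_path]


def render(dot_path, output_path, fmt="pdf"):
    """Run dot and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(dot_command(dot_path, output_path, fmt), check=True)
```

Because each path is a single list element, spaces or shell metacharacters in a file name are passed through literally rather than being interpreted.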

LLM-derived; treat as a starting point, not a security audit.

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/programthink/zhao shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live programthink/zhao repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/programthink/zhao.

What it runs against: a local clone of programthink/zhao — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in programthink/zhao | Confirms the artifact applies here, not a fork |
| 2 | License is still GPL-3.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 1773 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>programthink/zhao</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of programthink/zhao. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/programthink/zhao.git
#   cd zhao
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of programthink/zhao and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE 'programthink/zhao(\.git)?\b' \
  && ok "origin remote is programthink/zhao" \
  || miss "origin remote is not programthink/zhao (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# Heuristic: the standard GPL text opens with "GNU GENERAL PUBLIC LICENSE" and
# "Version 3" rather than the SPDX id, so accept either form.
(grep -qiE '^GPL-3\.0' LICENSE 2>/dev/null \
   || { grep -qi 'GNU GENERAL PUBLIC LICENSE' LICENSE 2>/dev/null \
        && grep -qi 'Version 3' LICENSE 2>/dev/null; }) \
  && ok "license is GPL-3.0" \
  || miss "license drift: was GPL-3.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical paths exist (-f for files, -d for directories)
test -f "bin/make.py" \
  && ok "bin/make.py" \
  || miss "missing critical file: bin/make.py"
test -d "data/family" \
  && ok "data/family" \
  || miss "missing critical directory: data/family"
test -d "data/company" \
  && ok "data/company" \
  || miss "missing critical directory: data/company"
test -f "README.wiki" \
  && ok "README.wiki" \
  || miss "missing critical file: README.wiki"
test -f "LICENSE" \
  && ok "LICENSE" \
  || miss "missing critical file: LICENSE"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 1773 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1743d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/programthink/zhao"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
