waditu/tushare

Item: waditu/tushare
Rating: 5
Author: RepoPilot

TuShare is a utility for crawling historical data of China stocks

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓ 20 active contributors
✓ BSD-3-Clause licensed
✓ CI configured
✓ Tests present
⚠ Stale: last commit 2y ago
⚠ Concentrated ownership: top contributor handles 62% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/waditu/tushare)](https://repopilot.app/r/waditu/tushare)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: waditu/tushare

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/waditu/tushare shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

20 active contributors
BSD-3-Clause licensed
CI configured
Tests present
⚠ Stale — last commit 2y ago
⚠ Concentrated ownership — top contributor handles 62% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live waditu/tushare repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/waditu/tushare.

What it runs against: a local clone of waditu/tushare — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in waditu/tushare | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 815 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>waditu/tushare</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of waditu/tushare. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/waditu/tushare.git
#   cd tushare
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of waditu/tushare and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "waditu/tushare(\.git)?\b" \
  && ok "origin remote is waditu/tushare" \
  || miss "origin remote is not waditu/tushare (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-3-Clause)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \
  && ok "license is BSD-3-Clause" \
  || miss "license drift: was BSD-3-Clause at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "tushare/__init__.py" \
  && ok "tushare/__init__.py" \
  || miss "missing critical file: tushare/__init__.py"
test -f "tushare/stock/trading.py" \
  && ok "tushare/stock/trading.py" \
  || miss "missing critical file: tushare/stock/trading.py"
test -f "tushare/pro/client.py" \
  && ok "tushare/pro/client.py" \
  || miss "missing critical file: tushare/pro/client.py"
test -f "tushare/stock/cons.py" \
  && ok "tushare/stock/cons.py" \
  || miss "missing critical file: tushare/stock/cons.py"
test -f "setup.py" \
  && ok "setup.py" \
  || miss "missing critical file: setup.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 815 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~785d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/waditu/tushare"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
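
As a rough illustration of how an agent loop might consume the `ok:` and `FAIL:` lines printed by the verification script, the sketch below parses captured output into passed and failed checks. The function name `summarize_verify_output` and the sample text are assumptions made for this example; they are not part of RepoPilot.

```python
def summarize_verify_output(text):
    """Split verify-script output into passed and failed check messages."""
    passed, failed = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ok:"):
            passed.append(line[len("ok:"):].strip())
        elif line.startswith("FAIL:"):
            failed.append(line[len("FAIL:"):].strip())
    return passed, failed


# Hypothetical captured output, shaped like the script's messages.
sample = """ok:   origin remote is waditu/tushare
FAIL: missing critical file: tushare/pro/client.py
ok:   setup.py
"""
passed, failed = summarize_verify_output(sample)
print(len(passed), "passed;", len(failed), "failed")
```

In practice the exit code alone is enough to decide whether to regenerate; parsing the lines is only useful when the agent should report which claim went stale.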

⚡TL;DR

TuShare is a Python library for scraping, cleaning, and storing historical financial data for Chinese stocks and futures. Simple function calls fetch OHLCV (Open, High, Low, Close, Volume) data, technical indicators, and fundamental information from Chinese exchanges, and docs/storing.rst covers saving results to MongoDB or MySQL.

Layout: the library code lives in the tushare/ package, split by asset class (tushare/stock/, tushare/futures/, tushare/fund/) alongside the Pro client in tushare/pro/. Documentation lives in docs/ as per-feature .rst files (trading.rst, fundamental.rst, macro.rst, etc.), tests live in test/ with one file per feature (bar_test.py, classifying_test.py), and setup.py handles packaging.

👥Who it's for

Quantitative analysts and financial data engineers in China who need to backtest trading strategies or perform technical analysis on Chinese equities without building custom data pipelines. Also used by students learning financial data analysis.

🌱Maturity & risk

The project is in maintenance mode with active development shifted to a Pro version (tushare.pro). The codebase shows 466k lines of Python, has CI setup via Travis CI (.travis.yml present), and includes test files (test/ directory), but the README explicitly directs users to the new Pro version, indicating this legacy version is stable but not the focus of new feature development.

Ownership is concentrated: the top contributor handles 62% of recent commits, and the last commit was about two years ago. The library also depends on live data sources (stock exchanges) that can change without notice and break existing functions. The migration notice in the README suggests the maintainer may deprioritize bug fixes for the free version.

Active areas of work

Main development has moved to TuShare Pro, which the README points to prominently (tushare.pro). The legacy free version appears to be in maintenance only: there is no clear sign of new feature work, although the README still presents it as usable alongside the migration notice.

🚀Get running

git clone https://github.com/waditu/tushare.git && cd tushare && pip install -r requirements.txt && python setup.py install

Daily commands: This is a library, not a service. Import and call functions: python -c "import tushare as ts; print(ts.get_hist_data('600848'))" after installation.

🗺️Map of the codebase

tushare/__init__.py — Main entry point that exposes the public API and imports all submodules; essential for understanding what functionality is available.
tushare/stock/trading.py — Core module for stock trading data retrieval; handles historical data fetching which is the primary use case shown in README.
tushare/pro/client.py — Pro API client implementation; represents the newer API architecture that users are being migrated toward.
tushare/stock/cons.py — Constants and configuration for stock module; defines URLs, parameters, and shared configuration across stock functions.
setup.py — Package configuration and dependency declaration; critical for understanding installation and version management.
tushare/stock/globals.py — Global state and session management for HTTP requests; affects all network operations in the codebase.

🧩Components & responsibilities

Stock Module (tushare/stock/) (BeautifulSoup, pandas, requests) — Historical OHLCV, fundamentals, macro indicators, technical analysis, news, reference data for equities
- Failure mode: Website layout changes break HTML scraping; returns malformed or missing columns; network timeouts
Futures Module (tushare/futures/) (BeautifulSoup, pandas, requests) — Domestic (DCE, CZCE, SHFE) and international futures data including contracts, positions, and quotes
- Failure mode: Contract list changes; continuous contract logic breaks; data source migrations
Fund Module (tushare/fund/) (BeautifulSoup, pandas,) — Mutual fund NAV, holdings, performance metrics

🛠️How to make changes

Add a new stock data function

Create function in appropriate stock submodule (e.g., trading.py, fundamental.py) that fetches and parses data (tushare/stock/trading.py)
Define API endpoint and parameters in module's cons.py or the function itself (tushare/stock/cons.py)
Parse HTML/JSON response using BeautifulSoup or simplejson, return pandas DataFrame (tushare/stock/trading.py)
Import and re-export function in tushare/stock/__init__.py (tushare/stock/__init__.py)
Re-export in main tushare/__init__.py for top-level access (tushare/__init__.py)
Add unit tests in test/trading_test.py or appropriate test file (test/trading_test.py)
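
The fetch-and-parse steps above can be sketched roughly as follows. This is a minimal illustration, not TuShare's actual code: the function names `parse_daily_bars` and `to_frame`, the `records` payload key, and the column layout are all assumptions chosen for the example. Parsing is kept separate from the network call so it can be unit-tested with a canned payload.

```python
import json

# Hypothetical column layout; a real endpoint defines its own fields.
DAILY_COLUMNS = ["date", "open", "high", "close", "low", "volume"]


def parse_daily_bars(payload):
    """Turn a JSON payload of positional records into a list of row dicts."""
    records = json.loads(payload).get("records", [])
    rows = []
    for record in records:
        row = dict(zip(DAILY_COLUMNS, record))
        # Numeric fields often arrive as strings; normalise them to float.
        for key in DAILY_COLUMNS[1:]:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def to_frame(rows):
    """Build a date-indexed DataFrame (pandas is imported lazily)."""
    import pandas as pd

    return pd.DataFrame(rows, columns=DAILY_COLUMNS).set_index("date")


sample_payload = (
    '{"records": [["2015-01-05", "11.16", "11.39", "11.26", "10.89", "46383.57"]]}'
)
rows = parse_daily_bars(sample_payload)
print(rows[0]["close"])
```

Keeping `parse_daily_bars` free of network and pandas dependencies is a design choice for this sketch: it lets the test file exercise the parsing logic offline.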

Add support for a new asset class (e.g., new market data type)

Create new directory under tushare/ (e.g., tushare/options/) (tushare/)
Create __init__.py to export public functions (tushare/options/__init__.py)
Create cons.py with API endpoints and constants (tushare/options/cons.py)
Create data retrieval module with fetch and parse logic (tushare/options/data.py)
Import asset class in main tushare/__init__.py (tushare/__init__.py)

Migrate a function to Pro API

Implement new function in tushare/pro/data_pro.py using client.py's query method (tushare/pro/data_pro.py)
Function should accept token parameter and call client.query(api_name, params) (tushare/pro/client.py)
Return pandas DataFrame with same schema as legacy version for backward compatibility (tushare/pro/data_pro.py)
Export from tushare/pro/__init__.py (tushare/pro/__init__.py)
Add integration tests calling Pro API with test token (test/trading_test.py)
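
One way to picture the migration steps above is a thin wrapper that keeps a legacy-style call signature while delegating to a client's `query` method. Everything here is an assumption for illustration: `StubClient` stands in for the real client, and the `daily` API name, the `ts_code` / `start_date` / `end_date` parameter names, and the exchange-suffix rule should be checked against the actual Pro documentation before use.

```python
class StubClient:
    """Stand-in for a Pro client; records the call instead of using the network."""

    def __init__(self):
        self.calls = []

    def query(self, api_name, **params):
        self.calls.append((api_name, params))
        return []  # a real client would return tabular data


def legacy_to_pro_params(code, start, end):
    """Map legacy-style arguments to the assumed Pro-style parameter names."""
    suffix = "SH" if code.startswith("6") else "SZ"  # assumption: simplified rule
    return {
        "ts_code": f"{code}.{suffix}",
        "start_date": start.replace("-", ""),
        "end_date": end.replace("-", ""),
    }


def get_daily_via_pro(client, code, start, end):
    """Hypothetical wrapper: legacy call signature, Pro-style query underneath."""
    return client.query("daily", **legacy_to_pro_params(code, start, end))


client = StubClient()
get_daily_via_pro(client, "600848", "2015-01-05", "2015-01-09")
print(client.calls[0])
```

Passing the client in as an argument keeps the wrapper testable without a token; an integration test would swap `StubClient` for a real client.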

🔧Why these technologies

Python 2.x/3.x compatibility — Maximizes user base in emerging markets and legacy systems; TuShare targets financial analysts who may use various Python versions
pandas DataFrame as return type — De facto standard for financial data manipulation in Python; integrates seamlessly with quantitative analysis workflows
BeautifulSoup for HTML parsing — Flexible scraping when official APIs are unavailable; allows extraction from rendered pages without Selenium
requests library for HTTP — Lightweight, widely-adopted, minimal dependencies; sufficient for simple GET requests without async needs
Modular organization by asset class — Reduces coupling and allows independent updates; users import only needed modules (tushare.stock, tushare.futures, etc.)

⚖️Trade-offs already made

Web scraping as primary data source instead of dedicated APIs
- Why: No centralized Chinese stock data API available at project inception; scraping enables broad data coverage
- Consequence: Fragile to website layout changes; requires frequent maintenance; slower than native APIs; potential legal/ToS issues
Legacy API (v1) and Pro API (v2) running in parallel
- Why: Pro API uses token-based authentication and better infrastructure; legacy maintains backward compatibility
- Consequence: Code duplication; maintenance burden; confusing for new users; unclear migration path
Synchronous blocking I/O with requests
- Why: Simple, synchronous code easier to understand for financial analysts unfamiliar with async
- Consequence: Slow for batch operations; cannot fetch multiple securities concurrently; poor scalability for high-frequency workflows
No built-in caching or rate limiting
- Why: Simplicity; delegates responsibility to user; avoids stale data issues
- Consequence: Users may inadvertently DoS data source; repeated calls refetch unchanged data; poor performance for large backtests
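
Since caching and rate limiting are left to the user, a caller-side wrapper is one possible mitigation. The sketch below is an assumption-laden illustration, not part of TuShare: `ThrottledCache` and `fake_fetch` are hypothetical names, and the cache never expires entries, which reintroduces the stale-data risk the trade-off above was avoiding.

```python
import time


class ThrottledCache:
    """Wrap a fetch function with an in-memory cache and a minimum call interval."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, *args):
        if args in self._cache:
            return self._cache[args]  # cache hit: no network call, no wait
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)  # space out calls to the data source
        self._last_call = self._clock()
        result = self._fetch(*args)
        self._cache[args] = result
        return result


calls = []


def fake_fetch(code):
    """Stand-in for a network call; records each invocation."""
    calls.append(code)
    return {"code": code}


cached = ThrottledCache(fake_fetch, min_interval=0.0)
cached.get("600848")
cached.get("600848")  # served from the cache
cached.get("000001")
print(calls)
```

In real use, `fake_fetch` would be replaced by a library call and `min_interval` set to whatever the data source tolerates.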

🚫Non-goals (don't propose these)

Real-time streaming data or tick-by-tick quotes
Portfolio management or trade execution
Authentication or user account management
Cross-asset class correlation analysis
Data storage (MongoDB/MySQL integration is referenced but not implemented in core)
Intraday or sub-minute granularity for most datasets

🪤Traps & gotchas

The library scrapes live Chinese stock exchange websites, so function behavior can break silently if those sites change their HTML structure without warning (lxml/BeautifulSoup is fragile to layout changes). The README mentions both free version and paid 'Pro' version—ensure you're looking at correct API docs for which version you're using. Python 2 support is mentioned but likely abandoned in practice. No explicit rate limiting visible in the structure, so aggressive scraping may trigger IP blocks from target websites.
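
Because a layout change can make a function return the wrong shape without raising, a cheap guard is to check the returned columns before using the data. This is a minimal sketch under assumptions: the expected column set and the helper names `missing_columns` and `require_columns` are invented for this example.

```python
# Assumed column set for this sketch; adjust to the function being called.
EXPECTED_COLUMNS = {"open", "high", "close", "low", "volume"}


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from a result, sorted."""
    return sorted(set(expected) - set(columns))


def require_columns(columns, expected=EXPECTED_COLUMNS):
    """Raise instead of letting a changed page layout pass silently."""
    missing = missing_columns(columns, expected)
    if missing:
        raise ValueError(f"scraped result is missing columns: {missing}")


# With pandas, `columns` would typically be `df.columns`.
print(missing_columns(["open", "high", "close"]))
```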

🏗️Architecture

💡Concepts to learn

Web Scraping (lxml/BeautifulSoup4; see https://docs.python-requests.org/en/master/ and https://www.crummy.com/software/BeautifulSoup/bs4/doc/): TuShare fetches stock data by parsing HTML from Chinese exchange websites, which makes it fragile to layout changes; understanding CSS selectors and DOM traversal is critical for debugging broken data sources.
Pandas DataFrame Normalization — All data returned from TuShare functions are pandas DataFrames with specific column orders and types (date as index, OHLCV as float); users must understand index manipulation and column selection to avoid errors.
Technical Indicators (Moving Averages, Turnover) — TuShare returns pre-calculated MA5, MA10, MA20 (moving averages) and turnover ratio in the same DataFrame; users need domain knowledge to validate correctness and understand when to use which indicator.
Time Series Indexing — Stock data is date-indexed (shown in README examples with 'date' as the DataFrame index), requiring time-based slicing like ts.get_hist_data('600848', start='2015-01-05', end='2015-01-09'); Pandas time indexing is non-obvious.
Chinese Stock Code Convention (6-digit ticker) — TuShare uses 6-digit Chinese stock codes (600848 in examples) which differ from Western ticker conventions; users must understand Shanghai Stock Exchange (600xxx) vs Shenzhen Stock Exchange (000xxx, 300xxx) prefixes.
Rate Limiting & IP Blocking Risk — The library scrapes live sources without explicit rate limiting visible in the code; users running batch operations may trigger IP blocks from exchanges, requiring exponential backoff or rotating proxies—not handled by the library.
Python 2 vs 3 Compatibility — TuShare targets both Python 2.x and 3.x (mentioned in README), meaning code uses compatible patterns (unicode handling, print functions); maintainers should be aware this constraint is likely abandoned in practice.
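
The exponential backoff mentioned under Rate Limiting & IP Blocking Risk can be sketched as a small caller-side helper. The names `retry_with_backoff` and `flaky_fetch` are hypothetical, and the delays are illustrative rather than tuned for any particular data source.

```python
import time


def retry_with_backoff(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Demonstration with a stand-in that fails twice, then succeeds.
state = {"calls": 0}
delays = []


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated block")
    return "data"


# Record the requested delays instead of actually sleeping.
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```

A production version would likely catch only network-related exceptions and add jitter; both are left out here to keep the sketch short.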

akshare/akshare — Modern alternative for Chinese financial data fetching with more active development and support for newer Chinese data sources like Sina and Tencent.
twopiraman/Stock-Data-Visualization-by-Pandas-Matplotlib — Example repository showing how to use financial data libraries like TuShare to build analysis pipelines and visualizations.
jealous/stockstats — Companion library for calculating technical indicators (MA, RSI, MACD) on top of OHLCV data fetched by TuShare.
pandas-dev/pandas — Core dependency for data manipulation in TuShare; understanding DataFrame operations is essential for using this library effectively.
ccxt/ccxt — Similar architecture for multi-exchange data aggregation but focuses on cryptocurrency exchanges instead of Chinese stocks.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for tushare/bond/bonds.py and tushare/coins/market.py modules

The test directory has tests for most core modules (trading, macro, news, etc.), but there are no corresponding test files for the bond and coins submodules. These financial data modules need dedicated unit tests to ensure data retrieval and parsing work correctly. This improves code reliability and makes it easier for contributors to modify these modules safely.

[ ] Create test/bond_test.py with unit tests for tushare/bond/bonds.py covering all public functions
[ ] Create test/coins_test.py with unit tests for tushare/coins/market.py covering cryptocurrency market data retrieval
[ ] Add mock HTTP responses for external API calls to avoid network dependencies during testing
[ ] Update test_unittest.py to include the new test modules in the test suite
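
The mock-response item in the checklist above could look roughly like this. It is a sketch under assumptions: `fetch_quote` is a hypothetical fetcher written for the example (not a function from tushare/bond or tushare/coins), and the opener is injected as a parameter so the test can substitute a canned response without touching the network.

```python
import io
import json
import unittest
from unittest import mock
from urllib.request import urlopen


def fetch_quote(url, opener=urlopen):
    """Hypothetical fetcher: GET a URL and decode a JSON body."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


class FetchQuoteTest(unittest.TestCase):
    def test_parses_canned_response(self):
        # A BytesIO object stands in for the HTTP response body.
        canned = io.BytesIO(b'{"code": "600848", "close": 11.26}')
        fake_open = mock.Mock(return_value=canned)

        data = fetch_quote("http://example.invalid/quote", opener=fake_open)

        fake_open.assert_called_once_with("http://example.invalid/quote")
        self.assertEqual(data["code"], "600848")
        self.assertEqual(data["close"], 11.26)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchQuoteTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the modules under test call their HTTP library directly, `unittest.mock.patch` on that call would serve the same purpose without changing function signatures.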

Migrate from .travis.yml to GitHub Actions workflow for Python 2.x/3.x testing

The repo uses Travis CI (.travis.yml) for legacy CI, but GitHub Actions is now the standard. Additionally, Python 2 reached end-of-life in 2020. A modern GitHub Actions workflow should test Python 3.7+ across multiple versions, run the test suite (test/), and provide faster feedback. This removes outdated CI infrastructure and ensures the project works with current Python versions.

[ ] Create .github/workflows/python-tests.yml with matrix testing for Python 3.7, 3.8, 3.9, 3.10, 3.11
[ ] Include steps to install dependencies from requirements.txt and run pytest on test/ directory
[ ] Add linting step (pylint or flake8) to catch code quality issues
[ ] Remove or deprecate .travis.yml with a note in README about the migration

Add integration documentation and examples for tushare/data module and new Pro API

The README mentions 'TuShare Pro版已发布' (TuShare Pro version released) and directs users to https://tushare.pro, but the repo still has minimal documentation for the actual tushare/data module structure and no examples for Pro API usage. The docs/ folder has .rst files for specific features but lacks comprehensive API reference documentation. Adding clear examples and Pro API documentation will help users migrate from the old API.

[ ] Create docs/pro_api.rst documenting the new Pro API endpoints with code examples
[ ] Create docs/data_module.rst explaining the tushare/data/ submodule structure and available functions
[ ] Add example scripts in a new docs/examples/ directory showing common use cases (e.g., fetching stock data, computing technical indicators)
[ ] Update docs/index.rst to include links to the new Pro API and data module documentation

🌿Good first issues

Extend test/classifying_test.py to cover every function documented in docs/classifying.rst; the test file exists, but its coverage of those functions may be incomplete.
Document the MongoDB and MySQL schema structure referenced in docs/storing.rst with actual CREATE TABLE/collection examples and sample queries—currently links to docs but schema is not detailed.
Add Python 3.6+ type hints to core functions in tushare/__init__.py (e.g., def get_hist_data(code: str, start: Optional[str] = None) -> pd.DataFrame:) to improve IDE autocomplete and reduce user errors.

⭐Top contributors

@jimmysoa — 62 commits
@xiaoluffy — 10 commits
@yutiansut — 4 commits
@algony-tony — 3 commits
@TsingJyujing — 3 commits

📝Recent commits