waditu/tushare
TuShare is a utility for crawling historical data of China stocks
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained; safe to depend on.
Has a license, tests, and CI; a clean foundation to fork and modify.
Documented and popular; a useful reference codebase to read through.
No critical CVEs, sane security posture; runnable as-is.
- ✓20 active contributors
- ✓BSD-3-Clause licensed
- ✓CI configured
- ✓Tests present
- ⚠Stale — last commit 2y ago
- ⚠Concentrated ownership — top contributor handles 62% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: waditu/tushare
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/waditu/tushare shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- 20 active contributors
- BSD-3-Clause licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Concentrated ownership — top contributor handles 62% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live waditu/tushare
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/waditu/tushare.
What it runs against: a local clone of waditu/tushare — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in waditu/tushare | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 815 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of waditu/tushare. If you don't
# have one yet, run these first:
#
# git clone https://github.com/waditu/tushare.git
# cd tushare
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of waditu/tushare and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "waditu/tushare(\.git)?\b" \
&& ok "origin remote is waditu/tushare" \
|| miss "origin remote is not waditu/tushare (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-3-Clause)" LICENSE 2>/dev/null \
|| grep -qiE "\"license\"\s*:\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \
&& ok "license is BSD-3-Clause" \
|| miss "license drift — was BSD-3-Clause at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
&& ok "default branch master exists" \
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f "tushare/__init__.py" \
&& ok "tushare/__init__.py" \
|| miss "missing critical file: tushare/__init__.py"
test -f "tushare/stock/trading.py" \
&& ok "tushare/stock/trading.py" \
|| miss "missing critical file: tushare/stock/trading.py"
test -f "tushare/pro/client.py" \
&& ok "tushare/pro/client.py" \
|| miss "missing critical file: tushare/pro/client.py"
test -f "tushare/stock/cons.py" \
&& ok "tushare/stock/cons.py" \
|| miss "missing critical file: tushare/stock/cons.py"
test -f "setup.py" \
&& ok "setup.py" \
|| miss "missing critical file: setup.py"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 815 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~785d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/waditu/tushare"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
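For an agent loop written in Python rather than shell, the ok:/FAIL: line convention described above can be consumed directly. The following is a minimal sketch, assuming the script's output has already been captured as text; the function names `summarize_verify_output` and `artifact_is_trustworthy` are hypothetical helpers, not part of RepoPilot or TuShare.

```python
from typing import Dict, List


def summarize_verify_output(output: str) -> Dict[str, List[str]]:
    """Group the verify script's output lines by their ok:/FAIL: prefix."""
    summary: Dict[str, List[str]] = {"ok": [], "fail": []}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("ok: "):
            summary["ok"].append(line[len("ok: "):])
        elif line.startswith("FAIL: "):
            summary["fail"].append(line[len("FAIL: "):])
    return summary


def artifact_is_trustworthy(output: str) -> bool:
    """True only when at least one check ran and no check failed."""
    summary = summarize_verify_output(output)
    return bool(summary["ok"]) and not summary["fail"]
```

Treating empty output as untrustworthy is a deliberate choice in this sketch: a script that printed nothing most likely did not run at all.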
⚡TL;DR
TuShare is a Python library for scraping, cleaning, and storing historical financial data for Chinese stocks and futures. It provides simple function calls to fetch OHLCV (Open, High, Low, Close, Volume) data, technical indicators, and fundamental information from Chinese exchanges, with built-in support for storing data in MongoDB or MySQL.

The structure is monolithic: core modules live in the tushare package at the repository root, documentation lives in docs/ with one .rst file per feature (trading.rst, fundamental.rst, macro.rst, etc.), tests live in test/ with one file per feature (bar_test.py, classifying_test.py), and setup.py handles packaging.
👥Who it's for
Quantitative analysts and financial data engineers in China who need to backtest trading strategies or perform technical analysis on Chinese equities without building custom data pipelines. Also used by students learning financial data analysis.
🌱Maturity & risk
The project is in maintenance mode with active development shifted to a Pro version (tushare.pro). The codebase shows 466k lines of Python, has CI setup via Travis CI (.travis.yml present), and includes test files (test/ directory), but the README explicitly directs users to the new Pro version, indicating this legacy version is stable but not the focus of new feature development.
Ownership is concentrated: the top contributor handles 62% of recent commits. The library depends on live data sources (stock exchange websites) that can change without notice and break its functions. The last commit was roughly 785 days before this artifact was generated, and the notice in the README pointing to the Pro version suggests that bug fixes for the free version may be deprioritized.
Active areas of work
Main development has moved to TuShare Pro, which the README points to prominently (tushare.pro). The legacy free version appears to be in maintenance mode: no current feature work is evident, though the README still recommends the library despite the migration notice.
🚀Get running
git clone https://github.com/waditu/tushare.git && cd tushare && pip install -r requirements.txt && python setup.py install
Daily commands:
This is a library, not a service. Import and call functions: python -c "import tushare as ts; print(ts.get_hist_data('600848'))" after installation.
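A slightly more defensive form of the one-liner above is sketched here. It assumes that an empty or None result signals a failed fetch, which is an assumption rather than documented behaviour; `load_history` is a hypothetical helper, and the only TuShare call it relies on is `ts.get_hist_data`, the one shown in the README.

```python
from typing import Any, Callable, Optional


def load_history(code: str, fetch: Optional[Callable[[str], Any]] = None) -> Any:
    """Fetch history for one stock code, failing loudly on an empty result.

    `fetch` is injectable so the helper can be exercised offline. When it is
    omitted, the helper falls back to tushare's get_hist_data.
    """
    if fetch is None:
        import tushare as ts  # imported lazily; needs tushare to be installed
        fetch = ts.get_hist_data
    data = fetch(code)
    # Assumption: None or a zero-length result means the fetch did not work.
    if data is None or len(data) == 0:
        raise ValueError(f"no data returned for stock code {code!r}")
    return data
```

Calling `load_history('600848')` uses the live library; passing a stub as `fetch` keeps tests independent of the network.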
🗺️Map of the codebase
- tushare/__init__.py: Main entry point that exposes the public API and imports all submodules; essential for understanding what functionality is available.
- tushare/stock/trading.py: Core module for stock trading data retrieval; handles historical data fetching, which is the primary use case shown in the README.
- tushare/pro/client.py: Pro API client implementation; represents the newer API architecture that users are being migrated toward.
- tushare/stock/cons.py: Constants and configuration for the stock module; defines URLs, parameters, and shared configuration across stock functions.
- setup.py: Package configuration and dependency declaration; critical for understanding installation and version management.
- tushare/stock/globals.py: Global state and session management for HTTP requests; affects all network operations in the codebase.
🧩Components & responsibilities
- Stock Module (tushare/stock/) (BeautifulSoup, pandas, requests) — Historical OHLCV, fundamentals, macro indicators, technical analysis, news, reference data for equities
- Failure mode: Website layout changes break HTML scraping; returns malformed or missing columns; network timeouts
- Futures Module (tushare/futures/) (BeautifulSoup, pandas, requests) — Domestic (DCE, CZCE, SHFE) and international futures data including contracts, positions, and quotes
- Failure mode: Contract list changes; continuous contract logic breaks; data source migrations
- Fund Module (tushare/fund/) (BeautifulSoup, pandas,) — Mutual fund NAV, holdings, performance metrics
🛠️How to make changes
Add a new stock data function
- Create a function in the appropriate stock submodule (e.g., trading.py, fundamental.py) that fetches and parses data (tushare/stock/trading.py)
- Define the API endpoint and parameters in the module's cons.py or in the function itself (tushare/stock/cons.py)
- Parse the HTML/JSON response using BeautifulSoup or simplejson and return a pandas DataFrame (tushare/stock/trading.py)
- Import and re-export the function in tushare/stock/__init__.py (tushare/stock/__init__.py)
- Re-export it in the main tushare/__init__.py for top-level access (tushare/__init__.py)
- Add unit tests in test/trading_test.py or the appropriate test file (test/trading_test.py)
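The fetch-then-parse split in the steps above can be sketched as follows. This is an illustration under stated assumptions, not code from the repository: the URL template, the field names, and both function names are hypothetical, and the rows are plain dictionaries so the sketch runs on the standard library alone (the real library would return a pandas DataFrame).

```python
import json
from typing import Callable, Dict, List

# Hypothetical endpoint template; real ones would live in tushare/stock/cons.py.
EXAMPLE_URL = "https://example.invalid/quotes?code={code}"
EXPECTED_FIELDS = ("date", "open", "close")


def parse_quotes(payload: str) -> List[Dict[str, object]]:
    """Parse a JSON array of quote records into typed rows."""
    rows = []
    for record in json.loads(payload):
        missing = [f for f in EXPECTED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"response is missing fields: {missing}")
        rows.append({
            "date": str(record["date"]),
            "open": float(record["open"]),
            "close": float(record["close"]),
        })
    return rows


def get_example_quotes(code: str, http_get: Callable[[str], str]) -> List[Dict[str, object]]:
    """Fetch and parse quotes; `http_get` is any callable mapping URL to text."""
    return parse_quotes(http_get(EXAMPLE_URL.format(code=code)))
```

Keeping `parse_quotes` free of network access means the unit tests in the last step can feed it saved payloads directly.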
Add support for a new asset class (e.g., new market data type)
- Create a new directory under tushare/ (e.g., tushare/options/) (tushare/)
- Create __init__.py to export public functions (tushare/options/__init__.py)
- Create cons.py with API endpoints and constants (tushare/options/cons.py)
- Create a data retrieval module with fetch and parse logic (tushare/options/data.py)
- Import the asset class in the main tushare/__init__.py (tushare/__init__.py)
Migrate a function to Pro API
- Implement the new function in tushare/pro/data_pro.py using client.py's query method (tushare/pro/data_pro.py)
- The function should accept a token parameter and call client.query(api_name, params) (tushare/pro/client.py)
- Return a pandas DataFrame with the same schema as the legacy version for backward compatibility (tushare/pro/data_pro.py)
- Export it from tushare/pro/__init__.py (tushare/pro/__init__.py)
- Add integration tests calling the Pro API with a test token (test/trading_test.py)
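The shape of a migrated function might look like the sketch below. Everything here is hypothetical: `SketchProClient` stands in for whatever tushare/pro/client.py actually defines, the request body layout and the `example_daily` API name are invented for illustration, and the transport is injected so no real endpoint is contacted.

```python
from typing import Any, Callable, Dict


class SketchProClient:
    """Hypothetical stand-in for the client in tushare/pro/client.py."""

    def __init__(self, token: str,
                 transport: Callable[[Dict[str, Any]], Dict[str, Any]]):
        if not token:
            raise ValueError("a non-empty token is required")
        self._token = token
        self._transport = transport  # sends the body, returns decoded JSON

    def query(self, api_name: str, **params: Any) -> Dict[str, Any]:
        """Bundle the API name, token, and parameters into one request."""
        body = {"api_name": api_name, "token": self._token, "params": params}
        return self._transport(body)


def example_daily_pro(client: SketchProClient, code: str,
                      start: str, end: str) -> Dict[str, Any]:
    """Hypothetical migrated function keeping legacy-style arguments."""
    return client.query("example_daily", code=code, start=start, end=end)
```

Injecting the transport also makes the integration-test step easier, since a recorded response can replace the live service.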
🔧Why these technologies
- Python 2.x/3.x compatibility — Maximizes user base in emerging markets and legacy systems; TuShare targets financial analysts who may use various Python versions
- pandas DataFrame as return type — De facto standard for financial data manipulation in Python; integrates seamlessly with quantitative analysis workflows
- BeautifulSoup for HTML parsing — Flexible scraping when official APIs are unavailable; allows extraction from rendered pages without Selenium
- requests library for HTTP — Lightweight, widely-adopted, minimal dependencies; sufficient for simple GET requests without async needs
- Modular organization by asset class — Reduces coupling and allows independent updates; users import only needed modules (tushare.stock, tushare.futures, etc.)
⚖️Trade-offs already made
- Web scraping as primary data source instead of dedicated APIs
  - Why: No centralized Chinese stock data API available at project inception; scraping enables broad data coverage
  - Consequence: Fragile to website layout changes; requires frequent maintenance; slower than native APIs; potential legal/ToS issues
- Legacy API (v1) and Pro API (v2) running in parallel
  - Why: Pro API uses token-based authentication and better infrastructure; legacy maintains backward compatibility
  - Consequence: Code duplication; maintenance burden; confusing for new users; unclear migration path
- Synchronous blocking I/O with requests
  - Why: Simple, synchronous code easier to understand for financial analysts unfamiliar with async
  - Consequence: Slow for batch operations; cannot fetch multiple securities concurrently; poor scalability for high-frequency workflows
- No built-in caching or rate limiting
  - Why: Simplicity; delegates responsibility to user; avoids stale data issues
  - Consequence: Users may inadvertently DoS data source; repeated calls refetch unchanged data; poor performance for large backtests
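Because caching and rate limiting are left to the caller, a thin wrapper on the user's side is one way to cover both. The class below is a sketch under that assumption; `ThrottledCache` is a hypothetical name, and the default TTL and interval are arbitrary illustration values, not limits published by any data source.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class ThrottledCache:
    """Wrap a fetch function with a TTL cache and a minimum call interval."""

    def __init__(self, fetch: Callable[[Hashable], Any],
                 ttl_seconds: float = 300.0, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[Hashable, Tuple[float, Any]] = {}
        self._last_call: Optional[float] = None

    def __call__(self, key: Hashable) -> Any:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: skip the network entirely
        if self._last_call is not None:
            wait = self._min_interval - (now - self._last_call)
            if wait > 0:
                self._sleep(wait)  # space out consecutive upstream calls
        value = self._fetch(key)
        self._last_call = self._clock()
        self._cache[key] = (self._last_call, value)
        return value
```

Wrapping a fetcher as `cached = ThrottledCache(ts.get_hist_data)` would then serve repeated requests for the same code from memory; the clock and sleep parameters exist only so the behaviour can be tested without real delays.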
🚫Non-goals (don't propose these)
- Real-time streaming data or tick-by-tick quotes
- Portfolio management or trade execution
- Authentication or user account management
- Cross-asset class correlation analysis
- Data storage (MongoDB/MySQL integration is referenced but not implemented in core)
- Intraday or sub-minute granularity for most datasets
🪤Traps & gotchas
The library scrapes live Chinese stock exchange websites, so function behavior can break silently if those sites change their HTML structure without warning (lxml/BeautifulSoup is fragile to layout changes). The README mentions both free version and paid 'Pro' version—ensure you're looking at correct API docs for which version you're using. Python 2 support is mentioned but likely abandoned in practice. No explicit rate limiting visible in the structure, so aggressive scraping may trigger IP blocks from target websites.
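One way to turn silent breakage into a loud failure is to check every result for the columns the caller depends on. The sketch below assumes rows arrive as dictionaries and that the lowercase OHLCV names are the ones required; both are assumptions for illustration, and `check_ohlcv_rows` is a hypothetical helper.

```python
from typing import Mapping, Sequence

REQUIRED_COLUMNS = ("open", "high", "low", "close", "volume")


def check_ohlcv_rows(rows: Sequence[Mapping[str, object]],
                     required: Sequence[str] = REQUIRED_COLUMNS) -> int:
    """Raise ValueError if the result is empty or a row lacks a column.

    Returns the number of rows checked so callers can log it.
    """
    if not rows:
        raise ValueError("empty result: the upstream page may have changed")
    for index, row in enumerate(rows):
        missing = [name for name in required if name not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")
    return len(rows)
```

With a pandas DataFrame the equivalent check is a comparison of the required names against `df.columns`.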
💡Concepts to learn
- Web Scraping (lxml/BeautifulSoup4); see https://docs.python-requests.org/en/master/ and https://www.crummy.com/software/BeautifulSoup/bs4/doc/. TuShare fetches stock data by parsing HTML from Chinese exchange websites, which makes it fragile to layout changes; understanding CSS selectors and DOM traversal is critical for debugging broken data sources.
- Pandas DataFrame Normalization — All data returned from TuShare functions are pandas DataFrames with specific column orders and types (date as index, OHLCV as float); users must understand index manipulation and column selection to avoid errors.
- Technical Indicators (Moving Averages, Turnover) — TuShare returns pre-calculated MA5, MA10, MA20 (moving averages) and turnover ratio in the same DataFrame; users need domain knowledge to validate correctness and understand when to use which indicator.
- Time Series Indexing: Stock data is date-indexed (shown in README examples with 'date' as the DataFrame index), requiring time-based slicing like ts.get_hist_data('600848', start='2015-01-05', end='2015-01-09'); Pandas time indexing is non-obvious.
- Chinese Stock Code Convention (6-digit ticker): TuShare uses 6-digit Chinese stock codes (600848 in examples) which differ from Western ticker conventions; users must understand Shanghai Stock Exchange (600xxx) vs Shenzhen Stock Exchange (000xxx, 300xxx) prefixes.
- Rate Limiting & IP Blocking Risk — The library scrapes live sources without explicit rate limiting visible in the code; users running batch operations may trigger IP blocks from exchanges, requiring exponential backoff or rotating proxies—not handled by the library.
- Python 2 vs 3 Compatibility — TuShare targets both Python 2.x and 3.x (mentioned in README), meaning code uses compatible patterns (unicode handling, print functions); maintainers should be aware this constraint is likely abandoned in practice.
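The stock code convention in the list above lends itself to a small lookup. This sketch covers only the three prefixes named there (600 for Shanghai, 000 and 300 for Shenzhen) and reports anything else as unknown rather than guessing; `exchange_for` is a hypothetical helper, not a TuShare function.

```python
import re

# Only the prefixes named in this document; other prefixes are not mapped.
_PREFIX_TO_EXCHANGE = {
    "600": "Shanghai",
    "000": "Shenzhen",
    "300": "Shenzhen",
}


def exchange_for(code: str) -> str:
    """Map a 6-digit stock code to its exchange by its 3-digit prefix."""
    if not re.fullmatch(r"[0-9]{6}", code):
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    return _PREFIX_TO_EXCHANGE.get(code[:3], "unknown")
```

Validating the format before any lookup also catches a common mistake: passing the code as an integer, which drops leading zeros.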
🔗Related repos
- akshare/akshare: Modern alternative for Chinese financial data fetching with more active development and support for newer Chinese data sources like Sina and Tencent.
- twopiraman/Stock-Data-Visualization-by-Pandas-Matplotlib: Example repository showing how to use financial data libraries like TuShare to build analysis pipelines and visualizations.
- jealous/stockstats: Companion library for calculating technical indicators (MA, RSI, MACD) on top of OHLCV data fetched by TuShare.
- pandas-dev/pandas: Core dependency for data manipulation in TuShare; understanding DataFrame operations is essential for using this library effectively.
- ccxt/ccxt: Similar architecture for multi-exchange data aggregation but focuses on cryptocurrency exchanges instead of Chinese stocks.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for tushare/bond/bonds.py and tushare/coins/market.py modules
The test directory has tests for most core modules (trading, macro, news, etc.), but there are no corresponding test files for the bond and coins submodules. These financial data modules need dedicated unit tests to ensure data retrieval and parsing work correctly. This improves code reliability and makes it easier for contributors to modify these modules safely.
- [ ] Create test/bond_test.py with unit tests for tushare/bond/bonds.py covering all public functions
- [ ] Create test/coins_test.py with unit tests for tushare/coins/market.py covering cryptocurrency market data retrieval
- [ ] Add mock HTTP responses for external API calls to avoid network dependencies during testing
- [ ] Update test_unittest.py to include the new test modules in the test suite
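The mock-HTTP item in the checklist above could follow the pattern sketched here. The function under test, its URL, and the sample record are all hypothetical, since the real bond and coins functions are not shown in this document; the mocking calls themselves (`unittest.mock.patch.object`, `assert_called_once_with`) are standard library.

```python
import io
import json
import unittest
from unittest import mock
from urllib import request


def fetch_bond_rows(url: str):
    """Hypothetical function under test: download and decode a JSON array."""
    with request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


class FetchBondRowsTest(unittest.TestCase):
    def test_parses_mocked_response(self):
        # BytesIO works as a context manager, so it can stand in for the
        # response object that urlopen would normally return.
        fake_body = io.BytesIO(b'[{"code": "010107", "price": 100.5}]')
        with mock.patch.object(request, "urlopen",
                               return_value=fake_body) as fake:
            rows = fetch_bond_rows("https://example.invalid/bonds")
        fake.assert_called_once_with("https://example.invalid/bonds")
        self.assertEqual(rows, [{"code": "010107", "price": 100.5}])
```

Saved in a file under test/, this runs with `python -m unittest` and never touches the network.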
Migrate from .travis.yml to GitHub Actions workflow for Python 2.x/3.x testing
The repo uses Travis CI (.travis.yml) for legacy CI, but GitHub Actions is now the standard. Additionally, Python 2 reached end-of-life in 2020. A modern GitHub Actions workflow should test Python 3.7+ across multiple versions, run the test suite (test/), and provide faster feedback. This removes outdated CI infrastructure and ensures the project works with current Python versions.
- [ ] Create .github/workflows/python-tests.yml with matrix testing for Python 3.7, 3.8, 3.9, 3.10, 3.11
- [ ] Include steps to install dependencies from requirements.txt and run pytest on test/ directory
- [ ] Add linting step (pylint or flake8) to catch code quality issues
- [ ] Remove or deprecate .travis.yml with a note in README about the migration
Add integration documentation and examples for tushare/data module and new Pro API
The README mentions 'TuShare Pro版已发布' (TuShare Pro version released) and directs users to https://tushare.pro, but the repo still has minimal documentation for the actual tushare/data module structure and no examples for Pro API usage. The docs/ folder has .rst files for specific features but lacks comprehensive API reference documentation. Adding clear examples and Pro API documentation will help users migrate from the old API.
- [ ] Create docs/pro_api.rst documenting the new Pro API endpoints with code examples
- [ ] Create docs/data_module.rst explaining the tushare/data/ submodule structure and available functions
- [ ] Add example scripts in a new docs/examples/ directory showing common use cases (e.g., fetching stock data, computing technical indicators)
- [ ] Update docs/index.rst to include links to the new Pro API and data module documentation
🌿Good first issues
- Add missing unit tests to test/classifying_test.py for the functions documented in docs/classifying.rst; the test file exists but may not cover every documented function.
- Document the MongoDB and MySQL schema structure referenced in docs/storing.rst with actual CREATE TABLE/collection examples and sample queries—currently links to docs but schema is not detailed.
- Add Python 3.6+ type hints to core functions in tushare/__init__.py (e.g., def get_hist_data(code: str, start: Optional[str] = None) -> pd.DataFrame:) to improve IDE autocomplete and reduce user errors.
⭐Top contributors
- @jimmysoa — 62 commits
- @xiaoluffy — 10 commits
- @yutiansut — 4 commits
- @algony-tony — 3 commits
- @TsingJyujing — 3 commits
📝Recent commits
- 0938569: Merge pull request #1305 from Astraeux/feature_us_stock (jimmysoa)
- 425c8d5: Add feature: get_realtime_quotes support US stock (Astraeux)
- c80066d: Merge pull request #1 from waditu/master (Astraeux)
- 3548065: Merge pull request #1299 from zweite/fix/readme_changelog (jimmysoa)
- e4f1236: Update README.md (zweite)
- d897221: Merge pull request #908 from wangzili/wangzili (jimmysoa)
- 5b70e7f: Add .idea in .gitignore (wangzili)
- 8bb5f2d: Add several indicators, with results checked against Tonghuashun (同花顺) (wangzili)
- 88ba192: update for v1.2.18 (jimmysoa)
- ecae7f7: 1.2.7 commit (jimmysoa)
🔒Security observations
- High · Overly Permissive Dependency Versions (requirements.txt). The requirements.txt specifies dependencies with loose version constraints (>=), which allows installation of versions with known vulnerabilities. For example, lxml>=3.8.0 and beautifulsoup4>=4.5.1 both admit releases that are many years old and may contain security flaws. No upper bounds or pinned versions are specified, increasing attack surface. Fix: Pin dependencies to specific tested versions (e.g., beautifulsoup4==4.11.1, lxml==4.9.2, requests==2.28.1). Implement regular dependency scanning and updates. Use tools like pip-audit or safety to identify vulnerable versions.
- High · Use of Insecure Parsing Library (BeautifulSoup4 for Web Scraping) (tushare/internet/caixinnews.py, tushare/internet/boxoffice.py, and other scraping modules). The codebase uses BeautifulSoup4 and lxml for web scraping financial data from external sources. Without proper input validation and sanitization, this creates risks for XXE (XML External Entity) attacks, HTML injection, and data poisoning if scraping untrusted sources. Fix: Validate and sanitize all parsed data. Use defusedxml instead of lxml for XML parsing to prevent XXE attacks. Implement strict input validation for all web-scraped content before processing or storage.
- Medium · No HTTPS Enforcement for API Calls (tushare/pro/client.py, tushare/pro/data_pro.py). The codebase uses the 'requests' library for HTTP/API calls to external services (tushare.pro). Without explicitly enforcing HTTPS, requests could be intercepted via MITM attacks, especially when fetching sensitive financial data. Fix: Enforce HTTPS for all remote API calls. Verify SSL certificates. Use requests.Session with verify=True (default). Implement certificate pinning for critical API endpoints. Add URL validation to reject HTTP endpoints.
- Medium · Potential SQL Injection in Data Storage Module (tushare/data/ module and storing functionality). The file structure references 'storing' functionality (docs/storing.rst, test/storing_test.py) with MySQL and MongoDB drivers. If dynamic SQL queries are constructed using user input without parameterized queries, SQL injection is possible. Fix: Use parameterized queries / prepared statements for all database operations. Use ORM frameworks (SQLAlchemy) instead of raw SQL. Implement input validation and whitelist allowed database operations.
- Medium · No Input Validation on Stock Symbols and Parameters (tushare/__init__.py and data retrieval functions). The quick start example shows calling ts.get_hist_data('600848') with user-provided stock symbols. There is no visible validation of stock symbols, date ranges, or other parameters, which could lead to injection attacks or unexpected behavior. Fix: Implement strict input validation for all user-provided parameters (stock symbols, date ranges, limits). Use whitelisting for stock symbols. Validate data types and ranges before using them in API calls or queries.
- Low · Python 2.x Support Indicates Outdated Codebase (README.md, setup.py). The README indicates support for Python 2.x, which reached end-of-life in January 2020. Python 2 no longer receives security updates, and many dependencies may have removed Python 2 support. Fix: Drop Python 2.x support. Upgrade to Python 3.7+ and update all dependencies to latest versions. Perform thorough testing on modern Python versions.
- Low · Missing Security Headers and API Rate Limiting Information (tushare/pro/client.py). No implementation of rate limiting, request throttling, or API security headers is visible in the client code. This could allow abuse or DDoS attacks against the data provider's infrastructure. Fix: Implement rate limiting and request throttling with exponential backoff. Add appropriate HTTP headers (User-Agent, X-Requested-With). Follow the API provider's rate limits and authentication requirements.
- Low · Potential Data Exposure through Logging. No logging configuration or log sanitization is visible in the codebase. If financial data (stock symbols, …
LLM-derived; treat as a starting point, not a security audit.
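The parameterized-query fix suggested for the storage finding can be illustrated with the standard library. This sketch swaps in sqlite3 for the MySQL driver so it runs anywhere; the table layout and all three function names are hypothetical and do not describe TuShare's actual storage code.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical quotes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (code TEXT, date TEXT, close REAL)")
    return conn


def store_quote(conn: sqlite3.Connection, code: str, date: str,
                close: float) -> None:
    """Insert one row; values travel as bound parameters, never as SQL text."""
    conn.execute(
        "INSERT INTO quotes (code, date, close) VALUES (?, ?, ?)",
        (code, date, close),
    )


def closes_for(conn: sqlite3.Connection, code: str) -> List[Tuple[str, float]]:
    """Return (date, close) pairs for one code, oldest first."""
    cursor = conn.execute(
        "SELECT date, close FROM quotes WHERE code = ? ORDER BY date",
        (code,),
    )
    return cursor.fetchall()
```

Because the placeholders bind values instead of splicing them into the statement, a hostile string passed as `code` is stored as ordinary data and cannot alter the query.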
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.