thu-ml/tianshou
An elegant PyTorch deep reinforcement learning library.
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 5w ago
- ✓4 active contributors
- ✓MIT licensed
- ✓CI configured
- ✓Tests present
- ⚠Small team — 4 contributors active in recent commits
- ⚠Concentrated ownership — top contributor handles 55% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/thu-ml/tianshou)
Paste this at the top of your README.md — it renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/thu-ml/tianshou on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: thu-ml/tianshou
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/thu-ml/tianshou shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 5w ago
- 4 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Small team — 4 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 55% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live thu-ml/tianshou
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/thu-ml/tianshou.
What it runs against: a local clone of thu-ml/tianshou — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in thu-ml/tianshou | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 63 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of thu-ml/tianshou. If you don't
# have one yet, run these first:
#
# git clone https://github.com/thu-ml/tianshou.git
# cd tianshou
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of thu-ml/tianshou and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "thu-ml/tianshou(\.git)?\b" \
  && ok "origin remote is thu-ml/tianshou" \
  || miss "origin remote is not thu-ml/tianshou (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"
# 4. Critical files exist
test -f "tianshou/__init__.py" \
  && ok "tianshou/__init__.py" \
  || miss "missing critical file: tianshou/__init__.py"
test -f "tianshou/policy/base.py" \
  && ok "tianshou/policy/base.py" \
  || miss "missing critical file: tianshou/policy/base.py"
test -f "tianshou/data/buffer/base.py" \
  && ok "tianshou/data/buffer/base.py" \
  || miss "missing critical file: tianshou/data/buffer/base.py"
test -f "tianshou/collector/base.py" \
  && ok "tianshou/collector/base.py" \
  || miss "missing critical file: tianshou/collector/base.py"
test -f "tianshou/trainer/base.py" \
  && ok "tianshou/trainer/base.py" \
  || miss "missing critical file: tianshou/trainer/base.py"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 63 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~33d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/thu-ml/tianshou"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Tianshou is a production-grade, PyTorch-based deep reinforcement learning library that provides modular, type-safe abstractions for implementing RL algorithms (DQN, PPO, TRPO, SAC, etc.), with a clean separation between Algorithm and Policy classes. It supports on-policy, off-policy, offline, and experimental multi-agent RL workflows over Gymnasium environments, and is optimized for both research flexibility and practical application. The package is a monolith rooted at tianshou/, with core abstractions (Algorithm, Policy, Batch via docs/02_deep_dives/L1_Batch.ipynb), environment handling (L3_Environments.ipynb), data collection (Collector in L5_Collector.ipynb), and algorithm implementations organized by learning paradigm. Extensive Jupyter-based documentation (docs/02_deep_dives/*.ipynb) explains each component's design. GitHub workflows auto-lint, test, and publish to PyPI.
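To make the central Batch abstraction concrete, here is a minimal sketch (it assumes tianshou is installed; the keys obs/act/rew are illustrative, not a required schema):

import numpy as np
from tianshou.data import Batch

# Batch is a dict-like container with attribute access, consistent
# indexing across keys, and support for nesting and minibatch splitting.
b = Batch(obs=np.zeros((4, 8)), act=np.array([0, 1, 2, 3]), rew=np.ones(4))
print(b.obs.shape)  # (4, 8)
print(b[0])         # indexing slices every key at once
for minibatch in b.split(2, shuffle=False):
    print(len(minibatch))  # 2, 2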
👥Who it's for
RL researchers and engineers building custom reinforcement learning agents who need well-architected, hackable low-level algorithm implementations (e.g., implementing a novel policy gradient variant) as well as practitioners applying existing algorithms to custom environments without framework boilerplate. Also academia/industry teams doing MARL or offline RL research.
🌱Maturity & risk
Actively developed and production-ready: v2.0+ with complete API overhaul (released on PyPI), strong GitHub presence (indicated by stars/forks badges), extensive CI/CD via GitHub Actions (.github/workflows/pytest.yml, lint_and_docs.yml, gputest.yml), comprehensive docs in ReadTheDocs format, and official Docker support. Version 2 represents a mature rearchitecture with clear design principles, not experimental code.
Breaking changes introduced in v2 (not backwards compatible per CHANGELOG.md) require migration for existing users. Dependency on Gymnasium ecosystem (Farama-Foundation/Gymnasium) introduces external maintenance risk. Large Python codebase (1.48M lines) with broad scope (on/off-policy/offline/MARL/model-based) means high surface area for bugs; MARL and model-based features explicitly marked as experimental. No obvious single-maintainer bottleneck visible, but multi-algorithm scope requires sustained maintenance.
Active areas of work
Active development on v2.x with focus on API stability and comprehensive documentation. ReadTheDocs integration active (.readthedocs.yaml present), pre-commit hooks configured (.pre-commit-config.yaml), and multi-stage CI testing (pytest, GPU tests, linting). CHANGELOG.md and recent PR template suggest regular releases and community contributions being triaged.
🚀Get running
git clone https://github.com/thu-ml/tianshou.git
cd tianshou
pip install -e .
Then validate: python -c 'import tianshou; print(tianshou.__version__)'. For GPU testing, use the Dockerfile: docker build -t tianshou . && docker run -it tianshou.
Daily commands:
No traditional 'dev server'—this is a library. To validate installation: python -c 'import tianshou'. To run examples: check benchmark/run_benchmark.py for reference training loops. To run tests: pytest (pytest.yml workflow shows this is the canonical test command). For Docker: docker build -t tianshou . && docker run tianshou pytest.
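Beyond the import check, a quick end-to-end smoke test — a sketch assuming gymnasium is installed and a recent Tianshou (the vectorized reset() return signature has changed across versions):

import gymnasium as gym
from tianshou.env import DummyVectorEnv

# Four CartPole copies stepped in lockstep; the Collector consumes this same interface.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
obs, info = envs.reset()
print(obs.shape)  # (4, 4): one row of observations per environment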
🗺️Map of the codebase
- tianshou/__init__.py — Main package entry point defining the public API and core module exports for the entire DRL library
- tianshou/policy/base.py — Base policy class that all reinforcement learning algorithms inherit from; foundational abstraction for training agents
- tianshou/data/buffer/base.py — Core replay buffer abstraction managing experience storage and sampling, critical for all training pipelines
- tianshou/collector/base.py — Environment interaction collector that orchestrates episode rollout and data collection from environments
- tianshou/trainer/base.py — Training loop orchestrator coordinating collector, policy updates, and episode evaluation
- tianshou/data/batch.py — Unified data structure wrapping environment transitions and batched experience; used throughout the codebase
🧩Components & responsibilities
- Policy (BasePolicy) (PyTorch, neural networks) — Converts observations to actions; learns from experience via gradient descent
  - Failure mode: Divergent training, numerical instability, NaN losses due to poor learning rates or reward scaling
- Collector (Gym, NumPy, vectorization) — Steps environments, applies policy actions, and accumulates transitions into the replay buffer
  - Failure mode: Environment deadlocks, seed desynchronization across parallel envs, episode boundary corruption
- Replay Buffer (NumPy arrays, circular buffers) — Stores and samples experience transitions to break correlations in training data (a usage sketch follows this list)
  - Failure mode: Memory exhaustion with large buffer sizes, stale data if not properly rotated, slow sampling with prioritization
- Trainer (Python async, PyTorch optimizers) — Orchestrates the collection → learning → evaluation loop and manages training state
  - Failure mode: Training hangs if the collector blocks, incorrect learning-rate schedules causing divergence
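The referenced usage sketch — a rough illustration of the Batch → buffer hand-off (the exact add() signature and required keys have shifted between Tianshou releases, so treat names and shapes as assumptions to verify):

import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=100)
# add() takes a batched transition; the leading dimension is the number
# of parallel envs (1 here). The key set mirrors a Gymnasium step result.
buf.add(Batch(
    obs=np.zeros((1, 4)), act=np.array([0]), rew=np.array([1.0]),
    terminated=np.array([False]), truncated=np.array([False]),
    obs_next=np.zeros((1, 4)), info=Batch(),
))
batch, indices = buf.sample(batch_size=1)  # uniform sampling breaks temporal correlation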
🛠️How to make changes
Add a new DRL algorithm
- Create a new policy class inheriting from BasePolicy in tianshou/policy/ (tianshou/policy/your_algorithm.py)
- Implement forward(batch, state=None, ...) for inference and learn(batch, **kwargs) for training (tianshou/policy/your_algorithm.py) — a skeleton follows this list
- Register the algorithm in tianshou/policy/__init__.py
- Create an example script demonstrating training with your algorithm (examples/atari/atari_youralgo.py or examples/box2d/environment_youralgo.py)
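A hypothetical skeleton for the first two steps. It assumes the v1-style BasePolicy interface described above; the v2 Algorithm/Policy split may relocate learn(), so check tianshou/policy/base.py before copying. The loss is a stand-in (supervised on stored actions) just to keep the sketch runnable:

import torch
import torch.nn.functional as F
from tianshou.data import Batch
from tianshou.policy import BasePolicy

class YourAlgorithmPolicy(BasePolicy):  # hypothetical name from the checklist
    def __init__(self, model, optim, **kwargs):
        super().__init__(**kwargs)
        self.model = model  # maps observations to action logits
        self.optim = optim

    def forward(self, batch, state=None, **kwargs):
        # Inference: observations in, actions out (greedy for simplicity).
        logits, hidden = self.model(batch.obs, state=state)
        return Batch(logits=logits, act=logits.argmax(dim=-1), state=hidden)

    def learn(self, batch, **kwargs):
        # One gradient step; a real algorithm computes its own loss here.
        logits, _ = self.model(batch.obs, state=None)
        loss = F.cross_entropy(logits, torch.as_tensor(batch.act))
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return {"loss": loss.item()}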
Add a new replay buffer type
- Create a buffer class inheriting from the base buffer class in tianshou/data/buffer/base.py (ReplayBuffer) and place it in tianshou/data/buffer/ (tianshou/data/buffer/your_buffer.py)
- Implement add() for storing transitions and sample() for retrieval (tianshou/data/buffer/your_buffer.py) — see the sketch after this list
- Export the new buffer in tianshou/data/buffer/__init__.py
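The referenced sketch — a toy subclass, assuming the base class exported from tianshou/data/buffer/base.py is ReplayBuffer and that it exposes a sample_indices() hook (method names have shifted between releases; verify against your installed version):

import numpy as np
from tianshou.data import ReplayBuffer

class RecencyBiasedBuffer(ReplayBuffer):
    """Toy buffer that samples newer transitions more often (illustrative only)."""

    def sample_indices(self, batch_size):
        n = len(self)
        if n == 0 or batch_size == 0:
            return np.array([], dtype=int)
        # Linearly increasing weight toward newer entries. Ignores circular
        # wrap-around of the underlying storage — a sketch, not production code.
        w = np.arange(1, n + 1, dtype=float)
        return np.random.choice(n, size=batch_size, p=w / w.sum())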
Add environment wrappers for a new domain
- Create a wrapper class in tianshou/env/ extending gym.Wrapper (tianshou/env/your_domain_env.py)
- Implement step() and reset() to standardize observations and rewards (tianshou/env/your_domain_env.py) — a minimal example follows this list
- Create an example training script using the wrapper (examples/your_domain/environment_algorithm.py)
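The referenced minimal example. Tianshou targets Gymnasium, so step() returns the five-tuple (obs, reward, terminated, truncated, info); the wrapper name here is made up for illustration:

import gymnasium as gym

class RewardClipWrapper(gym.Wrapper):
    """Clip rewards to [-1, 1], a common preprocessing step (e.g., for Atari)."""

    def step(self, action):
        obs, rew, terminated, truncated, info = self.env.step(action)
        return obs, max(-1.0, min(1.0, float(rew))), terminated, truncated, info

# Usage: env = RewardClipWrapper(gym.make("CartPole-v1"))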
🔧Why these technologies
- PyTorch — Primary deep learning framework for neural network policies; enables GPU acceleration and automatic differentiation
- OpenAI Gym API — Standard environment interface allowing seamless integration with diverse RL benchmarks (Atari, MuJoCo, Box2D)
- NumPy/PyTorch tensors — Efficient vectorized data structures for batch processing experience transitions and parallel environment collection
- TensorBoard/W&B — Experiment tracking and visualization of training metrics across distributed runs
⚖️Trade-offs already made
- Unified Batch abstraction wrapping all data types
  - Why: Provides a consistent interface across on-policy, off-policy, and model-based algorithms
  - Consequence: Slight serialization overhead, but great flexibility and extensibility in algorithm implementations
- Vectorized environment collection in parallel
  - Why: Dramatically improves sample throughput and reduces wall-clock training time
  - Consequence: Added complexity in state management and synchronization; requires careful handling of episode boundaries across parallel envs
- Separate Trainer and Collector responsibilities
  - Why: Clean separation allows independent reuse (e.g., evaluation without training, multi-policy coordination)
  - Consequence: More moving parts to coordinate; configuration must be passed through multiple layers
- Flexible policy forward() signature using **kwargs
  - Why: Allows algorithms to handle heterogeneous observations (images, vectors, recurrent states) without subclassing
  - Consequence: Reduced type safety; developers must understand each algorithm's optional parameters
🚫Non-goals (don't propose these)
- Does not provide built-in distributed training across multiple machines (single-node multi-GPU supported via PyTorch)
- Does not implement robot-specific simulation environments (delegates to third-party simulators like MuJoCo, Pybullet)
- Does not handle real-world robotics deployment or sim-to-real transfer
- Does not include pre-trained model checkpoints (examples show training from scratch)
🪤Traps & gotchas
- Version 2 is not backwards compatible (CHANGELOG.md emphasizes this) — existing code targeting v1 will break.
- The Algorithm/Policy split is a new abstraction; old code that mixed the two will need refactoring.
- Gymnasium (not OpenAI Gym) is a hard requirement; gym-based code won't work without adapters.
- MARL and model-based features are experimental (the docs note this), so their APIs may change.
- Docker builds for GPU tests (.github/workflows/gputest.yml) suggest CUDA version pinning — verify your GPU environment matches the Dockerfile.
- ReadTheDocs builds from .readthedocs.yaml; the docs require specific build dependencies.
🏗️Architecture
(Architecture diagram not included in this export — see the live RepoPilot page.)
💡Concepts to learn
- Markov Decision Process (MDP) — Foundational RL formalism that every Tianshou algorithm solves; understanding state transitions, rewards, and policies is prerequisite to implementing any new algorithm
- Experience Replay / Replay Buffer — Central mechanism in Tianshou for breaking correlations between samples; the tianshou/data/buffer/ implementations are critical to algorithm stability, especially in off-policy learning
- Policy Gradient Methods — Family of algorithms (PPO, TRPO, A3C, SAC) heavily implemented in Tianshou; core to understanding on-policy learning and entropy-regularized RL
- Batch (vectorized transition groups) — Tianshou's custom Batch data structure (tianshou/data/batch.py) abstracts trajectory storage and enables efficient vectorized RL across environments; foundational to the library's speed and design
- Off-policy vs On-policy Learning — Tianshou cleanly separates these learning paradigms at the Algorithm class level; understanding the distinction (behavior policy vs. target policy, importance sampling) is essential for choosing and extending algorithms
- Generalized Advantage Estimation (GAE) — Tianshou has a dedicated deep-dive notebook (docs/02_deep_dives/L4_GAE.ipynb), and GAE is used in most on-policy algorithms (PPO, A3C); critical for variance reduction in policy gradients (a reference implementation sketch follows this list)
- Multi-Agent RL (MARL) — Tianshou experimental support (L6_MARL.ipynb) extends single-agent algorithms to cooperative/competitive settings; requires understanding of independent learners, centralized training/decentralized execution (CTDE), and multi-agent value functions
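The referenced GAE sketch — a textbook implementation for a single episode, not Tianshou's actual (vectorized) code:

import numpy as np

def gae_advantages(rewards, values, next_value, gamma=0.99, lam=0.95):
    """GAE(gamma, lambda): A_t = delta_t + gamma*lam*A_{t+1},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    next_value is V(s_T), the bootstrap value (0.0 if the episode terminated)."""
    adv = np.zeros(len(rewards))
    last, v_next = 0.0, next_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * v_next - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
        v_next = values[t]
    return adv

# e.g. gae_advantages(np.ones(3), np.zeros(3), 0.0)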
🔗Related repos
- openai/spinningup — Educational RL library (now deprecated) that inspired clean algorithm documentation; Tianshou adopted a similar pedagogical separation of concerns, but in production code
- DLR-RM/stable-baselines3 — Alternative PyTorch RL library with similar scope (on/off-policy algorithms); a direct competitor with a different API philosophy (SB3 emphasizes simplicity, Tianshou emphasizes modularity)
- Farama-Foundation/Gymnasium — The environment interface standard Tianshou builds on; any custom RL environment must implement the Gymnasium API to work with Tianshou collectors
- pytorch/pytorch — Core dependency; Tianshou is pure PyTorch, so understanding PyTorch tensor ops and autograd is prerequisite knowledge
- thu-ml/tianshou-models — Companion repo (if it exists) with pre-trained model checkpoints and benchmark results for Tianshou algorithms across Gymnasium environments
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add missing deep dive documentation for Policy and Trainer abstractions
The docs/02_deep_dives folder contains L1-L6 notebooks covering Batch, Buffer, Environments, GAE, Collector, and MARL, but conspicuously missing are deep dives on Policy (core to RL) and Trainer (the main entry point). Given this is an 'elegant PyTorch deep RL library', new contributors should have accessible tutorials on these critical abstractions. This would improve onboarding and reduce GitHub issues about policy implementation.
- [ ] Create docs/02_deep_dives/L0_Policy.ipynb covering policy architecture, different policy types (DQN, PG, Actor-Critic), and the policy interface
- [ ] Create docs/02_deep_dives/L7_Trainer.ipynb covering training loops, the Trainer class, and integration with Collector/Policy/Buffer
- [ ] Link these notebooks from docs/02_deep_dives/0_intro.md and update docs/01_user_guide/02_core_abstractions.md with references
- [ ] Add example code snippets showing minimal policy and trainer usage
Add comprehensive integration tests for multi-agent RL (MARL) workflows
The repo has L6_MARL.ipynb documentation and MARL support (evident from file structure), but there's no indication of dedicated integration tests verifying end-to-end MARL training workflows. The pytest.yml workflow exists but likely lacks MARL-specific test coverage. This is critical for a library claiming elegant MARL support, as distributed multi-agent scenarios are complex and error-prone.
- [ ] Create tests/test_marl_integration.py with end-to-end tests covering: multi-agent environment setup, policy-per-agent training, shared policy training, and communication patterns
- [ ] Add a simple MARL environment fixture (e.g., using PettingZoo) in tests/fixtures/environments.py
- [ ] Update .github/workflows/pytest.yml to explicitly run MARL tests with a dedicated job or marker
- [ ] Document MARL testing requirements in CONTRIBUTING.md (e.g., PettingZoo installation)
Add type hints and type checking (mypy) to CI workflow
The repo has .github/workflows/lint_and_docs.yml but there's no evidence of type checking (mypy) in the linting pipeline. For a PyTorch library that's already well-structured, adding type hints and mypy checks would improve code quality, reduce bugs, and make the codebase more maintainable. This is a high-value contribution that affects all future PRs.
- [ ] Add mypy configuration to pyproject.toml or setup.cfg with strict settings
- [ ] Create .github/workflows/mypy_check.yml (or extend lint_and_docs.yml) to run 'mypy tianshou' on each PR
- [ ] Incrementally add type hints to core modules (start with tianshou/policy/ and tianshou/trainer/) across multiple PRs
- [ ] Document type hint requirements in CONTRIBUTING.md for new contributors
🌿Good first issues
- Add comprehensive docstring examples to all Policy subclasses in tianshou/policy/: currently many have sparse documentation, and adding runnable code snippets (like batch shape contracts and usage patterns) would help new contributors understand the Policy abstraction (a sample docstring sketch follows this list).
- Create integration tests in tests/ for Collector with vectorized Gymnasium environments: the async.png and collector documentation suggest this is important but dedicated test coverage of parallel sampling edge cases (episode termination, batch padding, reset semantics) appears minimal.
- Expand docs/02_deep_dives/L2_Buffer.ipynb with concrete examples of custom replay buffer implementations: the buffer abstraction is powerful but existing notebook only covers built-in buffers; adding 'build your own prioritized buffer' walkthrough would unblock algorithm researchers.
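A sketch of the kind of shape-contract docstring the first issue asks for — the class and shapes are illustrative stand-ins, not Tianshou's actual contract:

from tianshou.data import Batch

class ExamplePolicy:  # placeholder standing in for a tianshou.policy subclass
    def forward(self, batch, state=None):
        """Compute action values and greedy actions.

        :param batch: Batch with batch.obs of shape (B, *obs_shape).
        :param state: optional recurrent state of shape (B, hidden_dim).
        :returns: Batch with .logits of shape (B, num_actions) and .act of shape (B,).

        Example::

            result = policy(Batch(obs=obs))  # obs: (B, *obs_shape)
            actions = result.act             # (B,)
        """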
⭐Top contributors
Click to expand
Top contributors
- @opcode81 — 55 commits
- @MischaPanch — 43 commits
- @Trinkle23897 — 1 commit
- @Mr-Neutr0n — 1 commit
📝Recent commits
Click to expand
Recent commits
- f240205 — [data collector] Use monotonic clocks for collector timing (#1295) (Trinkle23897)
- bf5b636 — Fix link, improve wording [skip ci] (opcode81)
- 1d57967 — Release v2.0.1 (opcode81)
- 8da023e — Constrain pandas to <3 owing to incompatibility (#1290) (opcode81)
- 3d73f8f — Remove conda badge from README [skip ci] (opcode81)
- 162473e — Constrain pandas to <3 owing to incompatibility (opcode81)
- ddf6083 — fix: restore parameters on TRPO line search failure (#1287) (opcode81)
- af0d191 — fix: restore original parameters when TRPO line search fails (Mr-Neutr0n)
- 414589d — Disable fail-fast for matrix build (opcode81)
- be90aad — Fix link to developer guide [skip ci] (opcode81)
🔒Security observations
The Tianshou codebase shows a moderate security posture, with one high-severity issue deserving prompt attention: an incomplete COPY command in the Dockerfile. Beyond that, the lack of version pinning for system and Python packages introduces maintenance and security risks, the image lacks a non-root user, and there is no comprehensive dependency vulnerability scanning. The CI/CD pipeline (visible through the workflows) includes basic testing but could be enhanced with security scanning tools. No hardcoded secrets were detected in the analyzed file structure. Overall, the project would benefit from stricter Docker security practices and dependency management policies.
- High · Incomplete Dockerfile COPY command — Dockerfile. The Dockerfile contains a truncated COPY instruction, 'COPY pyproject.tom', which appears incomplete. This may indicate a malformed build configuration that could fail during image creation or copy the wrong files; the intended filename is presumably 'pyproject.toml'. Fix: complete the instruction, e.g. 'COPY pyproject.toml poetry.lock* .', so the dependency configuration is copied into the image.
- Medium · Unrestricted system package installation — Dockerfile, apt-get install command. System packages (curl, build-essential, git, wget, unzip, libvips-dev, gnupg2) are installed without version pinning, which can produce inconsistent builds and pull in vulnerable versions over time. Fix: pin versions for all system packages, e.g. 'curl=7.x.x' instead of 'curl'.
- Medium · Unrestricted Python package installation — Dockerfile, pipx install poetry. 'pipx' and 'poetry' are installed without version pinning, which could install vulnerable versions or versions with breaking changes. Fix: specify exact versions, e.g. 'pipx install poetry==1.x.x', to ensure reproducible builds.
- Low · Missing HEALTHCHECK in Dockerfile — Dockerfile. There is no HEALTHCHECK instruction, making it difficult to monitor container health in production. Fix: add a HEALTHCHECK instruction to enable automatic health monitoring of the container.
- Low · Running container as root — Dockerfile. No non-root user is specified, so the container runs as root by default, violating the principle of least privilege and widening the attack surface. Fix: create a non-root user and switch to it (e.g. 'RUN useradd -m tianshou' followed by 'USER tianshou').
- Low · Missing SBOM and supply-chain security metadata — Dockerfile and CI/CD workflows. No evidence of software bill of materials (SBOM) generation or dependency vulnerability scanning in the build pipeline, which limits supply-chain visibility. Fix: integrate tools like Syft or Trivy into the build pipeline to generate SBOMs and scan dependencies for known vulnerabilities.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.