sjmoran/satire-classifier
A Naive Bayes classifier for satire detection
Stale and unlicensed — last commit 3y ago
no license — legally unclear; last commit was 3y ago…
no license — can't legally use code; no CI workflows detected…
Documented and popular — useful reference codebase to read through.
no license — can't legally use code; last commit was 3y ago…
- ⚠Stale — last commit 3y ago
- ⚠Small team — 2 contributors active in recent commits
- ⚠Concentrated ownership — top contributor handles 67% of recent commits
- ⚠No license — legally unclear to depend on
- ⚠No CI workflows detected
- ⚠Scorecard: marked unmaintained (0/10)
- ⚠Scorecard: default branch unprotected (0/10)
- ✓2 active contributors
- ✓Tests present
What would improve this?
- →Use as dependency Concerns → Mixed if: publish a permissive license (MIT, Apache-2.0, etc.)
- →Fork & modify Concerns → Mixed if: add a LICENSE file
- →Deploy as-is Concerns → Mixed if: add a LICENSE file
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests + OpenSSF Scorecard
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Great to learn from" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/sjmoran/satire-classifier)Paste at the top of your README.md — renders inline like a shields.io badge.
▸Preview social card
This card auto-renders when someone shares https://repopilot.app/r/sjmoran/satire-classifier on X, Slack, or LinkedIn.
Ask AI about sjmoran/satire-classifier
Grounded in the actual source code. Pick a starter question or write your own.
Onboarding doc
Onboarding: sjmoran/satire-classifier
Generated by RepoPilot · 2026-06-20 · Source
🎯Verdict
AVOID — Stale and unlicensed — last commit 3y ago
- 2 active contributors
- Tests present
- ⚠ Stale — last commit 3y ago
- ⚠ Small team — 2 contributors active in recent commits
- ⚠ Concentrated ownership — top contributor handles 67% of recent commits
- ⚠ No license — legally unclear to depend on
- ⚠ No CI workflows detected
- ⚠ Scorecard: marked unmaintained (0/10)
- ⚠ Scorecard: default branch unprotected (0/10)
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests + OpenSSF Scorecard</sub>
⚡TL;DR
A binary classification model that detects satirical news articles using Naive Bayes, combining Multinomial NB (for discrete count-based features like unigrams, punctuation, capitalization) and Gaussian NB (for word2vec embeddings), then stacking their probability outputs into a meta-classifier. It achieves 0.96 F-score on 10-fold cross-validation by extracting 6 linguistic feature types: word embeddings, Chi2-filtered unigrams, punctuation/capitalization counts, sentiment polarity, intensifiers, and interjections. Monolithic Python project: the main classifier logic resides in the root directory with a logging.conf config file, requirements.txt for dependencies, and eval.prl (Perl script, likely post-processing evaluation). Test data is organized in a flat test/ directory with 54 numbered test case files (test-0001 through test-0054) and a separate test-class file, suggesting train/validation split or labeling scheme, but no visible src/ or models/ subdirectory indicating model persistence.
👥Who it's for
NLP researchers and practitioners building satirical content detection systems who want a baseline Naive Bayes model that's interpretable, doesn't require deep learning, and extracts hand-crafted linguistic signals from text. Contributors interested in feature engineering for text classification or satire detection benchmarking.
🌱Maturity & risk
Experimental / research-stage. Version 1.0.0 is published with an MIT license and includes 54 numbered test cases (test/test-0001 through test/test-0054), but there is no visible CI pipeline, git history, or dates in the provided metadata. The codebase is small (~48KB Python, ~5KB Perl) with minimal dependencies management (requirements.txt present but no lock file, Perl eval script present but unclear in scope). Verdict: suitable for academic reference and experiments, not production-ready without hardening.
Moderate risk: dependencies are pinned to circa-2017 versions (scikit-learn 0.19.0, gensim 3.0.1, nltk 3.2.5) and likely incompatible with Python 3.8+; no requirements.lock or setup.py strict pinning; single-maintainer (sjmoran) with unknown commit recency; 54 test files named test/test-NNNN suggest test data, not automated test suites, so test coverage and CI/CD are unclear. No visible issues/PRs in metadata provided, making active development status unknown.
Active areas of work
No specific activity visible in the provided metadata. The repo structure and file list alone do not indicate recent PRs, open issues, or active development milestones. Version 1.0.0 is finalized with no indication of 1.1.0 or active branches.
🚀Get running
Clone the repository, create a Python 2.7 or early 3.x environment (given 2017-era dependencies), and install dependencies:
git clone https://github.com/sjmoran/satire-classifier.git
cd satire-classifier
pip install -r requirements.txt
Then consult README.md for the expected entry point (likely a Python script not listed in the top 60 files, or infer from eval.prl).
Daily commands:
No Makefile, setup.py, or explicit entry point in the file list. Infer from naming: likely python eval.prl (via subprocess), or a hidden main.py / classify.py file. Check the README or ls -la for *.py files with main/train/classify in the name. Typical usage:
python <main_script>.py --train path/to/train --test path/to/test
Note: eval.prl is Perl, suggesting evaluation is post-processing in Perl; Python likely trains and outputs predictions.
🗺️Map of the codebase
README.md— Entry point documenting the satire detection model architecture, feature types, and Naive Bayes approach that defines the entire project scope.requirements.txt— Declares all dependencies including scikit-learn, nltk, gensim, and pandas that are essential for the classifier pipeline.logging.conf— Configures logging across the codebase for debugging and monitoring classifier training and evaluation.eval.prl— Primary evaluation script that assesses model performance on test data and produces metrics.test-class— Core test execution file that validates classifier predictions against ground truth labels.
🧩Components & responsibilities
- Feature Extractor (NLTK + gensim) (NLTK, gensim, pandas) — Transforms raw documents into six feature types: embeddings, unigrams, punctuation, capitalization, sentiment, intensifiers/interjections
- Failure mode: Missing or incorrectly extracted features reduce classifier accuracy; NLTK data dependencies could cause crashes if not downloaded
- Naive Bayes Classifier (scikit-learn) (scikit-learn, numpy) — Learns conditional probability distributions from training data; produces satire/non-satire predictions with confidence scores
- Failure mode: Overfitting on small training sets; poor generalization to out-of-domain satire sources; feature scaling mismatches (especially embeddings vs counts)
- Evaluator (eval.prl) (scikit-learn, pandas) — Benchmarks trained classifier on held-out test set; computes standard metrics (accuracy, precision, recall, F1)
- Failure mode: Test set imbalance obscuring true performance; metric choice may not reflect real-world satire detection needs
- Test Suite (test-class, test/test-XXXX) (Python test runner) — Validates classifier predictions against 114 manually labeled satire/non-satire documents
- Failure mode: Human annotation errors in test labels; limited test coverage may miss edge cases or adversarial examples
🔀Data flow
Raw document text→Feature Extractor— Document fed into pipeline for tokenization and six parallel feature extraction streamsFeature Extractor→Feature vector (N-dimensional)— Extracted features (word2vec + unigram counts + punctuation + capitalization + sentiment + intensifiers) concatenated into single vectorTraining feature vectors→Naive Bayes Classifier— Feature matrices from labeled documents used to fit Multinomial or Gaussian Naive Bayes model (learn class priors and feature distributions)Test feature vectors→Trained Naive Bayes Classifier— Feature vectors from test documents scored by classifier to produce satire/non-satire predictionsPredictions + ground truth labels→Evaluator (eval.prl)— Predictions compared against held-out test labels to compute accuracy, precision, recall, F1 metricsEvaluation metrics→Results output— Classification performance reported for model validation and hyperparameter tuning decisions
🛠️How to make changes
Add a new feature type to the classifier
- Document the new feature (e.g., linguistic patterns, syntactic features) in README.md under the feature types section (
README.md) - Implement feature extraction logic in the main classifier module, following the pattern of existing features (unigrams, punctuation, sentiment) (
README.md) - Add new feature extraction code that integrates with scikit-learn's pipeline, ensuring compatibility with MultinomialNB or GaussianNB (
eval.prl) - Update logging.conf to add debug logging for the new feature extraction step (
logging.conf) - Create corresponding test cases in test/ directory to validate the new feature performs correctly (
test-class)
Add new test data samples
- Create new test file following naming convention test/test-XXXX where XXXX is the next sequential number after test-0114 (
test/test-0114) - Format test data with satire/non-satire label and document content consistent with existing test samples (
test-class) - Run test-class to validate new samples are correctly classified by the model (
test-class)
Modify classifier behavior or hyperparameters
- Update model instantiation code (Multinomial/Gaussian Naive Bayes parameters) in the classifier pipeline (
eval.prl) - Execute eval.prl to re-evaluate model performance with new hyperparameters (
eval.prl) - Verify performance improvements/regressions by reviewing evaluation metrics output (
test-class)
🔧Why these technologies
- Naive Bayes (Multinomial & Gaussian) — Simple, interpretable, and performs very well for satire detection; computationally efficient for production use.
- scikit-learn — Provides battle-tested Naive Bayes implementations with integrated feature selection (Chi2) and evaluation metrics.
- NLTK — Essential for text preprocessing, tokenization, and access to linguistic resources (sentiment lexicons, intensifiers, interjections).
- gensim (word2vec) — Generates dense document embeddings as one of six feature types, capturing semantic similarity between documents.
- pandas & numpy — Efficient data manipulation and numerical operations for feature matrix construction and evaluation.
⚖️Trade-offs already made
-
Naive Bayes over deep learning (CNNs, RNNs, Transformers)
- Why: Naive Bayes demonstrated superior performance on this task; simpler interpretability, lower computational cost, less data required.
- Consequence: Model has lower ceiling for very large datasets; cannot capture complex non-linear patterns; feature engineering becomes critical.
-
Multiple heterogeneous feature types (embeddings + counts + linguistic)
- Why: Combines semantic signals (word2vec) with interpretable linguistic features (punctuation, sentiment, intensifiers) for robustness.
- Consequence: Increased feature dimensionality and extraction complexity; requires careful feature scaling and selection.
-
Multinomial Naive Bayes preferred over Gaussian
- Why: Count-based features (unigrams, punctuation) are discrete and non-negative, matching Multinomial assumptions perfectly.
- Consequence: Gaussian variant may underperform if used for non-embedding features; requires feature-type-specific model selection.
🚫Non-goals (don't propose these)
- Real-time or low-latency prediction (batch evaluation model)
- Multi-class satire detection (only binary satire vs non-satire)
- Explanation or interpretability of individual predictions (model outputs confidence only)
- Domain adaptation across different satire sources or writing styles
- Handling of non-English text or code-mixed content
⚠️Anti-patterns to avoid
- Heterogeneous feature types without explicit scaling (Medium) —
Feature extraction pipeline (implied in README feature list): Combining embeddings (continuous, dense), counts (discrete, sparse), and binary flags without normalization or feature scaling could cause Naive Bayes to weight embeddings poorly; Chi2 unigram selection helps but insufficient for cross-feature-type consistency. - Sparse documentation of model configuration (Low) —
README.md, eval.prl: README explains features but does not document Naive Bayes hyperparameters (smoothing, alpha), feature weighting strategy, or train/test split ratio; reproducing results or tuning requires reverse-engineering eval.prl. - No cross-validation mentioned —
eval.prl, test-class: Evaluation appears to use fixed test split (test/ directory) without k
🪤Traps & gotchas
- Python 2/3 ambiguity: Dependencies are frozen at 2017 versions; unclear if code is Py2 or Py3. Check for print statements vs. print() and run 2to3 if needed. 2. Word2vec model not included: gensim.word2vec is imported but no model file or training script visible; either pre-trained embeddings are expected or the model must be trained on an external corpus first (check README or comments in missing main file). 3. Test data format unclear: test/test-NNNN files have no extension; could be plaintext, JSON, or custom pickle format — need to inspect one to understand parsing. 4. eval.prl is Perl: Python code must write predictions in a format eval.prl expects; mismatch will break evaluation. 5. Chi2 threshold, sentiment lexicon sources: README mentions Chi2 filtering and sentiment polarity but no explicit thresholds or lexicon files in the listing; likely hardcoded in a missing feature module.
🏗️Architecture
💡Concepts to learn
- Multinomial Naive Bayes for text classification — This repo's primary discrete classifier for count-based features (unigrams, punctuation); understanding its conditional independence assumption and why it works well on sparse document-term matrices is central to the model design.
- Gaussian Naive Bayes for continuous embeddings — Used in this repo to model word2vec embeddings (continuous vectors) and to learn the meta-classifier on stacked probabilities; requires understanding of multivariate Gaussian assumption on embedding space.
- Stacked generalization (meta-learning) — The repo combines Multinomial NB and Gaussian NB outputs by treating their probabilities as a 4-dimensional feature space for a third Gaussian NB model; this stacking pattern improves generalization beyond either base learner alone.
- Chi-squared (χ²) feature selection — Used to filter unigrams by relevance to satire/non-satire classes; critical for reducing feature dimensionality and noise, and mentioned explicitly in the README as improving Multinomial NB performance.
- Word2vec (Skip-gram and CBOW embeddings) — This repo uses gensim.word2vec to generate dense document embeddings for Gaussian NB; understanding embedding quality and how it captures semantic similarity is essential for debugging satire-specific language patterns.
- 10-fold cross-validation — The repo reports 0.96 F-score on 10-fold CV (vs. 0.72 on test set), a standard technique to reduce variance in performance estimates; understanding why the gap exists and how to interpret it is crucial for assessing true model quality.
- Intensifiers and interjections as linguistic features — This repo extracts counts of intensifiers (e.g., 'very', 'extremely') and interjections ('wow', 'oh') as features; these signal emotional language common in satire, and understanding when and why they're informative is key to satire-specific feature design.
🔗Related repos
facebookresearch/fastText— fastText is a competing text classification library that also uses linear models and n-gram features, offering comparable simplicity and speed for satire detection with better multilingual support.textacy/textacy— Textacy provides higher-level NLP feature extraction (n-grams, POS patterns, readability) on top of spaCy/NLTK, reducing boilerplate for the 6 feature types (punctuation, capitalization, sentiment) hand-coded in this repo.clips/pattern— Pattern library offers built-in sentiment analysis, POS tagging, and text mining utilities; complements this repo's sentiment polarity and intensifier extraction, especially for older Python 2 codebases.nlptown/bert-base-multilingual-uncased— Represents modern alternative to Naive Bayes + word2vec ensemble: a pre-trained BERT model for text classification that could replace the entire feature engineering pipeline if fine-tuned on satire data.huggingface/transformers— Hugging Face Transformers library provides access to state-of-the-art classifiers (RoBERTa, DistilBERT) as drop-in replacements for the Gaussian + Multinomial NB stack, with minimal code changes.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add unit tests for feature extraction pipeline
The repo has 74 test data files (test/test-0001 through test/test-0074) but no visible test suite validating the feature extraction logic. Given the 6 feature types mentioned in README (word2vec embeddings, Chi2-filtered unigrams, punctuation, capitalization, sentiment polarity, intensifiers/interjections), there's no test coverage ensuring these features are correctly extracted. This is critical for a machine learning project where feature correctness directly impacts model performance.
- [ ] Create tests/test_feature_extraction.py to validate each of the 6 feature extraction methods
- [ ] Add tests for word2vec embedding generation with sample documents
- [ ] Add tests for Chi2 scoring and unigram filtering logic
- [ ] Add tests for punctuation/capitalization/sentiment/intensifier counting across test data files
- [ ] Integrate test suite into a CI workflow (GitHub Actions or similar)
Refactor eval.prl and create Python evaluation module
The eval.prl file appears to be a Perl script for evaluation, which is an unusual choice given the rest of the codebase is Python (scikit-learn, gensim, nltk). This creates a language mismatch and maintenance burden. Refactoring this into a proper Python module would improve consistency, enable better integration with the ML pipeline, and make contributions easier for Python developers.
- [ ] Create eval.py module with functions extracted from eval.prl logic
- [ ] Implement standard ML evaluation metrics (precision, recall, F1, confusion matrix) using sklearn.metrics
- [ ] Add support for cross-validation evaluation against the test/ dataset files
- [ ] Update README to document the new evaluation module and deprecate eval.prl
- [ ] Add example usage in README showing how to run evaluations
Create reproducible experiment configuration and logging
The repo has a logging.conf file but no clear experiment tracking, model versioning, or hyperparameter documentation. The requirements.txt pins old dependency versions (scikit-learn 0.19.0 from 2017, pandas 0.20.3, numpy 1.13.3) without documentation of model training parameters. Adding structured experiment configs and proper logging would enable reproducibility and make it easier for contributors to iterate on the classifier.
- [ ] Create experiments/ directory with example config files (YAML/JSON) for Gaussian and Multinomial Naive Bayes variants
- [ ] Add experiment.py module to load configs and log model hyperparameters, training metrics, and feature importance
- [ ] Document the exact training procedure in a TRAINING.md file (data splits, cross-validation strategy, feature selection process)
- [ ] Update logging.conf to capture model training details (feature counts, accuracy metrics, training time)
- [ ] Add sample_experiment.yaml showing configuration for reproducing the 1.0.0 baseline model
🌿Good first issues
- Add Python 3.8+ compatibility: Update pinned dependency versions in requirements.txt from 2017 editions (scikit-learn 0.19.0 → 0.24+, gensim 3.0.1 → 4.0+, nltk 3.2.5 → 3.7+) and test for breakage in feature extraction and model training. File to start: requirements.txt and any main training script (not in top 60 list).
- Document test data format and add parsing examples: Inspect test/test-0001 through test/test-0010 to determine the file format (plaintext, JSON, TSV, pickle), document it in README.md with examples, and add a Python script (e.g., test_loader.py) to parse all 54 test cases. This unblocks others from adding new test data.
- Extract and parameterize feature thresholds: README mentions Chi2-filtered unigrams and sentiment polarity but no visible configuration file for thresholds (Chi2 cutoff, sentiment lexicon source, intensifier word list). Create a config.json or config.py with all hardcoded feature extraction parameters, document them in README, and allow --config flag in the main script.
⭐Top contributors
Click to expand
Top contributors
- [@Sean Moran](https://github.com/Sean Moran) — 10 commits
- @sjmoran — 5 commits
📝Recent commits
Click to expand
Recent commits
b715f30— Minor update to the README.md (sjmoran)cc977b8— 1) Added tf-idf weighting in the word2vec computation (sjmoran)f4f1116— Fixed the rendering of asterix in markdown (sjmoran)336d68b— Updated the top words (sjmoran)ca22cd7— Trimmed the requirements.txt file (sjmoran)7abdbf6— Fixed the output directory creation (Sean Moran)03a580d— Added experiment name to output directory name (Sean Moran)dc5fb87— Fixed output directory creation (Sean Moran)dd8fbd8— Minor README.md update (Sean Moran)7dd34fb— Minor updates to the README.md (Sean Moran)
🔒Security observations
- High · Outdated and Vulnerable Dependencies —
requirements.txt. Multiple dependencies contain known security vulnerabilities due to being significantly outdated (from 2017). Notably: urllib3==1.22 (CVE-2021-28363, CVE-2020-26137), requests==2.18.4 (CVE-2018-18074), scikit-learn==0.19.0, and boto==2.48.0 (AWS SDK v2, deprecated since 2020). Fix: Update all dependencies to the latest stable versions. Run: pip install --upgrade -r requirements.txt with updated versions. For AWS operations, migrate from boto==2.48.0 to boto3>=1.26.0 - High · Duplicate/Invalid Package Definition —
requirements.txt - line with 'sklearn==0.0'. The requirements.txt contains 'sklearn==0.0' which is invalid and likely a mistake. The actual scikit-learn package is listed separately. This indicates incomplete dependency management and potential build failures. Fix: Remove the 'sklearn==0.0' entry from requirements.txt. Ensure only scikit-learn (not sklearn) is listed for the scikit-learn package. - High · Deprecated Python 2 Compatibility Package —
requirements.txt - pathlib==1.0.1. Presence of 'pathlib==1.0.1' suggests legacy Python 2 support. pathlib is built-in for Python 3.4+. This indicates the codebase may still support Python 2, which is unsupported since January 2020 and contains unpatched security vulnerabilities. Fix: Remove pathlib from requirements.txt if targeting Python 3.6+. Update codebase to Python 3.8+ and remove all Python 2 compatibility code. - Medium · Deprecated boto Package (AWS SDK v2) —
requirements.txt - boto==2.48.0. boto==2.48.0 is the deprecated AWS SDK v2, reaching end-of-life. It receives no security updates and is replaced by boto3. If used for AWS operations, this creates long-term security risk. Fix: Migrate to boto3>=1.26.0. Update all AWS API calls to use boto3 instead of boto. - Medium · Insecure Oauth Implementation —
requirements.txt - oauthlib, requests-oauthlib, twython. The presence of 'oauthlib==2.0.4' and 'requests-oauthlib==0.8.0' (from 2017) without visible secure credential management suggests potential OAuth token exposure. The 'twython==3.6.0' (Twitter API) further indicates API credential usage. Fix: Ensure all OAuth tokens and API credentials are stored in environment variables or secure vaults (AWS Secrets Manager, HashiCorp Vault), never in code or version control. Use 'python-dotenv' with a .gitignored .env file for local development. - Medium · No HTTPS Certificate Verification Configuration Visible —
requirements.txt - certifi==2017.7.27.1. certifi==2017.7.27.1 is outdated and may not contain current root certificates. Combined with requests library, this could lead to SSL/TLS vulnerabilities or MITM attacks if not properly configured. Fix: Update certifi to latest version (>=2023.7.22). Ensure requests always verify SSL certificates: requests.get(url, verify=True). Never set verify=False in production. - Low · Missing Security Headers Configuration —
Repository root - missing configuration. No visible security configuration file (security.txt, headers configuration, CORS policy) suggests potential missing security headers if this is used as a web service. Fix: If exposing this as a web service, implement security headers (Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security). - Low · No Secrets Management —
Repository structure. No evidence of secrets management framework (.env, .env.example, python-dotenv, or similar) for handling API credentials, tokens, or sensitive configuration. Fix: Implement python-dotenv for
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting
below. If any check returns
FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding. - Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/sjmoran/satire-classifier shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live sjmoran/satire-classifier
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/sjmoran/satire-classifier.
What it runs against: a local clone of sjmoran/satire-classifier — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in sjmoran/satire-classifier | Confirms the artifact applies here, not a fork |
| 2 | Default branch master exists | Catches branch renames |
| 3 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 4 | Last commit ≤ 1292 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of sjmoran/satire-classifier. If you don't
# have one yet, run these first:
#
# git clone https://github.com/sjmoran/satire-classifier.git
# cd satire-classifier
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of sjmoran/satire-classifier and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "sjmoran/satire-classifier(\\.git)?\\b" \\
&& ok "origin remote is sjmoran/satire-classifier" \\
|| miss "origin remote is not sjmoran/satire-classifier (artifact may be from a fork)"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
&& ok "default branch master exists" \\
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f "README.md" \\
&& ok "README.md" \\
|| miss "missing critical file: README.md"
test -f "requirements.txt" \\
&& ok "requirements.txt" \\
|| miss "missing critical file: requirements.txt"
test -f "logging.conf" \\
&& ok "logging.conf" \\
|| miss "missing critical file: logging.conf"
test -f "eval.prl" \\
&& ok "eval.prl" \\
|| miss "missing critical file: eval.prl"
test -f "test-class" \\
&& ok "test-class" \\
|| miss "missing critical file: test-class"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 1292 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~1262d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/sjmoran/satire-classifier"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
Similar Python repos
Other concerns-signal Python repos by stars.
Embed this chat in your README →
Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.
<iframe src="https://repopilot.app/embed/sjmoran/satire-classifier" width="100%" height="500" style="border:1px solid #d0d7de; border-radius:8px;" allow="microphone" loading="lazy" ></iframe>