RepoPilot

minimaxir/big-list-of-naughty-strings

The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

Mixed

Stale — last commit 2y ago

MixedDependency

last commit was 2y ago; no tests detected…

MixedFork & modify

no tests detected; no CI workflows detected…

HealthyLearn from

Documented and popular — useful reference codebase to read through.

MixedDeploy as-is

last commit was 2y ago; no CI workflows detected

  • Stale — last commit 2y ago
  • No CI workflows detected
  • No test directory detected
  • 35+ active contributors
  • Distributed ownership (top contributor 46% of recent commits)
  • MIT licensed

What would improve this?

  • Use as dependency MixedHealthy if: 1 commit in the last 365 days; add a test suite
  • Fork & modify MixedHealthy if: add a test suite
  • Deploy as-is MixedHealthy if: 1 commit in the last 180 days

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Great to learn from" badge

Paste into your README — live-updates from the latest cached analysis.

RepoPilot: Great to learn from
[![RepoPilot: Great to learn from](https://repopilot.app/api/badge/minimaxir/big-list-of-naughty-strings?axis=learn)](https://repopilot.app/r/minimaxir/big-list-of-naughty-strings)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card

This card auto-renders when someone shares https://repopilot.app/r/minimaxir/big-list-of-naughty-strings on X, Slack, or LinkedIn.

Ask AI about minimaxir/big-list-of-naughty-strings

Grounded in the actual source code. Pick a starter question or write your own.

Or write your own question →

Onboarding doc

Onboarding: minimaxir/big-list-of-naughty-strings

Generated by RepoPilot · 2026-06-19 · Source

🎯Verdict

WAIT — Stale — last commit 2y ago

  • 35+ active contributors
  • Distributed ownership (top contributor 46% of recent commits)
  • MIT licensed
  • ⚠ Stale — last commit 2y ago
  • ⚠ No CI workflows detected
  • ⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

TL;DR

The Big List of Naughty Strings (BLNS) is a curated collection of strings designed to break, crash, or expose vulnerabilities in software when used as user input. It provides test data in multiple formats (blns.txt, blns.json, blns.base64.json) organized by failure category (unicode edge cases, SQL injection vectors, format string attacks, etc.) and is distributed as a Go package, Python module, and npm package for integration into QA test suites. Dual-format monorepo: source is blns.txt (newline-delimited with comment sections), which is programmatically converted to blns.json and blns.base64.json via scripts/txt_to_json.py. Language-specific packages live in the naughtystrings/ directory: Go code (naughtystrings.go, naughtystrings_test.go) reads embedded data, Python code (init.py) provides similar access. Makefiles coordinate generation and packaging.

👥Who it's for

QA engineers and software developers performing input validation testing on their own applications. Security testers performing manual fuzzing. Library maintainers across Go, Python, Node.js, and Shell ecosystems who want to distribute naughty strings to their users via package managers.

🌱Maturity & risk

This is a stable, actively maintained reference dataset. The repository has multiple language implementations (Python 1808 LOC, Go 1378 LOC), test coverage (naughtystrings_test.go exists), proper Go module setup (go.mod), and is published across multiple package managers (npm, PyPI). The project is production-ready as a test data source, not experimental code.

Low risk for users—this is a data distribution repo, not a runtime dependency with complex logic. Single-maintainer risk (minimaxir) is the primary concern. No external dependencies in the Go module suggests good isolation. The main operational risk is that naughty strings may trigger legitimate security tools (antivirus, WAF) when tested against third-party services, which is why the disclaimer in README explicitly warns against using BLNS on software you don't own.

Active areas of work

No recent activity data provided in the repo snapshot, so unable to determine active PRs or milestones. However, the presence of multiple output formats (txt, json, base64) and maintained language bindings suggests ongoing curation of the naughty strings list and maintenance of cross-language compatibility.

🚀Get running

Check README for instructions.

Daily commands: This is a data distribution repo, not an executable service. To generate derived formats: make in the naughtystrings/ directory (see Makefile). To test Go bindings: go test ./naughtystrings or cd naughtystrings && go test. To consume in code: import naughtystrings package (Go) or load blns.json (any language).

🗺️Map of the codebase

  • blns.txt: Source-of-truth file: hand-curated list of naughty strings organized into categories with # comment delimiters; all other formats are generated from this.
  • scripts/txt_to_json.py: Build pipeline: Python script that parses blns.txt (stripping comments) and generates blns.json and blns.base64.json; must run after any blns.txt edits.
  • naughtystrings/naughtystrings.go: Go package implementation: exposes naughty strings as Go variables and provides programmatic access via exported functions; primary entry point for Go users.
  • naughtystrings/init.py: Python package implementation: loads and exposes naughty strings to Python users; primary entry point for Python language binding.
  • naughtystrings/naughtystrings_test.go: Go test suite: validates that naughtystrings.go correctly loads and exposes strings; ensures package integrity across releases.
  • naughtystrings/Makefile: Build orchestration: coordinates regeneration of blns.json and blns.base64.json from blns.txt; must run after edits before committing.
  • blns.json: Programmatic access format: JSON array of all naughty strings (comments stripped); consumed by Python init.py and used directly by JavaScript/other language consumers.

🛠️How to make changes

Adding a new naughty string: Edit blns.txt directly—add your string to the appropriate comment-delineated section (or create a new ## Section header). Keep strings under 255 characters per README guidelines. After editing blns.txt: Run make to regenerate blns.json and blns.base64.json. Updating bindings: For Go, edit naughtystrings.go to expose new strings if needed. For Python, verify init.py correctly loads updated blns.json. Testing: Run go test ./naughtystrings to validate. Before PR: Ensure all output files (blns.txt, blns.json, blns.base64.json, blns.base64.txt) are regenerated and committed together.

🪤Traps & gotchas

No null characters (U+0000): GitHub renders files with null bytes as binary, breaking PR readability. No EICAR test string: Antivirus may flag the file. String length limit (255 chars): Very long strings make blns.txt unwieldy in editors and reduce manual usability; enforce during review. Regeneration requirement: blns.json, blns.base64.json, and blns.base64.txt must be regenerated via make after editing blns.txt—committing only blns.txt changes without regenerating derived files breaks the build and consumer expectations. Encoding preservation: Do not change blns.txt encoding; this affects downstream consumers and breaks reproducibility. Third-party software warning: Many naughty strings are known exploit vectors (SQL injection, XSS payloads); using BLNS against third-party software you don't own may violate computer fraud laws—this is explicitly disclaimed in README but worth noting during contribution review.

💡Concepts to learn

  • Unicode edge cases and normalization attacks — BLNS heavily features zero-width spaces, right-to-left marks, combining characters, and normalization forms because these break string comparison, length validation, and display logic; understanding Unicode normalization (NFC, NFD) is critical to avoiding BLNS failures.
  • SQL Injection and parameterized queries — BLNS includes SQL metacharacters (' " ; --, etc.) and escape sequences to detect unsafe string concatenation in database queries; this is the canonical weak input validation problem.
  • Format string vulnerabilities — BLNS contains strings like %x, %n, and printf-style patterns that expose memory when logged or printed without format string protection; relevant for C/C++ logging and certain templating contexts.
  • Cross-site scripting (XSS) payload vectors — BLNS includes HTML tags, JavaScript event handlers, and SVG injection patterns; testing with these strings reveals inadequate input sanitization in web applications.
  • Regular expression denial of service (ReDoS) — BLNS includes patterns with nested quantifiers and backtracking-heavy constructs (e.g., repeating groups with alternation) that cause catastrophic backtracking; critical for validating regex-based input filters.
  • Code injection via template and expression languages — BLNS contains template syntax for Handlebars, Mustache, Jinja2, and EL (Expression Language) because many web frameworks allow user input to be evaluated as code; this tests whether your application treats input as data, not code.
  • Base64 encoding as obfuscation and attack vector — BLNS is distributed in blns.base64.json because Base64 encoding hides malicious strings from naive text scanning (antivirus, WAF signatures) but is trivially decoded at runtime; tests whether your application decodes user input and then validates it.
  • owasp/owasp-mstg — OWASP Mobile Security Testing Guide provides structured vulnerability categories and test cases that overlap with BLNS payloads; reference for security-focused string curation.
  • swisskyrepo/PayloadsAllTheThings — Comprehensive payload repository organized by attack vector (SQL injection, XSS, SSRF, etc.); BLNS is a curated subset focused on edge cases and surprises rather than explicit exploits, but PayloadsAllTheThings is the 'complete' version.
  • danielmiessler/SecLists — Modular collection of security testing wordlists and payloads (usernames, passwords, web paths, fuzzing strings); BLNS complements SecLists as a focused, curated alternative for input validation testing.
  • fuzzdb-project/fuzzdb — Database of attack patterns, common values, and fuzzing strings organized by vulnerability type; similar problem space to BLNS but more comprehensive and less curated for human readability.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add CI workflow to validate blns.json/blns.txt consistency across formats

The repo maintains multiple format variants (blns.json, blns.txt, blns.base64.json, blns.base64.txt) generated via scripts/txt_to_json.py and scripts/texttobase64.sh. There's no automated validation that these formats stay in sync or that the conversion scripts produce valid output. A GitHub Actions workflow could catch format drift and invalid entries before merging.

  • [ ] Create .github/workflows/validate-formats.yml to run on PRs
  • [ ] Add validation logic in scripts/ to verify blns.txt → blns.json conversion produces valid JSON
  • [ ] Verify base64 variants decode correctly and match original strings
  • [ ] Add check that all three text/json variants contain the same number of strings

Add comprehensive cross-language integration tests in naughtystrings/naughtystrings_test.go

The Go package has naughtystrings_test.go but it only tests the Go implementation. Since this repo is language-agnostic and users consume blns.json/blns.txt across Python/Node/Go, there should be tests verifying the Go package correctly loads and exposes all strings from the JSON source, matching the canonical list exactly.

  • [ ] Expand naughtystrings/naughtystrings_test.go to load blns.json and verify Go API returns identical strings
  • [ ] Add test for GetNaughtyStrings() count matches JSON entry count
  • [ ] Add test for Unicode edge cases (zero-width space mentioned in README) are present and uncorrupted
  • [ ] Verify resource embedding in naughtystrings/internal/resource.go stays in sync with blns.json

Document the string categories/taxonomy in README with examples from blns.json structure

The README explains why to test naughty strings but doesn't document what categories of problematic inputs are actually included in blns.json. Users can't understand coverage without knowing if SQL injection, XXS, Unicode issues, etc. are represented. The JSON likely has categories but they're undocumented.

  • [ ] Inspect blns.json structure to identify how strings are categorized (top-level keys, metadata fields, etc.)
  • [ ] Add 'String Categories' section to README listing each category with 1-2 concrete examples
  • [ ] Link category descriptions to real-world issues (e.g., 'Unicode Normalization Issues' → the zero-width space Twitter bug mentioned)
  • [ ] Document how contributors should categorize new naughty strings when submitting PRs

🌿Good first issues

  • Add test coverage for Python init.py module loading: Currently naughtystrings_test.go validates Go bindings, but there's no visible test file for Python package. Add a pytest-based test in tests/test_naughtystrings.py that verifies naughtystrings/init.py correctly loads all strings from blns.json and exposes them via expected module-level attributes.
  • Document the txt_to_json.py script with inline code comments and usage examples: scripts/txt_to_json.py lacks inline documentation explaining how it parses the comment-delimited sections from blns.txt and generates JSON. Add docstrings and a usage example showing how contributors should invoke the script manually if the Makefile fails.
  • Create a contribution checklist and validate script for pull requests: Add scripts/validate_blns.sh that a contributor can run locally to check: (1) no null bytes in blns.txt, (2) no strings exceed 255 characters, (3) all output files (json, base64) are regenerated, (4) no EICAR string present. Reference this script in CONTRIBUTING.md to prevent invalid PRs.

Top contributors

Click to expand

📝Recent commits

Click to expand
  • db33ec7 — Merge pull request #226 from caasi/patch-1 (minimaxir)
  • 18a8898 — Index XSS strings (caasi)
  • 894882e — Merge pull request #211 from doroshenko/master (minimaxir)
  • f356d4d — Merge pull request #210 from tryauuum/master (minimaxir)
  • b2eada7 — Added emoji zwj sequences (Dmytro Doroshenko)
  • 0d5fd11 — added jinja2 injections (tryauuum)
  • 9c25300 — Merge pull request #209 from xeroskiller/patch-1 (minimaxir)
  • 494b425 — Update blns.json (xeroskiller)
  • a9bae33 — Added tSQL-specific injection string (xeroskiller)
  • ff8b1b2 — Merge pull request #206 from mattsparks/master (minimaxir)

🔒Security observations

The 'Big List of Naughty Strings' repository is primarily a data and testing utility project with low inherent security risks. No critical or high-severity vulnerabilities were identified. The codebase lacks typical security concerns such as hardcoded credentials, external dependencies with known vulnerabilities, or injection vulnerabilities. Minor recommendations include adding strict mode to shell scripts and input validation to Python utilities. The project's purpose is security-focused (QA testing), and the overall security posture is solid for this use case.

  • Low · Shell Script Without Strict Mode — scripts/texttobase64.sh. The script 'scripts/texttobase64.sh' is a shell script that may not have strict error handling enabled. Shell scripts without 'set -e' or 'set -euo pipefail' can continue executing even if intermediate commands fail, potentially leading to unexpected behavior. Fix: Add '#!/bin/bash' with 'set -euo pipefail' at the beginning of the script to ensure proper error handling and prevent execution of subsequent commands if any command fails.
  • Low · Python Script Without Input Validation — scripts/txt_to_json.py. The script 'scripts/txt_to_json.py' converts text to JSON but the file review suggests potential lack of input validation. If this script processes untrusted input, it could be vulnerable to injection attacks or malformed data handling. Fix: Implement proper input validation, use safe JSON encoding methods, and validate file contents before processing. Use 'json.dumps()' with proper escaping.
  • Low · Potential Information Disclosure via Test Files — naughtystrings/naughtystrings_test.go. The file 'naughtystrings/naughtystrings_test.go' may contain test cases that expose expected behavior or edge cases. If sensitive patterns are documented in tests, this could aid attackers in understanding bypass techniques. Fix: Ensure test files do not contain sensitive information or actual exploit payloads. Keep test cases generic and avoid documenting specific bypass techniques.

LLM-derived; treat as a starting point, not a security audit.

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/minimaxir/big-list-of-naughty-strings shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live minimaxir/big-list-of-naughty-strings repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/minimaxir/big-list-of-naughty-strings.

What it runs against: a local clone of minimaxir/big-list-of-naughty-strings — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in minimaxir/big-list-of-naughty-strings | Confirms the artifact applies here, not a fork | | 2 | License is still MIT | Catches relicense before you depend on it | | 3 | Default branch master exists | Catches branch renames | | 4 | Last commit ≤ 781 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>minimaxir/big-list-of-naughty-strings</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of minimaxir/big-list-of-naughty-strings. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/minimaxir/big-list-of-naughty-strings.git
#   cd big-list-of-naughty-strings
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of minimaxir/big-list-of-naughty-strings and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "minimaxir/big-list-of-naughty-strings(\\.git)?\\b" \\
  && ok "origin remote is minimaxir/big-list-of-naughty-strings" \\
  || miss "origin remote is not minimaxir/big-list-of-naughty-strings (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"MIT\"" package.json 2>/dev/null) \\
  && ok "license is MIT" \\
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
  && ok "default branch master exists" \\
  || miss "default branch master no longer exists"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 781 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~751d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/minimaxir/big-list-of-naughty-strings"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/minimaxir/big-list-of-naughty-strings"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>