Live · AI Tooling · OSS

Python CLI · PyPI cloak-cli · Apache 2.0 · pip install · GitHub public · v0.3.0

CLOAK

Redact code before it reaches an LLM.

CLOAK is a local CLI that redacts code before it reaches an LLM, generates safe-to-paste markdown views of source, and produces test-verified obfuscated copies for sharing, all governed by a .cloakpolicy file checked into the repo. It is open source under Apache 2.0, distributed on PyPI as cloak-cli, and runs entirely on the developer's machine. It is the OSS sibling to Fob: same author, same governance philosophy, complementary surfaces.

  • Python 3.11+
  • Typer CLI
  • stdlib ast (Python parsing)
  • tree-sitter (JS/TS parsing)
  • detect-secrets
  • PyYAML
  • rich (terminal UI)
  • pytest
  • PyPI Trusted Publisher (OIDC)
  • Apache 2.0

Origin

How it started

The Shadow AI problem is real and observable in every engineering org that has tried to address it. Leadership writes a memo: don't paste proprietary code into ChatGPT. Developers do it anyway because the alternative is missing a deadline. Existing answers are six-figure network DLP bought through IT, or a policy document buried in Confluence. CLOAK is the missing third option — a small CLI that lives in the same workflow developers already use, with authority following merge access on a YAML file in the repo.

Features

What it does

  • scan

    Wraps detect-secrets and layers in custom regex rules from .cloakpolicy. Exits 1 on findings, emits JSON output for CI, and works as a drop-in pre-commit hook. The years of regex and entropy tuning in detect-secrets come for free; the proprietary-marker rules are configured per repo.

  • diff-context (dry-run)

    Runs the same redaction logic as cloak context but writes nothing — just reports per-file counts of function bodies, proprietary tables, and docstrings that would be removed, plus byte reduction. Engineers see what gets redacted before they trust the transformation.

  • context

    Generates a redacted markdown view of the repo. Function bodies hidden, signatures kept, imports preserved, proprietary UPPER_SNAKE tables stripped. Safe to paste into ChatGPT or Claude. --copy to clipboard. --strict aliases enums and strips docstrings for higher-stakes prompts.

  • obfuscate --verify (the differentiator)

    Renames module-private identifiers, optionally strips docstrings per policy, then runs the user's actual test command against the output. Tests fail, the operation fails. That single flag is the line between a redactor and a tool you would hand to a contractor. Ships a manifest with sha256s, the rename map, and the policy snapshot — full audit trail.

  • .cloakpolicy governance

    A YAML file at the repo root, checked into git, reviewed via PR. Authority follows merge access. There is no separate permissions system to invent. cloak policy init scaffolds a sensible starter policy after detecting the project's language and source directories.

  • Honest positioning, on purpose

    The README, the PyPI page, and the marketing site all state explicitly: a motivated reader given an obfuscated copy can still extract logic. CLOAK is governance plus friction tooling — not cryptographic protection. The honest framing is the line that separates CLOAK from the dishonest end of the obfuscation market.
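
The obfuscate --verify flow listed above (run the user's test command against the output, record the result in a manifest, fail if the tests fail) can be sketched roughly as follows. This is a minimal illustration, not CLOAK's actual code: the function names, the manifest key names other than verify_passed, and the toy pricing.py module are all assumptions, and the real manifest also carries source hashes, the policy snapshot, and command-line args.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path

MANIFEST_NAME = "cloak-manifest.json"


def sha256_tree(root: Path) -> dict[str, str]:
    """Map each file's relative path to its sha256 hex digest (manifest excluded)."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }


def verify_output(out_dir: Path, test_cmd: list[str], rename_map: dict[str, str]) -> int:
    """Run test_cmd inside out_dir, write a manifest, and return the exit code to use."""
    hashes = sha256_tree(out_dir)  # hash before the test run can create new files
    result = subprocess.run(test_cmd, cwd=out_dir)
    passed = result.returncode == 0
    manifest = {
        "output_sha256": hashes,    # key name is an assumption
        "rename_map": rename_map,   # key name is an assumption
        "test_command": test_cmd,   # key name is an assumption
        "verify_passed": passed,
    }
    (out_dir / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return 0 if passed else 1  # tests fail, the operation fails


# Hypothetical obfuscated output: one module whose private helper was renamed.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "pricing.py").write_text("def _a1(x):\n    return x * 2\n")
    ok_cmd = [sys.executable, "-B", "-c",
              "import pricing, sys; sys.exit(0 if pricing._a1(2) == 4 else 1)"]
    bad_cmd = [sys.executable, "-B", "-c",
               "import pricing, sys; sys.exit(0 if pricing._a1(2) == 5 else 1)"]
    EXIT_OK = verify_output(out, ok_cmd, {"_double": "_a1"})
    MANIFEST_OK = json.loads((out / MANIFEST_NAME).read_text())
    EXIT_BAD = verify_output(out, bad_cmd, {"_double": "_a1"})
```

In this sketch the hashes are taken before the test command runs, so anything the test run writes into the output directory does not end up in the recorded digests.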

Under the hood

Engineering

  • Two parsing backends, one CLI

    Python redaction uses stdlib ast (no external runtime dependency, ast.unparse for output). JS/TS uses tree-sitter with a byte-splice strategy: parse → collect (start_byte, end_byte, replacement) tuples → apply edits in reverse order so earlier offsets stay valid. Because only the redacted byte ranges are touched, all formatting and comments outside them are preserved; the JS/TS source is never reformatted.

  • Verified obfuscation as a product invariant

    obfuscate --verify runs the user's test command against the transformed output via subprocess.run, sets verify_passed in the manifest, and exits non-zero on failure. The output ships with cloak-manifest.json: source/output sha256s, the rename map, the policy snapshot, command-line args, and the verify result. Without --verify, 'obfuscation' is a guess. With it, the output is credible enough to hand to a contractor.

  • Phase 0 validation before code

    Before writing any of the parser infrastructure, a structured experiment ran two redacted versions of a fake pricing engine through a live LLM with architectural and adversarial prompts. The data surfaced a leak surface that wasn't obvious from intuition: docstrings carry intent, enum values carry market data. That result forced the two-tier (default + --strict) design. The product is small because the design choices were pre-validated.

  • OSS-pure distribution, by decision

    Apache 2.0, GitHub public from day one, PyPI Trusted Publisher (OIDC, no stored tokens), branch protection on main, CI matrix on Python 3.11/3.12/3.13, ruff + mypy --strict clean, 88 tests passing. The CLI is intentionally free forever: license keys only work when they unlock something server-side you control. Locking down a client-side CLI you can fork in twenty minutes is theater. Any future monetization lives in a hosted/team layer, not the OSS surface.
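
The byte-splice strategy described under "Two parsing backends, one CLI" can be illustrated with a short sketch. It assumes the (start_byte, end_byte, replacement) tuples have already been collected; here the offsets are found with a naive brace search as a stand-in for the tree-sitter parse step, and the function names and placeholder text are hypothetical rather than CLOAK's own.

```python
from typing import Iterable

Edit = tuple[int, int, bytes]  # (start_byte, end_byte, replacement)


def apply_edits(source: bytes, edits: Iterable[Edit]) -> bytes:
    """Splice replacements into source, working from the last edit to the first."""
    out = bytearray(source)
    prev_start = len(source)
    # Highest start offset first: a splice never shifts the bytes before it,
    # so the offsets of the edits still to be applied stay valid.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not (0 <= start <= end <= prev_start):
            raise ValueError(f"overlapping or out-of-range edit: ({start}, {end})")
        out[start:end] = replacement
        prev_start = start
    return bytes(out)


def body_span(src: bytes, start_at: int = 0) -> tuple[int, int]:
    """Stand-in for a parser: byte span of the next {...} block (no nesting)."""
    open_ = src.index(b"{", start_at)
    close = src.index(b"}", open_) + 1
    return open_, close


# Hypothetical JS input; a real run would take offsets from tree-sitter nodes.
SRC = (
    b"function price(x) { return x * 1.17; } // keep me\n"
    b"function tax(y) { return y * 0.2; }\n"
)
PLACEHOLDER = b"{ /* redacted */ }"

first = body_span(SRC)
second = body_span(SRC, first[1])
REDACTED = apply_edits(SRC, [(*first, PLACEHOLDER), (*second, PLACEHOLDER)])
print(REDACTED.decode())
```

Everything outside the two spliced ranges, including the trailing comment and the original spacing, passes through byte for byte, which is the property the text attributes to this approach.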