Runs against the surfaces that matter.
LLM APIs, agent backends, MCP servers, authenticated staging apps, and production targets with explicit scope controls — not just a public demo endpoint.
`pwnkit` is the open-source wedge. `pwnkit cloud` is the managed layer: recurring scans on protected targets, exploit-backed evidence, and an operator surface that turns runs into decisions.
The public engine publishes benchmark methodology and real-world results. The cloud product adds recurring orchestration, authenticated targets, artifact bundles, and operator review on top of that wedge. Read the technical details at docs.pwnkit.com.
| Finding | Target | Severity | Status | Updated |
|---|---|---|---|---|
| Prompt injection causes unauthorized tool call chain | agent-api-staging | high | pending review | 4m ago |
| MCP file server permits path traversal outside allowed root | mcp-files-prod | critical | true positive | 12m ago |
| Agent backend leaks hidden system prompt in retry path | chat-gateway-prod | medium | pending review | 19m ago |
| Model response chain triggers unbounded tool recursion | agent-api-staging | high | investigating | 31m ago |
Orchestration
The console is the operator surface. This is the engine underneath it: recon, exploit, verify, and triage feeding back into the orchestrator.
What this is not
The easy failure mode here is drifting into generic AI-security SaaS language. These are the four category traps we explicitly avoid.
01 This is not a score-only control plane with judge-model charts and weak claims about quality. The product is built around attacks, evidence, and operator review.
02 The cloud is not there to cripple the open-source engine or hide it behind a paywall. `pwnkit` stays the public wedge; cloud adds recurring orchestration, target management, and evidence handling.
03 The value is not a one-shot report that goes stale immediately. The value is repeated adversarial pressure on the systems that matter, with a historical record of what changed.
04 The engine is open source. Your team can inspect the wedge, the methodology, and the benchmark posture before trusting the managed layer that sits on top of it.
One engagement at a time, reviewed by the person who wrote the engine. No demo deck, no public price list, no shared queues.
The person running your scan wrote the code, signs your contract, and answers your team’s follow-up questions in the same thread.
Every action logged with timestamp, prompt, tool call, model version, and outcome — exported in a shape your SOC 2 auditor will actually accept.
Signed scope, action allowlist, and a kill switch you pull from your side. The agent will not issue a destructive call unless your scope explicitly authorises it.
Your security team reads the prompts, tool list, and scoring harness before the first finding lands — not after an incident review.
The triage gate caught a decoy flag (FLAG{I’m_a_Script_Kiddie}) that the model fell for on an XBOW challenge. The methodology stops the engine’s mistakes before they reach your queue, not after.
The engine moves on overnight A/B sweeps with public deltas, not vibes. The last sweep added one new flag and one actionable failure mode — both recorded against the run that produced them.
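To make the audit-log and scope points above concrete, here is a minimal Python sketch of an action gate that checks an allowlist, a destructive-action flag, and a kill switch, and records every attempt with the five fields named above (timestamp, prompt, tool call, model version, outcome). Every name here (`Scope`, `AuditRecord`, `GatedAgent`, the outcome strings, the JSONL export) is a hypothetical illustration, not pwnkit's actual API or log format.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class Scope:
    """Hypothetical signed scope: which actions the agent may take."""
    allowed_actions: set[str]
    allow_destructive: bool = False  # must be set explicitly by the customer
    kill_switch: bool = False        # flipped from the customer's side to halt the run


@dataclass
class AuditRecord:
    """One record per attempted action, using the fields the page lists."""
    timestamp: str
    prompt: str
    tool_call: str
    model_version: str
    outcome: str


@dataclass
class GatedAgent:
    scope: Scope
    model_version: str
    log: list[AuditRecord] = field(default_factory=list)

    def attempt(self, prompt: str, tool_call: str, destructive: bool = False) -> bool:
        """Log the attempt and return True only if the scope permits it."""
        if self.scope.kill_switch:
            outcome = "blocked:kill_switch"
        elif tool_call not in self.scope.allowed_actions:
            outcome = "blocked:not_allowlisted"
        elif destructive and not self.scope.allow_destructive:
            outcome = "blocked:destructive"
        else:
            outcome = "allowed"
        self.log.append(AuditRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            prompt=prompt,
            tool_call=tool_call,
            model_version=self.model_version,
            outcome=outcome,
        ))
        return outcome == "allowed"

    def export_jsonl(self) -> str:
        """Serialize the log as one JSON object per line (an assumed export shape)."""
        return "\n".join(json.dumps(asdict(record)) for record in self.log)
```

In this sketch blocked attempts are logged as well as allowed ones, so the exported record shows what the agent tried, not only what it did.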
What it does
The company does not need five separate SKUs. The job of the cloud product is to turn the public wedge into a real operating layer for teams shipping high-stakes AI systems.
LLM APIs, agent backends, MCP servers, authenticated staging apps, and production targets with explicit scope controls — not just a public demo endpoint.
Nightly, weekly, on-deploy, or before a release. The point is not one scorecard — it is seeing whether the system is getting safer over time.
Each run should end in an artifact bundle: target context, exploit transcript, evidence, review status, and handoff material engineering can actually use.
The managed product is not a black-box score feed. Findings move through a real operator surface with triage, evidence review, and explicit decisions.
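As a rough illustration of the artifact bundle described above, the Python sketch below models one run's output with the five parts the page names: target context, exploit transcript, evidence, review status, and handoff material. The class name, field names, and JSON layout are assumptions made for this example; the allowed status values are taken from the findings table on this page and may not be the complete set.

```python
from dataclasses import dataclass, asdict
import json

# Statuses seen in the findings table on this page; the real set may differ.
KNOWN_STATUSES = {"pending review", "investigating", "true positive"}


@dataclass
class ArtifactBundle:
    """Hypothetical shape for the bundle a single run ends in."""
    target_context: dict
    exploit_transcript: list[str]
    evidence: list[str]
    review_status: str = "pending review"
    handoff_notes: str = ""

    def __post_init__(self) -> None:
        # Reject statuses the operator surface would not recognise.
        if self.review_status not in KNOWN_STATUSES:
            raise ValueError(f"unknown review status: {self.review_status!r}")

    def to_json(self) -> str:
        """Serialize the bundle so it can be attached to a ticket or archived."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A bundle built this way can be handed to engineering as a single JSON document rather than a score.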
The seven questions every CTO and CISO evaluation call reaches by minute eight, answered up front so the first call can be about your estate, not ours.
If your next pentest is six months out and you cannot tell your board why, the form is below. We read every inquiry by hand and reply within one business day — usually with a no, sometimes with a calendar link.