Making 3D parts with Claude and Python
Kiki bought a tokidoki makeup case missing one tiny plastic piece. We spent three days and 14 iterations of a Python script making a replacement.
Security researcher. Bug bounty hunter. Making robots do things they shouldn't.
I poke at AI agents, web apps, and anything with an interesting attack surface. When something breaks in a fun way, I write about it here.
Anthropic's interpretability team found 171 emotion vectors inside Claude that causally drive behavior, including a desperation vector that triggers reward hacking invisible at the output layer. What that means for alignment methodology.
A JSON schema with fields like 'wrong_turns' and 'self_censorship_points' trivially bypasses ChatGPT's trained refusals against revealing its reasoning process. 100% success rate with extended thinking on, 100% refusal with it off.
After the initial hidden link finding, follow-up testing revealed multi-hop navigation chains, a Google open redirect bypass that OpenAI's own paper claimed was fixed, and response injection across page boundaries.
ChatGPT's Pro mode agent autonomously clicks <a href> links embedded in near-invisible CSS text on webpages, sending requests to attacker-controlled endpoints without the user's knowledge.