
Research Engineer / Scientist, Robustness
- San Francisco, CA
- Permanent
- Full-time
- Testing the robustness of our safety techniques by training language models to subvert our safety techniques, and seeing how effective they are at subverting our interventions.
- Run multi-agent reinforcement learning experiments to test out techniques like AI Debate.
- Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
- Write scripts and prompts to efficiently produce evaluation questions to test models' reasoning abilities in safety-relevant contexts.
- Contribute ideas, figures, and writing to research papers, blog posts, and talks.
- Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy.
- Have significant software, ML, or research engineering experience
- Have some experience contributing to empirical AI research projects
- Have some familiarity with technical AI safety research
- Prefer fast-moving collaborative projects to extensive solo efforts
- Pick up slack, even if it goes outside your job description
- Care about the impacts of AI
- Have experience authoring research papers in machine learning, NLP, or AI safety
- Have experience with LLMs
- Have experience with reinforcement learning
- Have experience with Kubernetes clusters and complex shared codebases