
Staff Software Engineer, Reliability
- Menlo Park, CA
- Permanent
- Full-time
- Applications including brokerage, crypto and money
- Service Level Agreements (SLAs) and Service Level Objectives (SLOs)
- Incident metrics (Mean Time To Detect and Mean Time To Resolve)
- Production Readiness Review (PRR)
- Monitoring
- Canary
- Shift left on testing, including pre-production, integration, and load testing
- Costs and efficiency
- Built and owned the pre-production and staging environments for internal software engineers.
- Experience running Kubernetes on Amazon Elastic Kubernetes Service (EKS) or another cloud provider's managed Kubernetes service
- Experience working with observability systems with the goal of reducing incident metrics such as Mean Time To Detect (MTTD) and Mean Time To Resolve (MTTR)
- Experience working with large infrastructure components such as compute, storage, networking, and/or developer infrastructure
- Design, build, and maintain large-scale systems that power Robinhood's platform, infrastructure, and core services
- Write and review high-quality code, create capacity and scaling plans, and debug complex, real-time issues in mission-critical systems used by millions of customers.
- Lead by example, mentoring teammates, promoting best practices, and fostering a culture focused on operational excellence and system resilience.
- Take ownership of system reliability by participating in on-call rotations, proactively addressing potential issues, and driving long-term improvements to reduce downtime.
- Collaborate with industry-leading engineers to develop scalable tools and infrastructure that meet Robinhood's growing demands.
- Drive innovation by optimizing infrastructure for reliability and cost-efficiency, supporting Robinhood's mission to democratize finance for all at a global scale.
- 8+ years of experience designing, building, and maintaining large-scale distributed systems
- Proficiency in programming languages such as Python, Go, or C++
- Expertise in operating systems (Linux/Unix), networking, and troubleshooting sophisticated production issues in high-availability environments.
- A track record of mentoring team members, fostering collaboration, and contributing to a culture of continuous improvement.
- Challenging, high-impact work to grow your career
- Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
- Best-in-class benefits to fuel your work, including 100% paid health insurance for employees with 90% coverage for dependents
- Lifestyle wallet - a highly flexible benefits spending account for wellness, learning, and more
- Employer-paid life & disability insurance, fertility benefits, and mental health benefits
- Time off to recharge including company holidays, paid time off, sick time, parental leave, and more!
- Exceptional office experience with catered meals, events, and comfortable workspaces.