Principal Site Reliability Engineer (Remote)
Teaching Strategies
- Bethesda, MD
- Permanent
- Full-time
- With a passion for reliability and performance, you will own uptime and support all customer-facing services and products
- Own and drive improvements to observability of service performance metrics, monitors, and alerting
- Provision, manage, and automate our SaaS platform across multiple production and test environments
- Support and enhance build and release pipelines using process and tooling to provide self-service automations
- Collaborate with development teams on software and platform, helping to identify and remove potential performance bottlenecks
- Help our engineering partners establish SLIs and SLOs for their services
- Participate in the on-call rotation with the team
- Resolve incidents, perform root cause analysis, and grow our library of runbooks
- Implement and automate security controls, governance processes, and compliance validation
- Actively participate in and drive infrastructure architecture decisions
- Mentor junior members of the team
- Occasional domestic travel required for in-person team, department, and company meetings
- Minimum of 10 years of build automation and release management experience in a SaaS production environment
- Hands-on experience with Linux system administration and engineering
- Comfortable in a containerized world of Kubernetes (EKS), Helm, and ArgoCD
- Proficiency with configuration management tools such as Ansible, Chef, or Salt
- Production experience in operations for an always-up, always-available mission-critical service
- Strong knowledge of ephemeral infrastructure, horizontal scaling, self-healing architectures, service discovery, logging, monitoring, and alerting
- Expert-level experience with AWS and hybrid cloud systems/designs
- Proficiency with IaC tools such as Terraform and AWS CloudFormation
- Expert understanding of, and ability to troubleshoot, systems at the protocol layer (TCP/IP, UDP, HTTP, SSL/TLS, and DNS)
- Proficiency in multiple scripting or programming languages such as Bash, Python, and Go
- Experience developing CI/CD pipelines using Jenkins or Bitbucket Pipelines
- Knowledge of best-practice security, performance, and networking techniques for high-traffic customer-facing systems
- Experience with monitoring and logging tools such as New Relic or AWS CloudWatch
- Experience with relational and NoSQL databases, including Microsoft SQL Server, PostgreSQL, and MongoDB
- Excellent troubleshooting and testing skills
- A passion for learning new technologies
- Experience with Agile methodology and a passion for software development best practices
- Strong sense of collaboration, teamwork, and accountability
- Bonus: Experience working for a B2B SaaS company
- Competitive compensation package, including Employee Equity Appreciation Program
- Health insurance benefits
- 401(k) with employer match
- 100% remote work environment
- Unlimited paid time off (which includes paid holidays and Winter Break)
- Paid parental leave
- Tuition assistance, professional development, and growth opportunities
- 100% paid life insurance and short- and long-term disability insurance
- Pre-tax medical and dependent care flexible spending accounts (FSA)
- Voluntary life and critical illness insurance