
Senior Site Reliability Engineer
- Draper, UT
- $98,700-155,100 per year
- Permanent
- Full-time
We are the leader in human-centric cybersecurity. Half a million customers, including 87 of the Fortune 100, rely on Proofpoint to protect their organizations. We’re driven by a mission to stay ahead of bad actors and safeguard the digital world. Join us in our pursuit to defend data and protect people.
How We Work:
At Proofpoint, you’ll be part of a global team that breaks barriers to redefine cybersecurity, guided by our BRAVE core values: Bold in how we dream and innovate; Responsive to feedback, challenges, and opportunities; Accountable for results and best-in-class outcomes; Visionary in future-focused problem-solving; Exceptional in execution and impact.
Corporate Overview:
Proofpoint is a leading cybersecurity company protecting organizations’ greatest assets and biggest risks: vulnerabilities in people. With an integrated suite of cloud-based solutions, Proofpoint helps companies around the world stop targeted threats, safeguard their data, and make their users more resilient against cyber-attacks. Leading organizations of all sizes, including more than half of the Fortune 1000, rely on Proofpoint for people-centric security and compliance solutions mitigating their most critical risks across email, the cloud, social media, and the web. We are singularly devoted to helping our customers protect their greatest assets and biggest security risk: their people. That’s why we’re a leader in next-generation cybersecurity. Protection Starts with People.
Job Overview:
As a Site Reliability Engineer (SRE), you will play a key role in ensuring our production systems’ stability, reliability, and performance. You will work closely with software development and operations teams to design, build, and maintain the infrastructure required to support our services. You will focus on observability, release deployment, automating processes, improving system reliability, and ensuring our services are scalable and highly available.
Key Responsibilities:
- System Reliability and Performance:
  - Monitor, measure, and improve the reliability and performance of production systems.
  - Implement and manage monitoring tools to ensure system health and detect issues before they impact users.
- Incident Management and Troubleshooting:
  - Respond to production incidents, troubleshoot issues, and conduct post-incident analysis to prevent future occurrences.
  - Work with development teams to ensure new features and systems are reliable from day one.
- Automation and Infrastructure as Code (IaC):
  - Develop and maintain automation scripts for system deployment, scaling, and monitoring.
  - Implement Infrastructure as Code practices to manage and provision cloud resources.
- Capacity Planning and Scalability:
  - Analyze system usage patterns and plan for future growth to ensure that systems can scale to meet demand.
  - Optimize resource usage to reduce costs while maintaining high availability.
- Collaboration and Communication:
  - Collaborate with development, operations, and security teams to improve software deployment practices.
  - Document system architecture, processes, and policies to ensure clear team communication.
- Continuous Improvement:
  - Advocate for and implement best practices in software development and systems engineering.
  - Participate in on-call rotations to provide 24/7 support for production systems.
- Competitive compensation
- Comprehensive benefits
- Learning & Development: We are committed to the growth and development of our team members, offering a range of programs including leadership and professional development workshops, stretch project assignments, and mentoring opportunities to help employees reach their full potential.
- Flexible work environment: [Remote options, hybrid schedules, flexible hours, etc.].
- Annual wellness and community outreach days
- Always-on recognition for your contributions
- Global collaboration and networking opportunities