
Senior Network Operations Engineer - Layer 4-7 - Federal
- Santa Clara, CA
- Permanent
- Full-time
- Operate and maintain ServiceNow’s global cloud network infrastructure, including backbone routing, top-of-rack (TOR) switching, VPN services, and application delivery controller (ADC) systems.
- Troubleshoot and resolve network issues, including urgent operational events.
- Participate in 24/7 on-call rotation, including weekends, as part of the Network Operations Engineering team.
- Maintain software-defined, declarative infrastructure at scale using automation tools such as Ansible and GitLab.
- Perform software upgrades and security patching across production systems, with changes tracked under version control.
- Proactively analyze network metrics such as capacity, latency, and availability to detect and prevent outages.
- Support network operations in private and hybrid multi-cloud environments (e.g., Azure, AWS, GCP).
- Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability.
- Review, consult on, and prepare for planned changes and releases to the production environment.
- Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures.
- Provide feedback to infrastructure architects and contribute to design discussions for new initiatives.
- Collaborate with peer teams building world-class networking and orchestration solutions.
- Evaluate, adopt, and implement new open-source and commercial tools and technologies.
- Contribute to the processes and automation behind a low-touch continuous deployment pipeline with near-zero downtime and high success rates.
- Drive automation to enable rapid deployment and updates across large-scale environments.
- Experience leveraging AI, or thinking critically about how to integrate it, in work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
- 4+ years of experience in network operations, infrastructure engineering, or a similar role supporting large-scale distributed systems.
- Strong hands-on experience with load balancers (e.g., F5, NGINX), routing/switching (e.g., Juniper, Cisco), and security devices (e.g., Palo Alto, Radware) in production environments.
- Solid understanding of network protocols and services, including TCP/IP, BGP, DNS, TLS/mTLS, and VPNs.
- Experience managing hybrid and public cloud environments (AWS, GCP, Azure) in an operational capacity.
- Proficiency in Linux systems administration and troubleshooting.
- Familiarity with container technologies (e.g., Docker, Kubernetes) and service mesh architectures.
- Experience with monitoring, observability, and alerting tools (e.g., Prometheus, Grafana, Splunk).
- Ability to respond to and resolve incidents, including performing root cause analysis and writing post-mortems.
- Proficiency in infrastructure-as-code and automation tools such as Ansible, Terraform, and GitLab CI/CD.
- Scripting skills in Python, Bash, or similar languages for automation and tooling.
- Experience with change management processes in high-availability production environments.
- Excellent problem-solving skills and attention to detail, with a bias toward action and automation.
- Effective communication and collaboration skills, including cross-functional team engagement.
- Willingness to participate in a 24/7 on-call rotation, including weekends.