
Site Reliability Engineer, Distribution Engineering
- Stamford, CT
- $110,000-$145,000 per year
- Permanent
- Full-time
- Develop automation to deploy, maintain, and monitor infrastructure and applications.
- Troubleshoot and resolve issues in live, on-air environments.
- Build and maintain CI/CD pipelines, including code deployment, testing, and monitoring.
- Create and maintain system metrics, dashboards, and alerting to ensure high availability.
- Collaborate with engineering, operations, and vendor teams to support system health and performance.
- Act as a Level 2 support resource for broadcast-related incidents, including root cause analysis and documentation.
- Participate in an on-call rotation for 24/7 support coverage.
- Evaluate new technologies and contribute to proof-of-concept deployments.
- Document system configurations, incident resolutions, and operational procedures.
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
- 3+ years of SRE experience in the technology sector, supporting and maintaining production-quality software or software-defined infrastructure in high-traffic cloud environments (AWS preferred).
- Experience with IP video and broadcast technologies.
- Proficiency in Linux system administration.
- Experience with Infrastructure as Code (Terraform or CloudFormation) and configuration management technologies (Ansible).
- Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, ArgoCD).
- Experience with containerization and orchestration (Docker, Kubernetes, EKS).
- Scripting experience (Python, Bash, or similar).
- Strong understanding of networking fundamentals and troubleshooting.
- Experience with monitoring/logging tools (e.g., Grafana, Splunk, ELK, CloudWatch).
- Comfortable working in agile, fast-paced environments.
- Experience maintaining both Linux and Windows environments.
- Familiarity with broadcast and monitoring tools such as DataMiner, TAG systems, and/or MediaProxy.
- Strong hands-on experience debugging and troubleshooting distributed microservices in Kubernetes, including analyzing pod logs.
- Solid understanding of networking concepts relevant to video streaming, including multicast, unicast, RTP/RTMP, and CDN workflows.
- Ability to take ownership of problems and drive solutions through automation where applicable (automation-first mentality).
- Experience with live TV broadcasting, OTT streaming, and video/audio codecs.
- Familiarity with ARQ technologies and cloud-based video distribution.
- Experience supporting 24/7 production environments and customer-facing systems.
- Use of AI/ML for data analysis or operational insights.
- Deep experience with monitoring and alerting tools (Grafana, Splunk, ELK Stack).
- Ability to build end-to-end dashboards and alerts for enterprise systems.
- Experience with frontend technologies (React, Node.js, TypeScript) and UI design.
- Familiarity with SMPTE standards and PTP implementation.
- Experience deploying and supporting playout systems in cloud and hybrid environments.
- Monitoring tools: Grafana (Loki), Splunk.
- Familiarity with broadcast automation and IP video distribution workflows.
- Experience evaluating software releases for reliability and integration.
- Strong design and problem-solving skills in broadcast infrastructure.