
BizOps Senior Engineer
- O'Fallon, MO
- $94,000-157,000 per year
- Permanent
- Full-time
About the Role
Team Specific Skills:
It is not expected that any single candidate would have expertise across all these areas, but a BizOps engineer will spend time on each of these aspects of the role over the course of their career:
- Operational Readiness Architect:
o Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
o Partner with the development and product team of a new application to establish the right monitoring and alerting strategy and create the framework to achieve zero downtime during deployment.
- Site Reliability Engineering:
o Perform root cause analysis of incidents and collaborate with development teams to resolve issues.
o Stay up to date with the latest technologies and trends in SRE and cloud computing.
o Participate in on-call rotations and be available to respond to critical incidents.
o Take complete end-to-end run ownership of the product.
o Practice sustainable incident response and blameless post-mortems while taking a holistic approach to problem solving and optimizing time to recover.
o Automate data-driven alerts to proactively escalate issues. Work with development teams to establish SLOs and improve reliability.
- DevOps/Automation:
o Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
o Perform operational and resilience design, and implement solutions for capacity planning and performance optimization.
o Increase automation and tooling to reduce toil and manual intervention.
- ITSM Practices:
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- Ability to read, write, and understand code in at least one programming language.
- Strong understanding of DevOps principles and practices, along with configuration management.
- Experience in operational and resilience design, and in building and operating large-scale distributed systems.
- Appetite for change and pushing the boundaries of what can be done with automation. Be curious about new technology, infrastructure, and practices to scale our architecture and prepare for future growth.
- Experience with algorithms, data structures, scripting, pipeline management, and software design.
- Systematic, analytical problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Interest in designing, analysing, and troubleshooting large-scale distributed systems.
- Strong leadership and mentoring skills.
- A passion for observability, automation and continuous improvement.
- Willingness and ability to learn, take on challenging opportunities, and work as a member of a matrix-based, diverse, and geographically distributed project team.
- Ability to balance doing things right with fixing things quickly. Flexible and pragmatic, while working towards improving the long-term health of the system.
- Comfortable collaborating with cross-functional teams to ensure that expected system behaviour is understood and monitoring exists to detect anomalies.
- Expert coding experience in one or more of the following: C++, Java, Spring Framework, Python, Go, Spark, Big Data, gRPC.
- Ability to read, write, and understand code in a programming language or framework such as Java, Spring Framework, Python, or Go.
- Experience working across development, operations, and product teams to prioritize needs and build relationships is a must.
- Familiarity with cloud platforms like AWS, Azure, or GCP (a plus).
- Experience with Message Queue (MQ) technologies like RabbitMQ, Event Broker, Kafka, or ActiveMQ.
- Background in cloud-native tooling and orchestration technologies (Kubernetes preferred).
- Experience with observability tools such as Splunk, Dynatrace, Prometheus, Datadog, and Grafana, and with Monitoring as Code.
- Experience in production support environments and ITIL processes.
- Experience with industry-standard CI/CD tools like Git/Bitbucket, Jenkins, Maven, Artifactory, Groovy, and Chef. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
- Understanding of:
o Network concepts (Layer 1 to Layer 3)
o Stack trace analysis (TCP dumps, heap dumps, CPU/memory analysis, thread dumps).
o Load balancers and application firewalls.
o Operating System navigation.
o Logging and monitoring methods, standards, and tools.
o High availability and business continuity planning
o Caching concepts
o Configuration management
Great to have (For L7 and above):
- Hands-on experience in Modernization through the adoption of Kubernetes and containerization technologies like Docker and Azure Container Registry.
- Ability to speak about Kubernetes from different perspectives: Comfortable handling Kubernetes discussions at the technical, business, or financial level.
- Strategizing, designing, and supporting highly efficient solutions on public cloud (Amazon Web Services, Azure, or GCP) covering security, resilience, performance, networking, availability, and blue-green deployments in the context of business applications.
- Azure DevOps (AZ-400) or Azure Cloud Developer (AZ-203) certification is preferred.
- Hands-on expertise in diverse DevSecOps concepts and tools, especially Azure DevOps, Pipelines, GitHub, and GitHub Actions.
- Knowledge of emerging technologies, various platforms, tools and products and their respective applications.
- Awareness of security implementations, certificate management lifecycle, mutual TLS, SSL handshake, SSH keys, and symmetric and asymmetric encryption.