Overview:
We are seeking a highly skilled Datadog Subject Matter Expert (SME) to lead the design, implementation, optimization, and ongoing management of monitoring, observability, and alerting solutions across our technology landscape. The SME will act as a trusted advisor and technical authority, enabling teams to maximize the value of Datadog for performance monitoring, application observability, infrastructure health, security insights, and operational excellence.
Key Responsibilities:
Design & Architecture
Develop scalable observability strategies using Datadog for applications, infrastructure, cloud services, and security monitoring.
Architect dashboards, monitors, and alerting frameworks tailored to business and operational requirements.
Implementation & Integration
Lead deployment and configuration of Datadog agents, integrations, and APIs across hybrid/multi-cloud environments (AWS, Azure, GCP, on-prem).
Integrate Datadog with CI/CD pipelines, logging systems, and collaboration tools (e.g., Slack, ServiceNow, Jira).
Optimization & Governance
Establish best practices for metric collection, log ingestion, tracing, and anomaly detection.
Manage Datadog costs and improve the usage efficiency of licenses and features.
Ensure alerting policies reduce noise while providing actionable insights.
Collaboration & Enablement
Partner with DevOps, SRE, Cloud, Security, and Application teams to embed observability into daily operations.
Deliver training and workshops, and produce documentation, to upskill engineering teams in Datadog usage.
Troubleshooting & Support
Serve as the escalation point for Datadog-related performance or monitoring issues.
Perform root-cause analysis using Datadog dashboards, traces, and logs to identify and resolve system issues quickly.
Qualifications:
Proven experience as a Datadog SME, Consultant, or Senior Engineer with hands-on deployment and scaling expertise.
Strong background in observability, monitoring, and APM practices across distributed systems and microservices.