
DevOps Engineer (Mid-to-Senior Level)
- New York City, NY
- Permanent
- Full-time
Responsibilities:
- Architect and Maintain Cloud Infrastructure: Build, maintain, and scale our AWS cloud infrastructure using infrastructure-as-code and modern CI/CD pipelines (e.g. Argo Workflows). Ensure reliable, automated deployments of our applications and machine learning services across development, staging, and production environments.
- Container Orchestration: Manage our Kubernetes clusters and containerized microservices, optimizing for high availability, security, and efficient resource usage. Continuously improve our cluster deployment, scaling strategies, and rollback processes to support a rapidly growing platform.
- CI/CD & Automation: Design and implement continuous integration and delivery pipelines that empower our development team to ship code and ML model updates quickly and safely. Automate routine operations and workflows, reducing manual work through scripts, AWS Lambda functions, and other automation tools.
- Monitoring & Reliability: Implement robust monitoring, logging, and alerting (using tools like Prometheus, CloudWatch, etc.) to proactively track system performance and reliability. Quickly troubleshoot and resolve infrastructure issues or bottlenecks across the stack to maintain high uptime and responsive services.
- Data & Pipeline Integration: Work closely with our data engineering team to support a seamless flow of data through the platform. Maintain and optimize our event streaming and pipeline architecture (Kafka) and its integration with downstream systems like our Snowflake data warehouse and Looker analytics, ensuring data is delivered accurately and on time.
- AI/ML Infrastructure: Collaborate with machine learning engineers to deploy and scale AI/ML models in production. Support the integration of OpenAI and other ML models into our applications, implementing the infrastructure (compute, storage, containers) needed for model training, inference, and performance monitoring in a live environment.
- Tool Integration & Support: Integrate and manage internal and third-party tools that extend our platform’s functionality – for example, maintaining our Hasura GraphQL engine that interfaces with databases, or automating workflows involving external services like Airtable. Ensure these tools are properly deployed, updated, and aligned with our security and compliance standards.
- DevOps Best Practices & Culture: Champion DevOps best practices across the engineering organization. This includes improving our release processes (e.g. implementing GitOps workflows), optimizing build/test pipelines, and mentoring developers on using infrastructure tools. You will continually evaluate new technologies and processes to enhance deployment speed, reliability, and scalability, while balancing rapid iteration with operational stability.
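Much of the automation work described above comes down to small, dependable glue scripts. As a purely illustrative sketch (the function name and parameters are hypothetical, not part of this role's actual tooling), a retry-with-backoff helper of the kind commonly used in deployment and ops scripts might look like:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying on exception with exponential backoff.

    Waits base_delay * 2**i between attempts; the final failure is
    re-raised so callers (e.g. a CI step) still see the error.
    """
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))
    raise AssertionError("unreachable for attempts >= 1")

# Usage: wrap a flaky call, such as polling a service's health endpoint
# after a deploy. Here a fake call succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("not ready yet")
    return "healthy"

result = retry_with_backoff(flaky, attempts=5, base_delay=0, sleep=lambda s: None)
```

Injecting the `sleep` function keeps the helper testable without real delays; production callers simply use the default.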
Qualifications:
- Experience: 5+ years of experience in DevOps, SRE, or related infrastructure engineering roles, with a track record of managing complex, distributed systems at scale.
- Cloud Proficiency: Strong expertise in AWS and cloud architecture (compute, storage, networking, and security). You have designed and maintained scalable infrastructure using services like EC2/ECS/EKS, S3, RDS, VPC, and Lambda, and you understand how to build secure and cost-efficient cloud environments.
- Containers & Orchestration: Hands-on experience with containerization and orchestration – you have managed production Kubernetes clusters (or similar orchestration platforms), and you’re comfortable with Docker and container lifecycle management.
- CI/CD & Automation: Proven ability to create and manage CI/CD pipelines using tools such as Jenkins, CircleCI, GitHub Actions, or Argo. You automate workflows wherever possible and have experience implementing GitOps or similar practices to streamline deployments.
- Infrastructure as Code: Proficiency in scripting and infrastructure-as-code (Terraform, CloudFormation, or equivalent). You can manage infrastructure configuration in a reproducible way and have experience automating cloud resource provisioning.
- Monitoring & Troubleshooting: Solid knowledge of monitoring and logging frameworks (e.g. Prometheus, Grafana, ELK stack, CloudWatch) and experience setting up alerts and dashboards. You excel at diagnosing issues across the full stack – from network and infrastructure to application logs – and ensuring high reliability.
- Data Pipeline Familiarity: Familiarity with event-driven architecture and data pipelines. You have worked with messaging or streaming systems (e.g. Kafka, Kinesis) and understand how to connect various data stores and services (relational and NoSQL databases, data warehouses like Snowflake) in a production environment.
- Security Mindset: Good understanding of security best practices in cloud and DevOps (managing secrets, IAM roles, VPC security, etc.). You are vigilant about maintaining compliance and protecting sensitive data across all systems.
- Collaboration & Communication: Excellent communication skills and a collaborative attitude. You can work effectively on a remote, cross-functional team, partnering with software engineers, data scientists, product managers, and QA to achieve common goals.
- Adaptability: Self-driven and adaptable to change. You thrive in fast-paced, ambiguous environments and take ownership of outcomes. You prefer simple, elegant solutions and have a knack for prioritizing what will scale and add value, in line with our mission to deliver results and delight our users.
- Startup / 0→1 Experience: Experience working in a startup or building systems from scratch. You’re comfortable with the scrappiness and ingenuity required to design new infrastructure and processes in a rapidly evolving environment.
Nice to Have:
- MLOps & AI Services: Exposure to MLOps or AI-driven platforms. Experience deploying or managing machine learning models in production, or familiarity with ML frameworks and services (e.g. handling model serving, working with OpenAI or similar AI APIs) is a strong plus.
- Data & Analytics Tools: Experience with data warehousing and analytics tools – for example, deploying or maintaining Snowflake, or integrating BI platforms like Looker into a data pipeline. Understanding of how to optimize data flows and query performance in such systems is a plus.
- GraphQL / Hasura: Familiarity with GraphQL APIs and frameworks (especially Hasura). You understand how GraphQL layers interface with backend databases and can optimize or troubleshoot in such an environment.
- Orchestration & Serverless: Experience with workflow orchestration tools like Argo Workflows (or similar, e.g. Airflow, Tekton) for running complex jobs/pipelines. Experience managing serverless functions (AWS Lambda) as part of a larger system is also beneficial.
- Domain Interest: A passion for our mission of sustainability and transforming the fashion industry. Interest or experience in e-commerce, manufacturing processes, or fashion technology is a plus – you enjoy applying technology to solve real-world problems in new domains.
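Several of the qualifications above (infrastructure as code, GitOps) center on the same idea: reconciling declared state with live state. A toy sketch of that reconciliation step, assuming hypothetical dict-shaped resource specs rather than any real Terraform or Kubernetes API:

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Compute a minimal change set: resources to create, update, or delete,
    by diffing the declared configuration against the observed state."""
    create = {k: v for k, v in desired.items() if k not in actual}
    update = {k: v for k, v in desired.items() if k in actual and actual[k] != v}
    delete = [k for k in actual if k not in desired]
    return {"create": create, "update": update, "delete": delete}

# Example: "web" drifts from its declared replica count, "worker" is new,
# and "cache" is no longer declared.
desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
actual = {"web": {"replicas": 2}, "cache": {"replicas": 1}}
plan = plan_changes(desired, actual)
# plan["create"] == {"worker": {"replicas": 2}}
# plan["update"] == {"web": {"replicas": 3}}
# plan["delete"] == ["cache"]
```

Real tools like Terraform's plan phase or a GitOps controller's reconcile loop do exactly this diff, just against provider APIs instead of dicts.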
What We Offer:
- Compensation & Benefits: We offer full benefits (medical, dental, and vision) and a competitive salary, along with equity participation. You’ll be joining a passionate team with a shared mission and ample opportunities for growth.
- Remote Work: This is a fully remote position. We embrace a remote-first culture that allows you to work from anywhere, while staying closely connected with a diverse, global team. (Occasional travel to our NYC or Dominican Republic hubs for team gatherings is optional.)
- Mission-Driven Culture: Work on something meaningful – every feature you help ship and every system you optimize contributes to eliminating waste in the fashion industry and driving sustainable innovation. We foster a creative, inclusive environment where new ideas are encouraged.
- Equal Opportunity Employer: Resonance Companies is an equal opportunity employer and values diversity in our company. We do not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other status protected by applicable law. All employment decisions are based on qualifications, merit, and business need.