AWS Data Engineer (PySpark/SageMaker)
Innovim Technology Solutions
- New York City, NY
- Permanent
- Full-time

Responsibilities:
- Design, develop, and deploy PySpark ETL pipelines to migrate and transform actuarial data (a minimal sketch follows this list).
- Build data ingestion pipelines from multiple source systems to Redshift using AWS Glue, DMS, and Step Functions.
- Optimize performance using Redshift Spectrum for external table queries.
- Automate data workflows using AWS Lambda, Step Functions, and the Stonebranch job scheduler.
- Implement and maintain CI/CD pipelines for deploying data applications and monitoring pipeline health.
- Collaborate closely with actuarial teams, data analysts, and other engineers in an Agile environment.
- Participate in sprint planning, story refinement, and backlog grooming using JIRA.
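
As a flavor of the day-to-day work, here is a minimal, hypothetical PySpark ETL sketch in the spirit of the responsibilities above: read raw actuarial extracts from S3, apply light transformations, and load the result into Redshift. Every bucket name, table, and connection detail is an illustrative assumption, not a project specific.

```python
# Minimal, hypothetical PySpark ETL sketch: S3 -> transform -> Redshift.
# All bucket names, schemas, and credentials below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("actuarial-etl-sketch").getOrCreate()

# Extract: read raw actuarial extracts from S3 (path is an assumption).
raw = spark.read.parquet("s3://example-raw-bucket/actuarial/policies/")

# Transform: light cleanup and a derived load-date column, purely illustrative.
clean = (
    raw.dropDuplicates(["policy_id"])
       .withColumn("load_date", F.current_date())
       .filter(F.col("premium_amount") > 0)
)

# Load: write to Redshift over JDBC. In a real Glue job, credentials would
# come from Secrets Manager or a Glue connection, not inline options, and
# writes would typically stage through S3 with COPY for volume.
(clean.write
      .format("jdbc")
      .option("url", "jdbc:redshift://example-cluster:5439/dev")
      .option("dbtable", "staging.policies")
      .option("user", "etl_user")
      .option("password", "example-only")
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .mode("append")
      .save())

spark.stop()
```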

Qualifications:
- 5+ years of experience building ETL/ELT pipelines using PySpark.
- Proven experience with AWS Redshift and Redshift Spectrum (see the Spectrum sketch at the end of this posting).
- Strong SQL skills for data extraction, transformation, validation, and performance tuning.
- Hands-on experience with:
  - AWS Glue for data cataloging and ETL orchestration.
  - AWS Database Migration Service (DMS) for legacy-to-cloud migration.
  - AWS Step Functions and Lambda for orchestrating data workflows (see the workflow sketch at the end of this posting).
- Proficiency with CI/CD pipelines, especially tools such as Jenkins and GitLab CI.
- Experience with Stonebranch for job scheduling and monitoring.
- Understanding of data governance, quality frameworks, and security best practices.
- Experience with AWS SageMaker or similar ML platforms for model deployment or integration.
- Experience with AWS Glue DataBrew or similar data wrangling tools.
- Previous work experience in actuarial, insurance, or financial services domains.
- Working knowledge of Agile/Scrum methodology and familiarity with JIRA or similar tools.
- Excellent communication skills with the ability to explain complex technical concepts to non-technical stakeholders.
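
For the Redshift Spectrum qualification above, the usual pattern is to register an external schema over the Glue Data Catalog so queries can join S3-resident data with local tables. Below is a minimal sketch using the boto3 Redshift Data API; the cluster identifier, database, IAM role, and table names are all assumptions for illustration.

```python
# Hypothetical Redshift Spectrum setup via the boto3 Redshift Data API.
# Cluster identifier, database, IAM role ARN, and table names are placeholders.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# External schema backed by the Glue Data Catalog: Spectrum reads the
# Parquet files in S3 directly, so cold data never has to be loaded.
create_schema_sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
FROM DATA CATALOG
DATABASE 'actuarial_catalog'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role';
"""

# Join S3-resident claims history against a local dimension table.
query_sql = """
SELECT d.line_of_business, SUM(h.claim_amount) AS total_claims
FROM spectrum.claims_history h
JOIN public.dim_policy d ON d.policy_id = h.policy_id
GROUP BY d.line_of_business;
"""

for sql in (create_schema_sql, query_sql):
    resp = client.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="etl_user",
        Sql=sql,
    )
    print("submitted statement:", resp["Id"])
```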
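
For the workflow-automation bullets (Lambda plus Step Functions), a common shape is a small Lambda handler that starts a state machine when new data lands in S3. The sketch below is hypothetical; the state machine ARN and event wiring are assumptions.

```python
# Hypothetical Lambda handler that starts a Step Functions execution
# when a new object lands in S3. The state machine ARN is a placeholder.
import json
import boto3

sfn = boto3.client("stepfunctions")

STATE_MACHINE_ARN = (
    "arn:aws:states:us-east-1:123456789012:stateMachine:example-etl-flow"
)

def handler(event, context):
    # Pull the bucket and object key out of the S3 put-event notification.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Start the ETL state machine with the new object as its input.
    resp = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({"bucket": bucket, "key": key}),
    )
    return {"executionArn": resp["executionArn"]}
```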