Data Engineer

Programmers.io

  • USA
  • Permanent
  • Full-time
Title: Data Engineer
Location: Fully remote in the USA
Duration: Long-term contract

Job Summary:
We are seeking a highly skilled Data Engineer to design, develop, and maintain robust data pipelines and infrastructure to support analytics, reporting, and data science initiatives. The ideal candidate will have experience with ETL processes, SQL, data modeling, big data technologies (e.g., Spark, Hadoop), cloud data platforms (AWS, GCP), and pipeline orchestration tools such as Apache Airflow.

Key Responsibilities:
  • Design, develop, and optimize ETL/ELT pipelines to ingest, transform, and load structured and unstructured data from diverse sources.
  • Build scalable, reliable data pipelines using technologies such as Apache Spark, Apache Hadoop, and Airflow.
  • Create and maintain data models (star/snowflake schemas, normalized/denormalized) to support business intelligence and data science applications.
  • Work with SQL and distributed query engines (e.g., Presto, Hive, BigQuery, Redshift) to process and query large datasets efficiently.
  • Collaborate with cross-functional teams (data analysts, data scientists, and engineers) to define and implement data requirements and best practices.
  • Ensure high levels of data integrity, accuracy, and availability across all data systems.
  • Manage and optimize data infrastructure on cloud platforms such as AWS or Google Cloud Platform (GCP).
  • Implement data governance, privacy, and security best practices.
  • Monitor and troubleshoot data workflows, addressing issues proactively to minimize data downtime.

Qualifications:
  • Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
  • 7+ years of experience as a Data Engineer or in a similar role.
  • Strong expertise in SQL and experience working with relational and NoSQL databases.
  • Hands-on experience with ETL frameworks and building data pipelines.
  • Proficiency in big data tools such as Apache Spark, Hadoop, Hive, or Presto.
  • Experience with cloud data platforms such as Amazon Web Services (AWS) or Google Cloud Platform (GCP), e.g., S3, Redshift, BigQuery, EMR, Dataflow.
  • Familiarity with workflow orchestration tools like Apache Airflow, Luigi, or similar.
  • Solid understanding of data modeling principles and practices.
  • Proficiency in Python, Scala, or Java for data pipeline development.

Preferred:
  • Experience with data cataloging, lineage, and metadata tools.
  • Exposure to DevOps tools and CI/CD practices for data pipelines.
  • Experience working in Agile environments and using tools like Jira and Git.