
Principal Data Engineer
- Irvine, CA
- $126,500-$241,700 per year
- Permanent
- Full-time

Responsibilities:
- Architect and implement scalable data pipelines using Apache Spark, Databricks, and Azure Data Factory.
- Lead the development of real-time streaming solutions using Apache Kafka.
- Design and optimize ETL/ELT workflows for structured and unstructured data.
- Build and maintain distributed data systems using Cassandra, Delta Lake, and other modern data stores.
- Utilize Delta Live Tables (DLT) to create reliable, maintainable, and testable batch and streaming pipelines.
- Integrate Databricks with Azure Machine Learning, Azure Synapse, and other cloud services.
- Implement CI/CD pipelines using Azure DevOps and Terraform.
- Collaborate with data scientists to deploy and manage ML models using MLflow.
- Ensure data quality, governance, and security across all engineering efforts.
- Troubleshoot and resolve issues in data models, workflows, and infrastructure.
- Design and maintain data models for cloud data warehouses such as Snowflake or Databricks.
- Apply advanced techniques like data partitioning, indexing, and compression to optimize performance and storage.
- Develop disaster recovery plans and backup strategies to ensure business continuity.
- Mentor junior engineers and foster a culture of technical excellence and innovation.
- Stay current with emerging technologies and recommend strategic adoption where appropriate.
- Collaborate with global Agile teams to deliver high-quality solutions.

Qualifications:
- Bachelor's degree and 12+ years of experience in data engineering, including at least 3 years in a principal or lead role.
- Expertise in Azure, Databricks, Apache Spark, Kafka, and Cassandra.
- Strong programming skills in Python, SQL, and Scala.
- Experience with distributed systems, data modeling, and data warehousing.
- Familiarity with machine learning pipelines, MLOps, and cloud-native architectures.
- Proven ability to lead cross-functional teams and deliver complex data solutions.
- Excellent communication, problem-solving, and leadership skills.
- Exposure to big data tools and distributed computing.
- Certifications in Azure or Databricks are a plus.