
Sr. Data Engineer
- Fountain Valley, CA
- $103,170-158,873 per year
- Permanent
- Full-time
- This job requires experience in building and maintaining scalable data pipelines and robust data models from structured and unstructured sources for AI/ML.
- The ideal candidate should have advanced SQL skills and be able to query and transform large structured/unstructured datasets using Spark/PySpark, Spark SQL, and Hive/NoSQL stores (see the PySpark sketch at the end of this posting).
- They should also have experience orchestrating Big Data pipelines with tools such as Airflow and Oozie (see the Airflow sketch at the end of this posting), as well as designing tooling for access management, monitoring, data controls, and self-service ETL/analytics pipelines.
- Other requirements include hands-on experience with an on-prem Big Data platform, sound knowledge of distributed data processing frameworks and resource management frameworks such as YARN, and proficiency in writing data pipelines in Spark, Python, and Scala.
- The ideal candidate should also have experience developing frameworks and utilities in Python, working in a DevOps environment, and following development best practices such as code reviews and unit testing.
- Additionally, the candidate should be able to diagnose software issues and engineer workarounds, have a good understanding of BI tools for Big Data such as Tableau, Power BI, and MicroStrategy, and be able to lead, guide, and assist team members with project development and problem solving.
- The candidate should also be flexible, able to learn and adopt new technologies, and able to work well both in a team environment and independently to achieve goals.
- Bachelor’s degree or equivalent (with major coursework in computer science) preferred
- 8+ years of hands-on IT experience in software development, building data pipelines and data processing frameworks
- 4+ years of experience as a Data Engineer
- Big Data Technologies: Knowledge of Big Data technologies such as Hadoop, Spark, Hive, Pig, and Kafka, as well as NoSQL databases such as MongoDB, Cassandra, and HBase.
- Distributed Systems: Understanding of distributed systems and distributed computing principles.
- Programming Languages: Proficiency in programming languages such as Java, Python, Scala, and SQL.
- Data Modeling: Knowledge of data modeling techniques and tools to design efficient data structures for Big Data systems.
- Data Processing: Experience with data processing and ETL (Extract, Transform, Load) tools and techniques.
- Cloud Computing: Familiarity with cloud computing platforms such as AWS, Azure, and Google Cloud.
- Data Security: Knowledge of data security principles and experience implementing security measures for Big Data systems.
- Data Warehousing: Understanding of data warehousing concepts and experience designing and maintaining data warehouses.
- Analytics and Machine Learning: Familiarity with analytics and machine learning tools and techniques and their implementation in Big Data systems.
- Performance Tuning: Experience with performance tuning and optimization techniques for Big Data systems to ensure scalability, reliability, and high availability.
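
To make the Spark/PySpark and Spark SQL expectations above concrete, here is a minimal sketch of the query-and-transform work the role describes. The paths, table, and column names (`/data/raw/events.parquet`, `analytics.daily_event_counts`, `user_id`, `event_ts`) are hypothetical placeholders, not details from the posting.

```python
# A minimal PySpark sketch: query and transform a large dataset with the
# DataFrame API and Spark SQL, then persist to Hive. All names are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("event-aggregation")   # hypothetical job name
    .enableHiveSupport()            # lets Spark read/write Hive tables
    .getOrCreate()
)

# Load raw events from a (hypothetical) Parquet dataset.
events = spark.read.parquet("/data/raw/events.parquet")

# Transform: daily event counts per user, via the DataFrame API.
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# The same aggregation expressed in Spark SQL.
events.createOrReplaceTempView("events")
daily_counts_sql = spark.sql("""
    SELECT user_id,
           to_date(event_ts) AS event_date,
           COUNT(*)          AS event_count
    FROM events
    GROUP BY user_id, to_date(event_ts)
""")

# Persist the result as a partitioned Hive table (hypothetical table name).
daily_counts.write.mode("overwrite").partitionBy("event_date") \
    .saveAsTable("analytics.daily_event_counts")
```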
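Likewise, a minimal Airflow sketch of the orchestration work mentioned above: a daily DAG that lands raw data and then submits a Spark transform to YARN. The DAG id, owner, schedule, and script paths are assumptions for illustration only.

```python
# A minimal Airflow DAG sketch: ingest, then transform, on a daily schedule.
# All ids and paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",                 # hypothetical owning team
    "retries": 2,                        # retry transient failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_event_pipeline",       # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Ingest step: land raw files (script path is a placeholder).
    ingest = BashOperator(
        task_id="ingest_raw_events",
        bash_command="python /opt/pipelines/ingest_events.py",
    )

    # Transform step: submit a Spark job (like the sketch above) to YARN.
    transform = BashOperator(
        task_id="transform_events",
        bash_command=(
            "spark-submit --master yarn "
            "/opt/pipelines/daily_event_counts.py"
        ),
    )

    ingest >> transform  # run the transform only after ingest succeeds
```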