
Software Engineer II - Big Data/PySpark
- Plano, TX
- Permanent
- Full-time
- Design, develop, and maintain scalable data pipelines and ETL processes to support data integration and analytics. Implement ETL transformations on big data platforms, using NoSQL databases such as MongoDB, DynamoDB, and Cassandra.
- Use Python for data processing and transformation tasks, ensuring efficient and reliable data workflows. Work hands-on with Spark to manage and process large datasets.
- Implement data orchestration and workflow automation using Apache Airflow. Apply an understanding of Event-Driven Architecture (EDA) and event streaming, with exposure to Apache Kafka.
- Use Terraform for infrastructure provisioning and management, ensuring a robust and scalable data infrastructure. Deploy and manage containerized applications using Kubernetes (EKS) and Amazon ECS.
- Implement AWS enterprise solutions, including Redshift, S3, EC2, Data Pipeline, and EMR, to enhance data processing capabilities.
- Develop and optimize data models to support business intelligence and analytics requirements. Work with graph databases to model and query complex relationships within data.
- Create and maintain interactive and insightful reports and dashboards using Tableau to support data-driven decision-making.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
- Formal training or certification in software engineering concepts and 2+ years of applied experience
- Strong programming skills in Python, with basic knowledge of Java
- Proficiency in data modeling techniques and best practices
- Solid understanding of graph databases, with experience modeling and querying graph data
- Experience in creating reports and dashboards using Tableau
- Hands-on experience with Spark and managing large datasets
- Experience in implementing ETL transformations on big data platforms, particularly with NoSQL databases (MongoDB, DynamoDB, Cassandra)
- Solid understanding of Event-Driven Architecture (EDA) and event streaming, with exposure to Apache Kafka
- Strong analytical and problem-solving skills, with attention to detail
- Ability to work independently and collaboratively in a team environment
- Good communication skills, with the ability to convey technical concepts to non-technical stakeholders
- A proactive approach to learning and adapting to new technologies and methodologies
- Experience with Apache Airflow for data orchestration and workflow management
- Familiarity with container orchestration platforms such as Kubernetes (EKS) and Amazon ECS. Experience with Terraform for infrastructure as code and cloud resource management
- Familiarity with AWS enterprise implementations such as EMR/Glue, S3, EC2, Data Pipeline, Lambda, and IAM roles