Specialist - Digital Product Data Team
Orange & Rockland
- New York City, NY
- Permanent
- Full-time
- Con Edison, a multibillion-dollar energy utility, is leading the clean energy transition in New York City and Westchester County. We are investing hundreds of millions of dollars annually to integrate distributed resources into our electric and gas systems, support electric vehicle charging infrastructure, and scale adoption of energy efficiency and clean heating technologies among our customers.
- We are seeking an experienced Databricks Developer to join our data engineering team. In this role, you will be responsible for designing, building, and maintaining scalable data pipelines and solutions on the Azure Databricks platform. You will work closely with data engineers, data scientists, and analysts to understand business requirements and develop efficient and reliable data processing systems.
- Utilize Azure Databricks platform to architect and construct robust data pipelines for big data solutions.
- Design and implement data ingestion pipelines to extract data from various sources (databases, data lakes, APIs, etc.).
- Perform data transformation and cleansing operations using Databricks Spark clusters.
- Develop and optimize ETL/ELT processes to load data into data warehouses or data lakes.
- Create and maintain data models (e.g., star schema, data vault) for analytical workloads.
- Develop Spark-based applications for batch and real-time data processing.
- Implement machine learning models and deploy them on Databricks for predictive analytics.
- Write clean, maintainable, and scalable Scala, Python, or SQL code for data engineering tasks.
- Collaborate with cross-functional teams, including data engineers, analysts, and scientists.
- Participate in code reviews and follow best practices for version control and documentation.
- Set up CI/CD pipelines for automated testing, building, and deployment of Databricks jobs and applications.
- Implement data security measures, such as access control, encryption, and auditing.
- Ensure compliance with data governance policies and regulatory requirements.
- Manage data lineage and metadata to enhance data governance practices.
- Strong problem-solving skills, attention to detail, and a deep understanding of big data technologies and cloud architectures are essential for success in this role.
- Develop and maintain dimensional data models and star schemas.
- Manage data storage, partitioning, and organization in data lakes.
- Bachelor's Degree and a minimum of 2 years of relevant experience, or
- Master's Degree in Computer Science, Information Technology, Engineering, Math, Business, or an applicable field and 3 years of related work experience in a systems engineering/analysis area.
- Proficient in Azure Databricks, Spark (Scala, PySpark), SQL, Delta Lake, and Structured Streaming. Required
- Skilled in crafting Data Pipelines, ETL/ELT processes, Data Ingestion, Transformation, and Data Warehousing. Required
- Experienced with Azure Cloud Services such as Data Factory, Synapse Analytics, and Data Lake Storage. Required
- Competent in programming languages including Python, Scala, and SQL. Required
- Proficient in Data Modeling techniques such as Star Schema and Dimensional Modeling, with a focus on Data Governance principles. Required
- Well-versed in CI/CD practices, Azure DevOps, and Agile Methodologies. Required
- Experience developing and optimizing ETL/ELT processes using Databricks Notebooks, PySpark, and Delta Lake to ensure streamlined data processing and transformation. Required
- Experience implementing robust data quality checks, error handling, and validation mechanisms to uphold data integrity and reliability standards. Required
- Databricks Certified Data Engineer Associate Certification preferred.
- Driver's License Required
- Must be able to respond to Company emergencies by performing a System Emergency Assignment to restore service to our customers.