Overview:
We are looking for a data engineer experienced in DevOps-based pipeline delivery who can not only develop pipelines but also establish the foundational framework for reusable data ingestion processes. The ideal candidate is proactive, a self-starter, and demonstrates a strong can-do attitude.
While not essential, experience with Health Data systems would be highly advantageous.
Responsibilities:
Ingestion Framework Delivery: Build reusable, metadata-driven data pipelines within a framework to handle batch and near-real-time data feeds.
Data Pipeline Development: Develop end-to-end data pipelines, including data load patterns, error handling, automation, and hardware optimisation.
Requirements Formulation: Collaborate with Business Analysts, Architects, SMEs, and business teams to define requirements and implement solutions using modern EDW cloud tools and best practices.
Detailed Solution Design: Work with architects and analysts to create detailed solution designs for data pipelines, ensuring adherence to policies, rules, and standards.
DevOps Best Practices: Promote DevOps best practices for iterative solution delivery, including CI/CD, version control, monitoring and alerting, automated testing, and IaC.
Data Modelling and Warehousing: Build and optimise pipelines to populate data stores such as DWH, Lakehouse, and other repositories, following industry and clinical standards like openEHR, FHIR and OMOP.
Data Quality & Governance: Apply data quality checks, validation rules, and governance policies to maintain the accuracy, completeness, and reliability of clinical data, and address any data discrepancies.
Data Integration: Integrate various clinical datasets, ensuring proper mapping, standardisation, and harmonisation across systems and terminologies (e.g., SNOMED CT, LOINC, ICD-10).
Performance Optimisation: Monitor and enhance the performance of data pipelines, warehouses, and queries for efficient data processing.
Operational Controls: Apply operational procedures, security practices, and production policies to ensure high-quality service delivery.
Collaboration: Work with clinical stakeholders, data scientists, analysts, and other professionals to define data requirements and deliver technical solutions. Lead showcase sessions after each delivery.
Documentation: Maintain comprehensive technical documentation for data architectures, pipelines, models, metadata, and processes.
Troubleshooting & Support: Provide technical support and resolve issues related to data pipelines and data quality.
Innovation & Best Practices: Stay updated on new data engineering technologies and best practices, especially in healthcare, and recommend adoption as needed.
Proofs of Concept: Lead proofs of concept and pilots, and develop data pipelines using agile, iterative methods.
Qualifications:
Certifications such as DP-203 and AZ-900, or similar certification/experience.
Essential skills:
Experience working with healthcare data to build a healthcare data store would be a significant plus, including familiarity with standards and interoperability protocols (e.g., openEHR, FHIR, HL7, DICOM, CDA).
Experience converting healthcare data to build an ODS or DWH.
Experience integrating analytical outcomes and key information into clinical workflows.
Desired skills:
Knowledge of streaming data architectures and technologies (e.g., Kafka, Azure Event Hubs, Kinesis).
Knowledge of handling genomic datasets (FASTQ, VCF, etc.) and document formats (IHE).
General experience working with Gen AI, including LLM-generated clinical data/summaries.
Experience:
Extensive background as a data engineer, specialising in data warehouse environments and building various types of data pipelines.
Demonstrated ability to design and implement data integration and conversion pipelines using ETL/ELT tools, accelerators, and frameworks such as Azure Data Factory, Azure Synapse, Snowflake (Cloud), SSIS, or custom scripts.
Skilled in developing reusable ETL frameworks for data processing.
Proficient in at least one programming language commonly used for data manipulation and scripting, including Python, PySpark, Java, or Scala.
Strong understanding and hands-on experience with DevOps practices and tools, especially Azure DevOps for CI/CD, Git for version control, and Infrastructure as Code.
Advanced SQL skills and experience working with relational databases like PostgreSQL, SQL Server, Oracle, and MySQL.
Experience implementing solutions on cloud-based data platforms such as Azure, Snowflake, Google Cloud, and related accelerators.
Experience with developing and deploying containerised microservices architectures.
Understanding of data modelling techniques, including star schema, snowflake schema, and third normal form (3NF).
Proven track record with DevOps methodologies in data engineering, including CI/CD and Infrastructure as Code.
Knowledge of data governance, data quality, and data security in regulated environments.
Experience mapping data from unstructured, semi-structured, and proprietary structured formats within clinical data stores.
Strong interpersonal, communication, and documentation abilities, enabling effective collaboration between clinical and technical teams.
Experience working in Agile development settings.
Outstanding analytical, problem-solving, and communication skills.
Ability to work independently as well as collaboratively within a team.
Benefits:
Collaborative working environment - we stand shoulder to shoulder with our clients and our peers through good times and challenges
We empower all passionate, technology-loving professionals by allowing them to expand their skills and take part in inspiring projects
Expleo Academy - enables you to acquire and develop the right skills by delivering a suite of accredited training courses
Competitive company benefits
Always working as one team, our people are not afraid to think big and challenge the status quo