Data Lake Engineer

Remote, United Kingdom

Job Description

Why Square Health



Square Health is one of the UK's leading digital health providers, powering wellbeing and medical services for major insurers, employers, and healthcare organisations.

Technology sits at the heart of everything we do, from our flagship Clinic-in-a-Pocket mobile app to our expanding cloud-native SaaS and API platform, enabling personalised, data-driven healthcare journeys at scale.

As a Data Lake Engineer, you'll help build and evolve the data foundations that underpin our analytics and operational insight capabilities. You will work across AWS-native services, automated pipelines, and SDLC-driven delivery to ensure data is reliable, secure, and ready to power the next generation of digital health experiences.

What you will be doing:



  • Designing, building, and maintaining Square Health's AWS Data Lake architecture across S3, Glue, AppFlow, Lake Formation, and related services.
  • Developing ETL/ELT pipelines using AWS Glue (PySpark, Spark), Python, and modern data engineering best practices.
  • Implementing data ingestion flows using Amazon AppFlow and integrating with sources such as Salesforce, GP portals, and internal microservices.
  • Managing schemas, tables, and permission models using the AWS Glue Data Catalog and Lake Formation, ensuring security and governance controls are applied correctly.
  • Working closely with platform engineering to ensure pipelines follow the SDLC.
  • Creating reusable, automated data workflows (CI/CD) using Infrastructure-as-Code (IaC).
  • Designing storage layers (Bronze → Silver → Gold) following best practices for reliability, cost, and performance.
  • Troubleshooting and optimising data jobs for performance, scalability, and error resilience.
  • Collaborating with Analytics, MI, Product, and Engineering teams to deliver clean, well-governed datasets for reporting and AI/ML initiatives.
  • Ensuring pipelines meet regulatory needs in a regulated health environment (data governance, access control, encryption, audit).
  • Upskilling team members to ensure proficiency in AWS Data Lake technologies, ETL/ELT development, Python, PySpark, and CI/CD workflows, fostering a culture of continuous learning and adaptability.

You will be a great fit if you...



  • Have strong experience engineering cloud-native Data Lakes using AWS services.
  • Are highly proficient with PySpark, Spark, and Python for data transformation.
  • Have hands-on experience with AWS Glue, AppFlow, Lake Formation, and S3-based data architectures.
  • Understand data modelling, partitioning strategies, and schema evolution for large-scale data pipelines.
  • Come from a background where automation is the default: CI/CD, IaC, automated deployments, minimal manual admin.
  • Enjoy working in environments with strong SDLC discipline, version control, peer reviews, and testable code.
  • Take pride in building clean, well-structured, repeatable pipelines.
  • Have experience handling complex data integration from multiple systems (APIs, CRMs, event streams, operational systems).
  • Are comfortable working in fast-moving product teams and collaborating with engineers, product managers, and analysts.
  • Have the right attitude: proactive, solution-focused, and comfortable in a growing scale-up pushing towards modern platform automation.

Essential Skills:



  • Strong commercial experience with AWS Data Lake architectures.
  • AWS Glue (ETL jobs, PySpark, Crawlers, Data Catalog).
  • Python for ETL, utilities, automation, and orchestration.
  • Amazon AppFlow configuration, mapping, scheduling, and troubleshooting.
  • Lake Formation permissions, governance, and LF-Tags.
  • S3 data modelling and lifecycle management (Bronze / Silver / Gold).
  • Experience with CI/CD tools (GitHub Actions, GitLab CI, Azure DevOps, or similar).
  • Familiarity with Infrastructure as Code (CloudFormation).
  • Understanding of data quality frameworks, schema validation, logging, and monitoring.

Nice to Have:



  • Experience in healthcare or regulated environments (GDPR, PHI handling, security controls).
  • Familiarity with Athena, EMR, Redshift, or SageMaker Feature Store.
  • Exposure to microservices-based data ingestion or event-driven architectures.
  • Experience building MI/reporting data models (star/snowflake schemas).
  • Knowledge of API integration patterns and metadata-driven ETL.

What We Offer:



  • Competitive salary
  • 25 days holiday
  • Death in Service scheme
  • Access to the Aviva Smart Health app
  • Pension scheme with Scottish Widows
  • A fast-paced, supportive environment where you can shape a next-generation health data platform
  • Opportunities for progression and personal development as the platform scales

If this exciting opportunity is of interest to you and you would like to find out more, then please apply today!

Job Types: Full-time, Permanent

Work Location: Remote
