Step forward into the future of technology with ZILO(TM).
We're here to redefine what's possible in technology. While we're trusted by the global Transfer Agency sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can't match.
At ZILO(TM), our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious mind, and set a high standard in every detail.
We are a team of dedicated professionals where everyone, regardless of their role, drives our progress and creates real impact. If you're ready to shape the future, let's talk.
Requirements
We are seeking an experienced Site Reliability Engineer (SRE) with deep subject-matter expertise in data processing and reporting. In this role, you will own the reliability, performance, and operational excellence of our real-time and batch data pipelines built on AWS, Apache Flink, Kafka, and Python.
You'll act as the first line of defense for data-related incidents, rapidly diagnose root causes, and implement resilient solutions that keep critical reporting systems up and running.
Incident Management & Triage
Serve as on-call escalation for data pipeline incidents, including real-time stream failures and batch job errors.
Rapidly analyze logs, metrics, and trace data to pinpoint failure points across AWS, Flink, Kafka, and Python layers.
Lead post-incident reviews: identify root causes, document findings, and drive corrective actions to closure.
Reliability & Monitoring
Design, implement, and maintain robust observability for data pipelines: dashboards, alerts, distributed tracing.
Define SLOs/SLIs for data freshness, throughput, and error rates; continuously monitor and optimize.
Automate capacity planning, scaling policies, and disaster-recovery drills for stream and batch environments.
Architecture & Automation
Collaborate with data engineering and product teams to architect scalable, fault-tolerant pipelines using AWS services (e.g., Step Functions, EMR, Lambda, Redshift) integrated with Apache Flink and Kafka.
Troubleshoot and maintain Python-based applications.
Harden CI/CD for data jobs: implement automated testing of data schemas, versioned Flink jobs, and migration scripts.
Performance Optimization
Profile and tune streaming jobs: optimize checkpoint intervals, state backends, and parallelism settings in Flink.
Analyze Kafka cluster health: tune broker configurations, partition strategies, and retention policies to meet SLAs.
Leverage Python profiling and vectorized libraries to streamline batch analytics and report generation.
Collaboration & Knowledge Sharing
Act as SME for data & reporting stack: mentor peers, lead brown-bag sessions on best practices.
Contribute to runbooks, design docs, and on-call playbooks detailing common failure modes and recovery steps.
Work cross-functionally with DevOps, Security, and Product teams to align reliability goals and incident response workflows.
Benefits
Enhanced leave - 38 days inclusive of 8 UK Public Holidays
Private Health Care including family cover
Life Assurance - 5x salary
Flexible working: work from home and/or in our London Office
Employee Assistance Program
Company Pension (Salary Sacrifice options available)
Access to training and development
Buy and Sell holiday scheme
The opportunity for "work from anywhere/global mobility"