Site Reliability Engineer

Hove, ENG, GB, United Kingdom

Job Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to lead modernization initiatives across IT operations by establishing robust observability practices and automating manual processes (toil). The ideal candidate will combine strategic thinking with deep hands-on expertise to drive reliability, scalability, and efficiency across complex technology landscapes. This role requires strong leadership, advanced technical proficiency, and the ability to foster a culture of reliability and continuous improvement.

Primary Responsibilities



Operational Modernization & Strategy



  • Collaborate with product engineering teams to define and implement strategies that modernize IT operations, enhance observability, and reduce toil.
  • Architect, deploy, and optimize observability platforms to monitor system health, performance, and reliability.
  • Define and drive strategies for AI-driven alerting, proactive anomaly detection, and event correlation to reduce MTTD and MTTR.
  • Develop and implement SRE practices including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budget policies.
  • Create and maintain an AIOps roadmap to improve operational efficiency and accelerate automation initiatives.

Automation & Reliability Engineering



  • Automate repetitive processes using scripting, orchestration tools, and AI/ML-driven automation models.
  • Drive initiatives for automated incident response, self-healing workflows, and autonomous operations.
  • Enable shift-left engineering practices by partnering with engineering, architecture, and product teams to improve system reliability early in the development lifecycle.
  • Lead continuous improvement initiatives focusing on reducing operational burden and improving resilience across systems and services.

Incident Management & Root Cause Analysis



  • Oversee and enhance incident management processes through automation and structured problem-solving.
  • Conduct root cause analyses and drive remediation efforts to prevent recurrence and strengthen system reliability.

Collaboration & Leadership



  • Work cross-functionally to ensure systems are built to be scalable, resilient, and maintainable.
  • Mentor teams in adopting SRE principles, tools, and modern operational practices.
  • Champion a culture of automation, observability, and reliability across the organization.

Key Skills & Technical Expertise



Core Competencies



  • Strong proficiency in applying SRE principles across large-scale environments.
  • Advanced hands-on experience with observability tools, specifically Dynatrace and Datadog.
  • Expertise in automation and scripting using Python and Ansible.
  • Robust experience with cloud platforms including AWS and Azure.
  • Deep understanding of containerization and orchestration using Docker and Kubernetes.
  • Strong knowledge of cloud-native architectures and distributed systems.
  • Exposure to AI/ML-driven predictive analytics, anomaly detection, and automated remediation.
  • Familiarity with CI/CD pipelines and automated release and deployment practices.

Desirable Skills



  • Experience with chaos engineering platforms such as Gremlin or Chaos Monkey.
  • Knowledge of resilience testing frameworks and reliability scoring models.
  • Ability to manage multiple initiatives simultaneously in fast-moving environments.
  • Excellent communication, collaboration, analytical, and decision-making skills.
  • Strategic mindset that balances technical innovation with business priorities.

Preferred Qualifications



  • 12+ years of experience in SRE, DevOps, or IT operations roles.
  • Proven track record implementing observability, AIOps, and automation solutions at enterprise scale.
  • Certifications in cloud platforms, observability tools, or SRE-related disciplines.

Job Type: Fixed term contract
Contract length: 12 months

Pay: 80,000.00-85,000.00 per year

Benefits:

  • Life insurance
  • Sabbatical

Work Location: Hybrid remote in Hove BN3 3YU


Job Detail

  • Job Id
    JD4237246
  • Total Positions
    1
  • Job Type
    Full Time
  • Employment Status
    Full Time