Site Reliability Engineer Job at Technopride

Site Reliability Engineer

Hove, ENG, GB, United Kingdom

Job Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to lead modernization initiatives across IT operations by establishing robust observability practices and automating manual processes (toil). The ideal candidate will combine strategic thinking with deep hands-on expertise to drive reliability, scalability, and efficiency across complex technology landscapes. This role requires strong leadership, advanced technical proficiency, and the ability to foster a culture of reliability and continuous improvement.

Primary Responsibilities

Operational Modernization & Strategy

Collaborate with product engineering teams to define and implement strategies that modernize IT operations, enhance observability, and reduce toil.
Architect, deploy, and optimize observability platforms to monitor system health, performance, and reliability.
Define and drive strategies for AI-driven alerting, proactive anomaly detection, and event correlation to reduce MTTD and MTTR.
Develop and implement SRE practices including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budget policies.
Create and maintain an AIOps roadmap to improve operational efficiency and accelerate automation initiatives.

Automation & Reliability Engineering

Automate repetitive processes using scripting, orchestration tools, and AI/ML-driven automation models.
Drive initiatives for automated incident response, self-healing workflows, and autonomous operations.
Enable shift-left engineering practices by partnering with engineering, architecture, and product teams to improve system reliability early in the development lifecycle.
Lead continuous improvement initiatives focusing on reducing operational burden and improving resilience across systems and services.

Incident Management & Root Cause Analysis

Oversee and enhance incident management processes through automation and structured problem-solving.
Conduct root cause analyses and drive remediation efforts to prevent recurrence and strengthen system reliability.

Collaboration & Leadership

Work cross-functionally to ensure systems are built to be scalable, resilient, and maintainable.
Mentor teams in adopting SRE principles, tools, and modern operational practices.
Champion a culture of automation, observability, and reliability across the organization.

Key Skills & Technical Expertise

Core Competencies

Strong proficiency in applying SRE principles across large-scale environments.
Advanced hands-on experience with observability tools, specifically Dynatrace and Datadog.
Expertise in automation and scripting using Python and Ansible.
Robust experience with cloud platforms including AWS and Azure.
Deep understanding of containerization and orchestration using Docker and Kubernetes.
Strong knowledge of cloud-native architectures and distributed systems.
Exposure to AI/ML-driven predictive analytics, anomaly detection, and automated remediation.
Familiarity with CI/CD pipelines and automated release and deployment practices.

Desirable Skills

Experience with chaos engineering platforms such as Gremlin or Chaos Monkey.
Knowledge of resilience testing frameworks and reliability scoring models.
Ability to manage multiple initiatives simultaneously in fast-moving environments.
Excellent communication, collaboration, analytical, and decision-making skills.
Strategic mindset that balances technical innovation with business priorities.

Preferred Qualifications

12+ years of experience in SRE, DevOps, or IT operations roles.
Proven track record implementing observability, AIOps, and automation solutions at enterprise scale.
Certifications in cloud platforms, observability tools, or SRE-related disciplines.

Job Type: Fixed term contract
Contract length: 12 months

Pay: 80,000.00-85,000.00 per year

Benefits:

Life insurance
Sabbatical

Work Location: Hybrid remote in Hove BN3 3YU



Job Detail

Job Id: JD4237246
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Full Time
Job Location: Hove, ENG, GB, United Kingdom
Education: Not mentioned
