Site Reliability Engineer

Birmingham, ENG, GB, United Kingdom

Job Description

Role Overview:



We are looking for a Lead Technical Subject Matter Expert (SME) with a strong systems-thinking mindset and expertise in Site Reliability Engineering (SRE) principles. The primary focus will be to uplift capacity planning and observability controls across a complex technology estate. This role combines deep technical engineering skills with architectural vision and aims to enhance performance, resilience, and operational control.

The ideal candidate will possess a solid blend of hands-on expertise and strategic leadership to align technology capabilities with internal control frameworks and regulatory expectations.

Key Responsibilities:



  • Lead the design and technical assessment of capacity management, utilization monitoring, and observability controls.
  • Apply SRE best practices to identify control gaps, performance risks, and automation opportunities.
  • Evaluate existing tooling, data flows, and operations to propose and implement control remediations.
  • Collaborate with engineering, infrastructure, architecture, and risk teams to validate technical solutions.
  • Define reusable technical patterns and tooling strategies for enhanced operational readiness.
  • Contribute to roadmap planning, tooling evaluations, and documentation for governance and operational preparedness.

Required Skills & Experience:



  • 10+ years in engineering, infrastructure, or architecture roles in complex technology environments.
  • Strong understanding of compute, storage, and network capacity planning across hybrid/cloud platforms.
  • Hands-on experience with SRE principles, including observability, SLIs/SLOs, and task automation.
  • Skilled in interpreting control requirements and embedding them into technical designs.
  • Experience with performance monitoring and diagnostic tools (e.g., Geneos, Prometheus, Grafana, AppDynamics).
  • Excellent communication skills with the ability to influence senior stakeholders and risk/control teams.

Desirable:



  • Experience uplifting operational controls (capacity, availability, performance).
  • Familiarity with internal risk frameworks or regulatory standards (e.g., DORA, EBA, PRA).
  • Background in incident response, system diagnostics, or service reliability engineering.

Job Types: Full-time, Permanent, Temporary
Contract length: 12 months

Pay: 49,189.85-90,000.55 per year


Job Detail

  • Job Id
    JD3042978
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Contract
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Birmingham, ENG, GB, United Kingdom
  • Education
    Not mentioned