Join us at Barclays as a Site Reliability Engineer (SRE). We're looking for an SRE to help design, develop, and enhance software that powers critical business, platform, and technology capabilities for our customers and colleagues.
To be successful as a Site Reliability Engineer, you should have experience with:
Platform resiliency & capacity management for clusters and platforms (Kubernetes/OpenShift): SLOs, error budgets, autoscaling, quotas, node pools, capacity planning.
AWS platforms including Lambda and cost optimisation/resource management (EKS, EC2, VPC, IAM, budgets, rightsizing, scaling policies).
Observability & incident response with automation: monitoring, alerting, tracing, on-call, postmortems; Python/Shell for runbooks and auto-remediation.
Some other highly valued skills may include:
Performance & load engineering and capacity modelling.
Chaos/DR testing and reliability patterns: circuit breakers, bulkheads, retries/backoff.
FinOps tooling familiarity: cost explorer/curation, anomaly detection, utilisation dashboards.
You may be assessed on the key critical skills relevant for success in the role, such as risk and controls, change and transformation, business acumen, strategic thinking, and digital and technology, as well as job-specific technical skills.
This role is based in Knutsford.