Location
------------
This role is being offered as hybrid working based at one of our core HQs (Birmingham, Leeds, Liverpool, London Canary Wharf) and Scientific Campuses and Labs (Chilton, Colindale, Porton).
About the job
-----------------
### Job summary
The Digital and Directorate has primary responsibility for scientific computing and research computing services and support. Key functions of the Digital Development and Operations unit are to provide and support such platforms required by the staff of UKHSA and provide technical capabilities to enable public health services, both within the Organisation and between the Organisation and its customers and stakeholders.
We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our High Performance Computing (HPC) & SRE team. The role will be critical in ensuring the stability, scalability and performance of our services, combining software engineering and systems engineering to build, improve and run reliable, scalable production systems.
The role will be responsible to the Principal Specialist Engineer SRE and is part of the HPC/SRE/AI & research computing unit.
The SRE will use engineering principles to remediate infrastructure and operational problems with primary focus on automation and Continuous Integration/Continuous Delivery (CI/CD) ensuring our services run reliably, are scalable and perform optimally in production environments.
The SRE will monitor and manage these aspects while taking responsibility for multiple cloud infrastructure services. Observability of systems will be key to prioritising the operational service improvements and performance improvements to meet and exceed Service Level Objectives (SLOs).
### Job description
Architect, develop and manage multi-cloud HPC platforms and on-premise infrastructure
Ensure services are highly available, scalable and resilient
Manage performance, capability and capacity planning
Support UKHSA's AI requirements
Ensure services are stable, scalable, performant and automated
Respond to incidents, troubleshoot issues and restore services promptly
Prioritise operational service improvements to meet/increase SLO, minimise downtime
Ensure effective monitoring/alerting is in place to proactively identify issues using tools and dashboards and reduce times to respond to issues
Leverage automation to streamline tasks, reduce overhead on repeatable operations, reduce manual intervention and improve efficiency
Write maintainable, clear and concise code
Optimise system performance using strong problem-solving skills to identify bottlenecks with an engineering mindset
Ensure system can handle current/future workloads through automation and capacity planning
Improve services through observability and identify ways to improve observability practices
Define SRE principles and influence/educate stakeholders to adopt implemented principles
Provide technical documentation for engineers and training
Work closely with engineering and technology teams to improve operational processes, reduce manual tasks and risks, ensure seamless collaboration and knowledge sharing, and adapt to new ways of working
Service Reliability & Performance
Ensure services are stable, scalable, and performant through engineering best practices and system design
Proactively identify and address system bottlenecks using advanced problem-solving and performance tuning techniques.
Conduct capacity planning and implement solutions to ensure systems can support current and future workloads
Incident Response & Troubleshooting
Respond swiftly to production incidents, ensuring minimal downtime and quick restoration of services
Lead root cause analysis and postmortems, implementing lessons learned to prevent recurrence
Monitoring, Alerting & Observability
Design and implement effective monitoring and alerting systems using tools and dashboards
Improve observability of services, ensuring issues are identified and addressed before impacting users
Continuously refine monitoring practices to reduce alert fatigue and improve response times
Automation & Tooling
Develop automation to eliminate manual, repetitive tasks and improve operational efficiency
Write clear, maintainable, and well-tested code to support automation efforts and system tooling
Drive initiatives to reduce operational toil and improve reliability through Infrastructure as Code (IaC)
Service Level Objectives & Operational Improvements
Define, track, and continuously improve SLOs, SLIs, and error budgets.
Identify and prioritise operational improvements that align with business goals and user experience
SRE Best Practices & Advocacy
Define and evangelise SRE principles across the organisation
Collaborate with stakeholders to integrate reliability practices into the development lifecycle
Collaboration & Knowledge Sharing
Work closely with software engineering, DevOps, and infrastructure teams to streamline deployment and operational workflows
Improve cross-functional collaboration and promote a culture of shared responsibility for service reliability
Documentation & Training
Maintain accurate technical documentation, runbooks, and post-incident reports
Provide training and mentorship to engineering teams on best practices and tools
The above is only an outline of the tasks, responsibilities and outcomes required of this role. The role will carry out any other duties as may reasonably be required. The job description and person specification may be reviewed on an ongoing basis in accordance with the changing needs of the organisation.
### Person specification
Role Criteria and Other Requirements
Essential Criteria
Proven work experience as a Site Reliability Engineer, DevOps Engineer or Operations Engineer, or in a similar role
Strong coding skills in languages such as Python, PowerShell or Bash.
Deep understanding of Linux/Unix & Windows systems, networking, and distributed systems
Experience with CI/CD pipelines, cloud platforms (e.g. Amazon Web Services, Google Cloud Platform, Azure) and container orchestration (e.g., Kubernetes)
Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog) and alerting systems.
Solid understanding of infrastructure automation (e.g., Terraform, Ansible, PowerShell, Helm).
Excellent communication and collaboration skills.
Experience with security best practices
Problem-solving skills and the ability to respond to sudden, unexpected demands
Desirable Criteria
Experience leading post-incident reviews
Previous involvement in defining and driving adoption of SRE practices across an organisation.
Experience delivering training or mentoring junior engineers
Benefits
------------
Alongside your salary of £56,185, UK Health Security Agency contributes £16,276 towards you being a member of the Civil Service Defined Benefit Pension scheme. Find out what benefits a Civil Service Pension provides.
Learning and development tailored to your role
An environment with flexible working options
A culture encouraging inclusion and diversity
A Civil Service pension with an employer contribution of 28.97%
We pride ourselves on being an employer of choice, where Everyone Matters, promoting equality of opportunity and actively encouraging applications from everyone, including groups currently underrepresented in our workforce.
UKHSA's ethos is to be an inclusive organisation for all our staff and stakeholders, and to create, nurture and sustain an inclusive culture where differences drive innovative solutions to meet the needs of our workforce and wider communities. We do this through celebrating and protecting differences by removing barriers and promoting equity and equality of opportunity for all.
Please visit our careers site for more information: UKHSA Hub, Civil Service Careers
Things you need to know
---------------------------
### Selection process details
Application & Sift
This vacancy is using Success Profiles.
You will be required to complete an application form. You will be assessed on the listed 9 Essential Criteria, and this will be in the form of:
An application form ('Employer/ Activity history' section on the application)
A 1000 word Statement of Suitability & Technical statements
This should outline how your skills, experience, and knowledge provide evidence of your suitability for the role.
Healthjobs UK has a word limit of 1500, but your statement of suitability must be no more than 1000.
You will receive a joint score for your application form and statement. (The application form is the kind of information you would put into your CV - please be advised you will not be able to upload your CV. Please complete the application form in as much detail as possible).
Longlisting
If a large number of applications are received, we will longlist them into three groups:
Meets all essential criteria
Meets some essential criteria
Meets no essential criteria
Only those that meet all essential criteria will progress to shortlisting.
Shortlisting
In the event of a large number of applications we will shortlist against the lead criteria as follows:
Proven work experience as a Site Reliability Engineer, DevOps Engineer or Operations Engineer, or in a similar role
If you are successful at this stage, you will progress to interview.
Please note feedback will not be provided at this stage.
Interview
You will be invited to a remote interview.
This vacancy is being assessed using Success Profiles. Behaviours and Technical Skills will be tested at interview.
Candidates will be required to take a technical test, deliver a presentation and pass the interview process so that we can set the rate of the Market Pay Supplement (MPS) awarded.
The Behaviours tested during the interview stage will be:
Changing and Improving - Lead Behaviour
Delivering at Pace
Managing a Quality Service
Working Together
You will also be expected to prepare and deliver a 5-minute presentation during the interview. This will be based on either:
Designing a highly available and scalable service
OR
Automating a complex operational process
This will be decided and confirmed ahead of the interview.
There will also be a technical test during the interview, where you will be asked technical questions to test your knowledge. This will be based on:
SRE principles
Troubleshooting/incident management
System design
Automation/coding
Knowledge in Linux & networking
Cloud technologies
Once this job has closed, the job advert will no longer be available. You may want to save a copy for your records.
Eligibility Criteria
Open to all external applicants (anyone) from outside the Civil Service (including by definition internal applicants).
Location
This role is being offered as hybrid working based at one of our core HQs (Birmingham, Leeds, Liverpool, London Canary Wharf) and Scientific Campuses and Labs (Chilton, Colindale, Porton).
We offer great flexible working opportunities at UKHSA and operate using a hybrid working model where business needs allow. This provides us with greater flexibility about how and where we work, to get the best from our workforce. As a hybrid worker, you will be expected to spend a minimum of 60% of your contractual working hours (approximately 3 days a week pro rata, averaged over a month) working at one of UKHSA's scientific campus sites (Colindale, Porton or Chilton).
If based at one of our scientific campuses, you will be required to have, as a minimum, Counter Terrorism Check (CTC) security vetting.
Our core HQ offices are modern and newly refurbished, with excellent city centre transport links, and benefit from co-location with other government departments such as the Department of Health and Social Care (DHSC).
Salary Breakdown (Grade 7)
National: £56,185 - £66,581
Outer London: £58,340 - £68,574
Inner London: £60,494 - £70,566
This role attracts a Market Pay Supplement of £5,000 to £10,000
Security Clearance Level Requirement
All successful candidates must meet the basic security requirements before they can be appointed. The level of security needed is:
Basic Personnel Security Standard (BPSS)
DBS Requirement:
Basic DBS
For this role you will also need to meet:
Counter Terrorism Check (CTC)
For meaningful National Security Vetting checks to be carried out individuals need to have lived in the UK for a sufficient period of time. You should normally have been resident in the United Kingdom for the last 3 years as the role requires Counter Terrorism Check (CTC) clearance. In exceptional circumstances UK residency less than the outlined periods may not necessarily bar you from gaining national security vetting, and applicants should contact the Vacancy Holder/Recruiting Manager listed in the advert for further advice.
Please note: If you are successful at interview, and are moving from another government department, NHS, or Local Authority, the relevant starting salary principles for level transfers or promotions will apply. Otherwise, roles are offered at the pay scale minimum for the grade, but in exceptional circumstances there may be flexibility if you are able to demonstrate you are already in receipt of an existing, higher salary. Pay increases are through the relevant annual pay award for the role and terms.
Future location
UKHSA is investing in a new state-of-the-art National Biosecurity Centre in Harlow, Essex, which will eventually bring together teams currently based at Canary Wharf, Colindale and Porton Down. For more details, please see: Huge biosecurity centre investment to boost pandemic protection - GOV.UK. The new facilities will start becoming operational in the mid-2030s, with full completion by 2038. Staff will move in phases as facilities become available. If you're appointed to a role currently based at Canary Wharf, Colindale or Porton Down, please note that we'll continue investing in these sites for the next decade. As we get closer to the transition, we'll provide full information about relocation support available to staff.
Reasonable Adjustments
The Civil Service is committed to making sure that our selection methods are fair to everyone. To help you during the recruitment process, we will consider any reasonable adjustments that could help you. An adjustment is a change to the recruitment process or an adjustment at work. This is separate to the Disability Confident Scheme. If you need an adjustment to be made at any point during the recruitment process you should contact the recruitment team in confidence as soon as possible to discuss your needs.
You can find out more information about reasonable adjustments across the Civil Service here: https://www.civil-service-careers.gov.uk/reasonable-adjustments/
International Police check
If you have spent more than 6 months abroad over the last 3 years you may need an International Police Check. This would not necessarily have to be in a single block, and it could be time accrued over that period.
Artificial Intelligence (AI)
Artificial Intelligence can be a useful tool to support your application, however, all examples and statements provided must be truthful, factually accurate and taken directly from your own experience. Where plagiarism has been identified (presenting the ideas and experiences of others, or generated by artificial intelligence, as your own) applications may be withdrawn and internal candidates may be subject to disciplinary action.
Please see our candidate guidance for more information on appropriate and inappropriate use.
Link below:
Artificial intelligence and recruitment, Civil Service Careers
Internal Fraud Check
If you are successful for this role, as one aspect of pre-employment screening your personal details (name, National Insurance number and date of birth) will be checked against the Cabinet Office Internal Fraud Hub, and anyone included on the database will be refused employment unless they can show exceptional circumstances. Currently this applies only to candidates external to the Civil Service.
Feedback will only be provided if you attend an interview or assessment.
### Security
Successful candidates must undergo a criminal record check.
Successful candidates must meet the security requirements before they can be appointed. The level of security needed is counter-terrorist check.
See our vetting charter.
People working with government assets must complete baseline personnel security standard checks.
### Nationality requirements
This job is broadly open to the following groups:
UK nationals
nationals of the Republic of Ireland
nationals of Commonwealth countries who have the right to work in the UK
nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities with settled or pre-settled status under the European Union Settlement Scheme (EUSS)
nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities who have made a valid application for settled or pre-settled status under the European Union Settlement Scheme (EUSS)
individuals with limited leave to remain or indefinite leave to remain who were eligible to apply for EUSS on or before 31 December 2020
Turkish nationals, and certain family members of Turkish nationals, who have accrued the right to work in the Civil Service
Further information on nationality requirements
### Working for the Civil Service
The Civil Service Code sets out the standards of behaviour expected of civil servants.
We recruit by merit on the basis of fair and open competition, as outlined in the Civil Service Commission's recruitment principles.
The Civil Service embraces diversity and promotes equal opportunities. As such, we run a Disability Confident Scheme (DCS) for candidates with disabilities who meet the minimum selection criteria.
### Diversity and Inclusion
The Civil Service is committed to attract, retain and invest in talent wherever it is found. To learn more please see the Civil Service People Plan and the Civil Service Diversity and Inclusion Strategy.
Apply and further information
---------------------------------
Once this job has closed, the job advert will no longer be available. You may want to save a copy for your records.
### Contact point for applicants
#### Job contact
Name: Sarah Handy
Email: sarah.handy@ukhsa.gov.uk
#### Recruitment team
Email: recruitment@ukhsa.gov.uk
### Further information
The law requires that selection for appointment to the Civil Service is on merit on the basis of fair and open competition as outlined in the Civil Service Commission's Recruitment Principles.
If you feel your application has not been treated in accordance with the Recruitment Principles, and you wish to make a complaint, in the first instance, you should contact UKHSA Public Accountability Unit via email: Complaints@ukhsa.gov.uk
If you are not satisfied with the response you receive from the Department, you can contact the Civil Service Commission via its website: https://civilservicecommission.independent.gov.uk
https://www.healthjobsuk.com/job/v7612451