Salary: Competitive salary and package (Depending on level of experience)
Location: London (must be willing to travel to client sites throughout the UK on an ad hoc basis).
Accenture are partnering with scaled UK AI compute pioneers to lead the charge on next-generation infrastructure for sovereign AI. To support this endeavour, we're building a high-performance compute operations team in London.
Our work will be sensitive, secure, 24x7 and on the most up-to-date high-density compute stacks available. Shift teams will be set up to operate 24x7, and successful candidates working on shift will be paid a shift premium for the unsociable hours that are part of that rota.
Any offer of employment is subject to satisfactory BPSS and SC security clearance, which requires 5 years' continuous UK address history (typically including no periods of 30 consecutive days or more spent outside of the UK) at the point of application.
Key Responsibilities:
Managing and maintaining Linux systems, including installation, configuration, and troubleshooting.
Managing and supporting hypervisors.
Deploying and configuring clusters on on-premises and private cloud platforms.
Deploying clusters in a containerized environment.
Perform system administration, networking, scripting, and automation to ensure efficient system operations.
Provide effective line and shift management, people development, and leadership for junior team members.
Respond to real-time alerts and dashboards for compute, storage, and networking resources to detect service-impacting events.
Perform initial triage and isolation for incidents following established runbooks and procedures.
Document incidents, actions, and outcomes accurately in the incident management system, supporting service delivery and SLA compliance.
Ensure thorough shift handovers documenting operational status and ongoing incidents.
Investigate and resolve incidents escalated from L1.5 Engineers, conducting in-depth analysis of compute, storage, and network issues.
Develop and refine troubleshooting guides, runbooks, and knowledge base articles.
Coordinate with engineering, automation, and vendor teams for persistent or complex technical problems.
Monitor and analyse performance metrics and incident trends, recommending proactive measures for reliability improvements.
Mentor L1.5 Engineers to support skill development and knowledge sharing.
Participate in shift rotations and on-call schedules as required.
Eligibility for UK Government security clearance
Required Skills:
Linux experience
Technical experience in networking, storage, compute, and related infrastructure.
Strong understanding of network protocols, configurations, and security measures within a Linux environment.
Understanding of storage systems (e.g. S, Cloud)
Ability to write and utilise shell scripts (e.g., Bash) and automation tools like Ansible for efficient system management.
Ability to monitor, diagnose, and optimize system performance for efficiency.
Expertise in scripting languages such as Python, PowerShell & Shell.
Expertise in Kubernetes for container orchestration and managing containerized applications.
Experience in incident management, advanced troubleshooting, and operational best practices.
Understanding of containers and container management (Docker, Kubernetes, EKS)
Scripting for automation (Python, Bash)
Familiarity with ITSM and Agile methodologies and tools, e.g. ServiceNow, Jira
Bachelor's degree in Electrical Engineering (relevant experience will be considered)