DevOps Engineer / 3-Month Project

Remote, GB, United Kingdom

Job Description

Background



Nexsys Analytics is a consultancy and software company specialising in water resources management, infrastructure investment, decision-making under uncertainty, and trade-off analysis. We are currently investing in a series of large-scale optimisation projects that aim to identify the most efficient ways to invest in water infrastructure to safeguard future generations.

Description of the project



We are seeking a UK-based DevOps engineer, preferably located in the North of England, with expertise in cluster job scheduling (Slurm or similar), Docker, and big data storage to deliver a 3-month project. The goal is to build a system in which optimisation jobs are submitted via our existing API, deployed automatically on a compute cluster with the requested number of cores (if available), and their results stored in a big data database for later retrieval and analysis.
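
For illustration only, the submission step described above could look something like the following minimal Python sketch. It turns a job request into a Slurm batch script that runs a Docker container with the requested number of cores. The `JobRequest` fields, the image name and the log file naming are hypothetical and are not taken from our existing API; the `#SBATCH` options and `docker run` flags are standard ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical shape of a job submission received by the API."""
    job_id: str
    image: str   # Docker image holding the optimisation workflow
    cores: int   # number of CPU cores requested by the user


def render_sbatch_script(job: JobRequest) -> str:
    """Render a Slurm batch script that runs the job's container."""
    if job.cores < 1:
        raise ValueError("cores must be a positive integer")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.job_id}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={job.cores}",
        f"#SBATCH --output={job.job_id}.log",
        # Cap the container at the same core count requested from Slurm.
        f"docker run --rm --cpus={job.cores} {job.image}",
    ]
    return "\n".join(lines) + "\n"


# Example request with made-up values.
example = JobRequest(job_id="opt-001", image="example/optimiser:latest", cores=8)
print(render_sbatch_script(example))
```

A script like this would be handed to `sbatch`, and Slurm would hold the job as pending until the requested cores are free, which is one way to meet the "if available" condition.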

The project objectives



We have an existing task/job management system which deploys jobs via Docker containers on our internal cluster. We want to extend this system and its API to be more scalable and robust by using a task manager such as Slurm.

The responsibilities of the role will include:

- Cluster Setup & Orchestration - Deploy and configure a task manager (Slurm preferred) for scheduling containerised jobs. Implement fair resource allocation (CPU cores, memory, job queues).

- Containerised Execution - Develop Docker images for running optimisation workflows. Ensure reproducibility and portability of the runtime environment.

- Data Storage & Management - Design schema for storing large optimisation outputs in a scalable database (e.g. PostgreSQL + TimescaleDB, MongoDB, or HDFS/S3-backed). Implement pipelines for ingesting results directly from job execution.

- API Augmentation - Extend existing REST API with endpoints to: submit a job request (with parameters incl. number of cores); retrieve job status (queued, running, finished, failed); access job outputs from the database.

- Monitoring & Observability - Integrate monitoring and logging (Grafana/Prometheus/ELK). Provide dashboards for cluster utilisation and job performance.

- Documentation & Handover - Deliver Infrastructure-as-Code templates (Terraform/Ansible preferred). Provide technical documentation and a runbook for long-term operation.
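
As a rough sketch of the job status endpoint listed under API Augmentation, the Python snippet below maps Slurm job states onto the four statuses named there (queued, running, finished, failed). The function name and the grouping are assumptions made for illustration, in particular treating suspended jobs as queued and every other state as failed; the state names themselves are standard Slurm job states.

```python
# Slurm job states grouped under the four statuses listed for the API.
# The grouping is an assumption for illustration, not an agreed design.
_QUEUED = {"PENDING", "CONFIGURING", "REQUEUED", "SUSPENDED"}
_RUNNING = {"RUNNING", "COMPLETING"}
_FINISHED = {"COMPLETED"}


def to_api_status(slurm_state: str) -> str:
    """Translate a Slurm job state string into an API job status."""
    words = slurm_state.split()
    if not words:
        raise ValueError("empty Slurm state")
    # sacct can report e.g. "CANCELLED by 1000" or a truncated "CANCELLED+",
    # so keep only the first word and drop any trailing "+".
    state = words[0].upper().rstrip("+")
    if state in _QUEUED:
        return "queued"
    if state in _RUNNING:
        return "running"
    if state in _FINISHED:
        return "finished"
    # FAILED, CANCELLED, TIMEOUT, NODE_FAIL, OUT_OF_MEMORY and anything
    # unrecognised are all reported as failed in this sketch.
    return "failed"


for raw in ("PENDING", "RUNNING", "COMPLETED", "CANCELLED by 1000"):
    print(raw, "->", to_api_status(raw))
```

Reporting unrecognised states as failed is a conservative choice; a real implementation might prefer to surface them separately so that operators can investigate.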

The required skills are:

- Strong experience with Slurm (or equivalent job schedulers)

- Deep knowledge of Docker (Kubernetes experience a plus)

- Big data database design (PostgreSQL, MongoDB, or Hadoop/S3)

- Proficiency in Python/Flask/FastAPI (for API integration)

- Infrastructure-as-Code (Terraform, Ansible, or similar)

- Experience with observability stacks (Prometheus, Grafana, ELK)

Project deliverables:

- Working Slurm (or equivalent) cluster integrated with Docker

- Database backend for storing optimisation outputs

- Extended API with job submission, monitoring, and results retrieval

- Monitoring and logging dashboard

- Documentation and handover materials

Project timeline:

- Month 1: Requirements gathering, Slurm/Docker cluster deployment, database schema design

- Month 2: Integrate job submission workflows, connect jobs to database ingestion, extend API endpoints

- Month 3: Finalise API integration, implement monitoring/logging, test with large-scale runs, deliver documentation and handover

Start date: October, for completion by mid-January.

Job Type: Freelance

Pay: Up to 400.00 per day

Benefits:

- Flexitime

- Work from home

Work Location: Remote

Job Detail

  • Job Id
    JD3741674
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Remote, GB, United Kingdom
  • Education
    Not mentioned