Senior/Principal ML Systems Architect (TensorFlow + Python)

Bristol, ENG, GB, United Kingdom

Job Description

Overview





We are seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.


Responsibilities




Architect the ML Solver Platform

+ Define modular architecture for data preprocessing, model execution, and post-processing.
+ Establish clear API contracts between Python/TensorFlow and C# services.

Productionize ML Workflows

+ Convert research code into robust, testable, and observable services.
+ Implement CI/CD pipelines, automated testing, and reproducibility standards.

Integration & Interoperability

+ Design REST/gRPC endpoints for cross-language communication.
+ Ensure compatibility with C#/.NET services.

Performance & Scalability

+ Optimize GPU/CPU utilization, batching strategies, and memory management.
+ Plan for multi-model and multi-tenant scenarios.

MLOps & Lifecycle Management

+ Implement model versioning, artifact registries, and deployment workflows.
+ Set up monitoring, logging, and alerting for solver performance.

Security & Compliance

+ Apply best practices for secrets management, dependency scanning, and secure artifact storage.

Required Skills & Experience




ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.

Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.

Architecture: Proven experience designing scalable ML systems for production.

APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.

MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.

Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.

Observability: Metrics, tracing, structured logging, dashboards.

Security: SBOM, image signing, role-based access, vulnerability scanning.

Preferred Qualifications




+ Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
+ Familiarity with distributed training strategies and multi-GPU setups.
+ Knowledge of feature stores and data validation frameworks.
+ Exposure to regulated environments and compliance frameworks.

Tools & Technologies




ML: TensorFlow, ONNX Runtime, tf2onnx.

APIs: FastAPI, gRPC.

DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.

Monitoring: Prometheus, Grafana, OpenTelemetry.

Security: HashiCorp Vault, Sigstore.

Why Join Us?




+ Work on cutting-edge ML solutions integrated into commercial engineering software.
+ Define architecture that scales across global deployments.
+ Collaborate with a team of experts in ML, software engineering, and UI development.

To apply:

Send your resume and a brief cover letter to HR@softinway.com

Job Detail

  • Job Id
    JD4392611
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Full Time
  • Job Location
    Bristol, ENG, GB, United Kingdom
  • Education
    Not mentioned