Engineering Team Lead, SRE - Real-time Data
Location
London
Business Area
Engineering and CTO
Ref #
10044820
Description & Requirements
-------------------------------
Bloomberg's Real-time Data group is responsible for distributing low-latency, high-volume financial data to users around the world. From equity prices to FX rates, our infrastructure handles over 60 billion messages per day from 370+ global exchanges, powering 375,000 Terminals and 3,000+ BPIPE clients across on-prem and cloud environments.
The London Real-time Data SRE team plays a critical role in making this possible by developing the core services and tooling that keep our systems reliable, scalable, and observable. As a Team Lead, you'll manage and mentor a talented group of SREs and software engineers while staying hands-on with technology and systems design.
What You'll Own
You'll lead a team that supports several key components of the Real-time Data platform:
Configuration Delivery Services: Enables thousands of servers and BPIPE endpoints to "call home" and receive correct settings.
Peer Discovery Infrastructure: Groups servers into discoverable clusters and provides tools to manage them.
Observability and Monitoring Frameworks: Ensures we have high visibility across a vast estate of global infrastructure.
Data Quality Tooling: UI and backend systems for diagnosing distribution issues across the real-time data network.
Cross-team Reliability Work: You'll help improve the reliability of systems beyond the team's formal ownership.
You'll balance operational excellence with software development, helping your team deliver tools, services, and processes that scale with the business.
How the Team Operates
The team's mission aligns with five SRE pillars:
Latency Monitoring & Management - Define SLIs/SLOs, track latency, and build tools to diagnose issues.
Capacity Management - Maintain disaster readiness and scalability through monitoring and forecasting.
System Observability - Proactively detect issues, build alerting systems, and centralize health dashboards.
Production Risk Management - Ensure safe software releases and drive infrastructure improvements.
Incident Response - Lead or support fast, effective remediation during live incidents; build automation for common operational issues.
What We're Looking For
We're seeking a leader who can combine strong technical execution with people-first leadership. You'll guide the team's roadmap, help individuals grow, and contribute to the broader reliability strategy across Real-time Data.
You'll need to have:
Experience managing or mentoring engineers in a collaborative, inclusive environment
Strong hands-on development skills in an object-oriented language--Python or C++ preferred
A background in building reliable, well-tested software for production systems
Confidence diagnosing and resolving live operational issues
Strong communication skills--able to work across teams and influence peers
A track record of helping teams plan, prioritize, and deliver complex technical projects
The ability to define a long-term vision for the team's technology and culture