Manager, Cloud Platform
Toronto, ON, CA, M5J 2V5; Vancouver, BC, CA; Calgary, AB, CA; Montréal, QC, CA; Ottawa, ON, CA; Edmonton, AB, CA; Burnaby, BC, CA
Description
Our Team and What We’ll Accomplish Together
We are Canada’s largest healthcare IT provider and we’re transforming healthcare. The TELUS Health Cloud Platform team is passionate about solving complex problems to make life simpler for patients, clinicians, and the teams that serve them. We’re building secure, cloud-native platforms at scale across GCP, AWS, and Azure — and we take pride in doing it right.
We are building toward an agentic-first operating model. AI agents — not humans — handle the routine: provisioning infrastructure, responding to requests, enforcing guardrails, and guiding teams through self-service workflows. Security is built into everything we do, not bolted on at the end. Our engineers focus on building and improving those agents and systems, not executing manual tasks. We’re looking for a leader who gets this shift and knows how to drive it.
As Manager, Cloud Platform, you will lead both our platform engineering and cloud operations functions. Platform self-service is our north star, and agentic workflows are how we get there. Your mandate is to build the systems — agents, golden paths, automation frameworks, and security guardrails — that allow product and engineering teams to interact with the cloud platform entirely through AI-driven interfaces, without ever needing to file a ticket or wait for a human handoff.
Security is a first-class concern for this role. You will own the security posture of the platform layer — ensuring identity, access, and compliance controls are enforced automatically through code and agents, not manual review. This is a dual mandate: build the agentic platform that eliminates operational toil, while ensuring the platform remains secure, compliant, and trusted by the organization.
What You’ll Do
Build the Agentic-First Platform
- Design and lead the build-out of an agentic platform operating model — where AI agents (Claude, GitHub Copilot, and custom agents) are the primary interface between product teams and cloud infrastructure
- Replace manual ticketing workflows with agent-driven request handling: developers describe what they need in natural language or via CLI, and agents generate, validate, and apply the required Terraform or configuration changes
- Build agent workflows that guide product teams through infrastructure onboarding, access requests, environment bootstrapping, and compliance checks — without requiring Cloud Platform team intervention
- Establish GitHub as the operational backbone: issues, PRs, documentation, and agent interactions all flow through a GitHub-native model
- Instrument agents with awareness of platform standards, security guardrails, and organizational context — so they enforce policy automatically rather than escalating to humans
- Define and communicate the agentic roadmap to senior leadership, engineering teams, and product stakeholders
Own Platform Security & Compliance
- Own the security posture of the cloud platform layer — ensuring identity, access, and network controls are implemented consistently and enforced through automation across GCP, AWS, and Azure
- Implement and maintain security guardrails at the organization and pipeline levels, ensuring all infrastructure provisioned through the platform meets baseline security and compliance requirements
- Lead IAM governance: role binding, access provisioning, key rotation, service account hygiene, and Workload Identity Federation — with a goal of automating these controls through agents and policy-as-code
- Partner with the Security team to ensure platform capabilities align with organizational security standards and support audit requirements (SOC 2, PIPEDA, HIPAA-aligned practices)
- Build security into the self-service golden paths — so that teams provisioning infrastructure through approved patterns inherit secure defaults automatically
- Treat security findings as engineering problems: prioritize remediation through code, automation, and agent enforcement rather than manual review cycles
Own the Self-Service Platform & Golden Paths
- Design opinionated “golden path” frameworks using Terraform, Terragrunt, and GitHub Actions that standardize and secure infrastructure patterns across GCP, AWS, and Azure
- Build and maintain a centralized module marketplace and IaC library that teams and agents can consume confidently
- Ensure all self-service capabilities are agent-accessible — designed for both human and programmatic consumption from day one
- Establish clear support boundaries: teams using the golden path get full support; non-standard configurations are self-supported
Lead Cloud Operations
- Ensure operational coverage across the multi-cloud estate: GCP, AWS, and Azure
- Lead incident management with a focus on durable remediation — every significant incident produces agent runbooks, automation, or documentation that prevents recurrence
- Drive down request volume through agentic self-service, not headcount scaling — treating high ticket volume as an engineering problem to be automated away
- Coordinate with the SRE and observability teams to ensure platform services meet reliability expectations and incidents are routed and resolved efficiently
Drive Engineering Excellence
- Build and maintain CI/CD pipelines and Infrastructure-as-Code to automate provisioning, configuration management, patching, and compliance enforcement
- Contribute to the golden image factory initiative — ensuring CIS-hardened, patched base images are available on-demand across all cloud platforms
- Champion a “security as code” mindset across the team — policy enforcement, compliance checks, and access controls are implemented in pipelines and agents, not spreadsheets
Lead, Coach & Develop Your Team
- Manage a blended team of platform engineers and cloud operations engineers, with a deliberate focus on growing agent-building, automation, and security engineering skills
- Hire engineers who are energized by building AI-driven, security-first systems — not just operating existing ones
- Foster a learning culture — create space for the team to grow in agentic development, cloud security, certifications, and IaC alongside day-to-day responsibilities
- Help shape and evolve team ceremonies and ways of working, contributing to how the team structures its delivery cadence, retrospectives, and planning without being the sole driver of execution
Collaborate Across the Organization
- Partner with Product, Engineering, Security, and Architecture teams to align platform and agentic capabilities with organizational priorities
- Serve as the internal champion for agentic workflows — helping product and engineering teams understand how to interact with the platform through agents rather than manual processes
- Report on platform adoption, agent utilization, security posture, and toil-reduction progress to senior leadership
Qualifications
What You’ll Need
Leadership & Mindset
- 5+ years of progressive experience in cloud platform engineering or cloud operations — with at least 2 years in a people management or technical leadership role
- A genuine belief in agentic-first, security-first workflows and a track record of building automation that replaces manual processes — not just augments them
- Experience leading teams through transformation: from reactive, ticket-driven operations toward proactive, agent-driven platform delivery
- Strong communication skills — able to translate platform complexity into clear narratives for executive leadership and business stakeholders
- Comfortable operating in ambiguity and driving change in an environment that is still evolving
Technical Depth
- Hands-on experience across at least two of GCP, AWS, and Azure — with a solid grasp of identity, networking, compute, and security controls at scale
- Deep expertise in Infrastructure-as-Code (Terraform, Terragrunt) and the ability to design secure, reusable, opinionated module libraries
- Experience building or working with AI agents and agentic workflows — including prompt engineering, tool use, and integrating agents with CI/CD systems and infrastructure APIs
- Strong understanding of cloud security fundamentals: IAM, RBAC, service accounts, Workload Identity Federation, network security, and secrets management
- Experience implementing policy-as-code and automated compliance enforcement in multi-cloud environments
- Proficiency in at least one scripting/programming language (Python, Go, Bash) — you write code, not just YAML
- Experience building developer-facing self-service platforms, including CLI tools, GitHub Actions workflows, and chat-based interfaces
Operational Excellence
- Proven track record of reducing operational toil through automation — with concrete examples of what you built and how it measurably reduced burden
- Experience managing incident response at scale, including post-mortem facilitation and follow-through on action items
- Familiarity with request and workflow management practices — and an instinct for treating high request volume as an engineering problem to be automated away
- Understanding of security and compliance requirements in regulated healthcare environments (SOC 2, HIPAA-aligned practices, PIPEDA)
Education & Certifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field — or equivalent practical experience
- Cloud Certifications (Required — at least one): AWS Solutions Architect (Associate or Professional), GCP Professional Cloud DevOps Engineer, or Azure Administrator Associate
- Cloud Certifications (Preferred — additional): GCP Professional Cloud Architect, AWS DevOps Engineer Professional, Azure DevOps Engineer Expert
- DevOps / Platform: CKA (Certified Kubernetes Administrator) or equivalent practitioner-level credential is a strong asset
Nice-to-haves
- Experience designing or operating agentic systems in a production engineering context — including LLM tool use, agent orchestration, or AI-driven workflow automation
- Familiarity with GitHub Copilot, Claude, or similar AI coding/operations tools in an enterprise setting
- Experience with cloud security posture management (CSPM) tooling and integrating security findings into automated remediation workflows
- Experience supporting large-scale infrastructure modernization or cloud adoption programs
- Experience with identity federation and SSO administration across multi-cloud environments
- Background in regulated healthcare IT — understanding of patient-facing or clinical systems
- Experience with FinOps principles and cloud cost attribution
- Familiarity with enterprise collaboration and development tooling as both a user and administrator
Advanced knowledge of English is required because, given the national scope of this position, its main responsibilities involve interacting in English most of the time with internal parties (colleagues, internal partners, stakeholders, etc.) and working with IT tools whose interfaces are available only in English.
#LI-REMOTE