Sr. Platform Reliability Engineer
- DevOps & Cloud, DevOps
- San Francisco, California
- Permanent
A technology company in the advanced computing space is seeking a Sr. Platform Reliability Engineer to help build and support resilient, scalable infrastructure. This role focuses on Kubernetes, IaC, CI/CD, observability, and operational reliability across both cloud and on-prem environments. You will collaborate closely with platform and delivery teams while participating in a sustainable on-call rotation.
*Unfortunately, this role does not offer sponsorship. U.S. citizens or Green Card holders only.*
Details:
- Full-Time, Permanent Position
- Salary: $180k - $200k
- San Francisco, CA | 5 Days On-Site
Key Responsibilities:
- Design and maintain infrastructure across containers, VMs, and hybrid environments in major cloud platforms.
- Build and enforce Terraform-based IaC and consistent Git workflows.
- Own CI/CD pipelines and container build processes with secure, efficient delivery standards.
- Manage container registries, image hygiene, scanning, and promotion workflows.
- Implement GitOps patterns for reliable, declarative environment management.
- Maintain observability systems (metrics, logs, dashboards, alert routing).
- Strengthen security across secrets, RBAC, network policies, and compliance checks.
- Oversee certificate lifecycle management and encrypted communication standards.
- Support disaster recovery plans, backup strategies, and resilience improvements.
Qualifications:
- Bachelor's degree in Computer Science.
- 5 years in SRE/DevOps/infrastructure engineering.
- 5 years of experience with Terraform.
- 5 years of experience with containerization and orchestration.
- Strong Linux and networking fundamentals.
- 5 years of experience with observability tools (Prometheus, Grafana, Loki, etc.).
- 5 years of experience with Python, Bash, or Go.
- Comfortable with on-call rotations, incident response, and automation.
Nice to Have:
- GitOps, policy enforcement tools, secrets management, and certificate automation.
- Registry management and container security scanning.
- Distributed tracing and long-term metrics storage.
- Hybrid/on-prem operations or data-heavy platform support.
Oscar Associates Limited (US) is acting as an Employment Agency in relation to this vacancy.