If you are a passionate engineering leader who thrives in a fast-paced, high-impact, high-growth environment, please read on! We are looking for a DevOps Engineer, SRE to lead a dynamic team of SREs whose mission is to continuously improve our development operations and support the reliability and availability of all our applications and services deployed to the cloud.
What you get to do:
- Partner with various engineering teams to own and manage the availability, latency, performance, reliability and scalability of all services, maintaining the SLAs that our customers expect from us
- Provide strong technical leadership and people management to the team
- Act as an SME and demonstrate end-to-end ownership of the SRE function with an automation-first mindset
- Collaborate with Engineering Managers, Product Managers and Software Engineers across the organization to deliver a comprehensive solution for the SRE discipline and set effective SLOs/SLIs and error budgets for all services
- Establish clear processes, methodologies, metrics and KPIs to drive accountability
- Own and drive cloud cost management, capacity planning and performance management, and partner with engineering leaders to optimize for performance and cost
- Drive instrumentation of services for end-to-end Observability – Monitoring, Alerting, Metrics, Logging and Dashboards
- Own and drive the incident management process: conduct blameless postmortems, publish incident reports, and maintain runbooks and on-call schedules
- Coordinate with product development engineering teams on change management and release management processes
- Be an evangelist for SRE best practices
- Champion and implement strong DevSecOps principles and SDLC best practices
- Coach, mentor and cross-train team members
- Lead with a servant-leadership mindset and Agile DNA
What you bring to the role:
- 10+ years of professional software engineering experience
- 5+ years of building and managing technical engineering teams in SRE, DevOps or other related domains
- Experience supporting infrastructure and services in public cloud environments; AWS required
- Experience leading an SRE team with a strong emphasis on automation and continuous improvement
- Strong hands-on knowledge of one or more Infrastructure-as-Code tools and technologies: Terraform, AWS CloudFormation, Packer, etc.
- Strong knowledge of network engineering and foundational network protocols and services such as TCP/IP, HTTP, DHCP, DNS, VPN, etc.
- Experience with Agile software development and Scrum methodology
- Great communication, collaboration and presentation skills
- Strong problem solving and analytical skills
- AWS certification
- Expertise in Linux systems
- Experience building and managing fault-tolerant, large-scale distributed systems
- Strong hands-on experience with one or more container and container orchestration technologies: Docker, Kubernetes, Amazon ECS, Amazon EKS, AWS Fargate, etc.
- Experience with CI/CD, DevOps and Pipeline-as-Code: ArgoCD, Jenkins, Spinnaker, GitLab CI/CD, etc.
- Experience with Microservices architecture and CNCF ecosystem
- Bachelor's or Master's Degree in Computer Science, Engineering or related discipline
To apply, please visit the following URL: https://remoteOK.com/jobs/153607