Site Reliability Engineer (SRE)

Full Time | 1 week ago | Lagos, Lagos

Employment Information

The Role

  • The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, scalability, and performance of business-critical applications and infrastructure. The role combines software engineering and operations expertise to automate processes, improve platform stability, and enhance system observability.

What You Will Do

  • Design, implement, and maintain highly available and scalable infrastructure.
  • Monitor production systems and proactively identify performance bottlenecks.
  • Manage incident response, root cause analysis (RCA), and problem management activities.
  • Develop automation scripts and tools to improve operational efficiency.
  • Implement and maintain CI/CD pipelines.
  • Manage cloud infrastructure across AWS and hybrid environments.
  • Configure and maintain observability platforms including monitoring, logging, and alerting solutions.
  • Define and track SLIs, SLOs, and error budgets.
  • Support application deployments and release management processes.
  • Collaborate with Engineering, Security, Data, and Product teams to improve system reliability.
  • Perform capacity planning and disaster recovery testing.
  • Ensure infrastructure and systems comply with security and regulatory requirements.

Requirements

What You Bring

Education

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.

Experience

  • 4–7 years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or Infrastructure Operations.
  • Experience supporting mission-critical financial services or fintech platforms is an advantage.

Technical Skills

  • Strong knowledge of AWS services (EC2, ECS/EKS, RDS, Lambda, VPC, IAM, CloudWatch).
  • Experience with Infrastructure as Code (Terraform, CloudFormation).
  • Knowledge of containerization technologies (Docker, Kubernetes).
  • Experience with CI/CD tools (GitHub Actions, GitLab CI/CD, Jenkins, Azure DevOps).
  • Experience with monitoring tools such as Datadog, Prometheus, Grafana, New Relic, or ELK Stack.
  • Strong Linux administration skills.
  • Experience with scripting languages (Python, Bash, PowerShell).
  • Understanding of networking, DNS, load balancing, VPNs, and security controls.

Preferred Certifications

  • AWS Certified Solutions Architect.
  • AWS SysOps Administrator.
  • Kubernetes certifications (CKA/CKAD).
  • HashiCorp Terraform Associate.

Key Competencies

  • Problem-solving and analytical thinking.
  • Incident management and troubleshooting.
  • Automation mindset.
  • Strong communication and collaboration.
  • Attention to detail.

This Role Is Ideal For You If

  • You enjoy solving complex infrastructure and reliability challenges.
  • You are passionate about automation and reducing operational overhead.
  • You thrive in high-availability, customer-facing environments where uptime matters.
  • You enjoy working across Engineering, Security, Data, and Product teams to improve system performance.
  • You are proactive and constantly seek opportunities to improve reliability, scalability, and efficiency.

You May Not Enjoy This Role If

  • You prefer manual processes over automation.
  • You are uncomfortable responding to production incidents and troubleshooting critical issues.
  • You prefer working in isolated environments with limited collaboration.
  • You are not interested in continuous learning and evolving cloud technologies.

Wakanda Jobs - Find All Jobs