Site Reliability Engineer job at Softtek, Aguascalientes - Mexico

Site Reliability Engineer

As a Site Reliability Engineer, you will build and operate reliable, complex software systems at an incredible scale. SREs follow a holistic approach, treating operations as a software problem. Problem resolution emphasizes automation, improving system design, and building resilience into our systems so that we don’t have to repeatedly fix the same problems.

“Write software to solve problems based on the operations mission of keeping the service running.”

Softtek is a global organization with multiple portfolios and a wide variety of projects. Each project is unique and may not require the same stack, which opens the opportunity for innovation. This role will challenge your abilities and skills while enabling you to grow in a skillset that is rapidly increasing in demand in the IT industry.


Industry & Project: 

This position gives you the chance to start your Softtekian career working on one of our airline industry projects for the USA.

The airline industry is catching up in the race to create highly reliable cloud environments, presenting a unique environment of technical challenge, learning and innovation.

IT professionals working in the airline industry acquire a higher value in the marketplace due to the extensive expertise required to deliver reliable and agile services in the industry. 


  • Problem solving: troubleshooting, identifying possible causes, triage and automated remediation.

  • Design and implement systems monitoring and alerting solutions.

  • Find new failure modes, usually caused by newly released systems or features.

  • Environment maintenance: configuration tools and event-driven automation.

  • Deployments and runbook automation.

  • Data replication and resiliency.

  • Capacity planning and performance analysis.

  • Implement solutions to measure availability, reliability, performance, analytics and security.

  • Identify engineering opportunities to enhance the overall operation.

  • Design and maintain reliable and efficient cloud components and services.


Required Experience:  

  • Operations (L2 Support) experience in large-scale, distributed systems running 24/7/365

  • Strong shell scripting and automation skills (Bash, Python, PowerShell, etc.)

  • Use of ITIL methodologies

  • Monitoring solutions experience: design, build and maintenance (Grafana, Graphite, Prometheus, etc.)

  • Databases

  • CI/CD tools

  • Coding experience (mainly Java)

  • Experience participating in Agile development methodologies

  • Code quality, unit testing and security best practices

  • Containers (nice to have)

  • Log data mining and data analytics (nice to have)

  • Cloud expertise (nice to have)


Job Requirements:  

  • Fluent English.

  • Ability to work a flexible schedule; varying hours may include mornings, evenings, weekends and extended hours as part of operations duties.


Special benefits applicable for this position:   

  • We offer continuous training and support for certifications, including an SRE career path with more than 30 trainings focused on the SRE tool stack in highest demand in the IT industry.