Site Reliability Engineer
Remotive
Remote
About
This description is a summary of our understanding of the job description.
Role Description
As a Site Reliability Engineer (SRE) at Group 1001, you will be responsible for ensuring the reliability, availability, and performance of our systems and applications. You will work closely with our development, operations, and security teams to design, implement, and maintain robust and scalable infrastructure solutions. The ideal candidate is passionate about automation, continuous improvement, and delivering exceptional user experiences.
- Design, implement, and maintain highly available and scalable infrastructure solutions on cloud platforms (e.g., AWS, Azure, GCP).
- Implement and manage DevSecOps practices across multi-cloud, multi-region project lifecycles, enhancing collaboration and efficiency.
- Use monitoring and observability tools (preferably Grafana) for real-time system monitoring and troubleshooting.
- Apply strong Git skills, including comfort with trunk-based workflows and semver release tagging.
- Design and implement infrastructure CI/CD pipelines for automated geospatial software deployment and infrastructure management.
- Conduct regular system audits to identify and address potential issues before they impact project delivery.
- Ensure compliance with data governance and security policies throughout the geospatial project lifecycle.
- Provide technical guidance and mentorship to junior team members, fostering a culture of learning and growth.
- Prevent incidents by setting up symptom-based alerts.
- Coordinate with multiple teams, such as Data Platforms, NOC/SOC, and IT security.
- Build effective monitoring systems with proactive and reactive alerts.
- Build system health dashboards.
- Build end-user monitoring dashboards.
- Work with Delivery teams to provide insights into monitoring data.
- Manage deployments and incidents.
- Integrate alerts with the notifications engine.
Qualifications
- 10-14 years of experience.
- Git, GitLab, Infra CI/CD Pipelines.
- Terraform and/or Pulumi.
- Hands-on experience as an SRE.
- Experience with AWS and Azure.
- Experience with APM tooling.
- Experience automating operational actions with CI/CD pipelines.
- Experience with operational excellence, including writing runbooks and managing handoffs to L1/L2 teams.
Preferred Skills
- Experience as an SRE in environments that include AWS, Azure, Angular, REST/GraphQL, Neo4j, and Event Hubs.
- Proven experience with service meshes.
- Proven experience with backups and patching.
- Proven experience with policy-as-code (Rego, OPA).
- Proven experience with ZTNA policies.
Compensation
Our compensation reflects the cost of labor across several U.S. geographic markets. The base pay for this position ranges from $180,000/year in our lowest geographic market up to $200,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.
Benefits
- Employees who meet benefit eligibility guidelines and work 30 hours or more weekly can enroll in Group 1001’s benefits package.
- Employees (and their families) are eligible to participate in the Company’s comprehensive health, dental, and vision insurance plan options.
- Employees are also eligible for Basic and Supplemental Life Insurance, as well as Short- and Long-Term Disability coverage.
- All employees (regardless of hours worked) have immediate access to the Company’s Employee Assistance Program and wellness programs—no enrollment is required.
- Employees may also participate in the Company’s 401(k) plan, with matching contributions by the Company.
