Job Summary

We are looking for a Site Reliability Engineer to join our tech startup, which specializes in infrastructure and authorization solutions. In this role, you will be responsible for the reliability, availability, and performance of our systems. You will design, implement, and maintain scalable infrastructure to support our expanding customer base. This is an opportunity to work in a fast-paced environment and contribute to the success of a company revolutionizing authorization systems worldwide.

Responsibilities

  1. Infrastructure Design and Implementation: Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and clients.
  2. Performance Monitoring and Optimization: Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability.
  3. Automation: Automate infrastructure deployment and configuration management processes to streamline operations and enhance efficiency.
  4. Reliability and Security: Continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning.
  5. Troubleshooting: Troubleshoot and resolve complex infrastructure and application issues in production and test environments, ensuring minimal downtime and smooth operation.
  6. Collaboration: Collaborate closely with software engineering teams to design and implement resilient, scalable, and secure systems.
  7. On-call Support: Participate in an on-call rotation and respond to production incidents promptly and effectively.
  8. Documentation: Document system configurations, troubleshooting procedures, and operational guidelines, and keep them clear and comprehensive.

Requirements

  • Proven experience as a Site Reliability Engineer or in a similar role, demonstrating a strong understanding of infrastructure management and optimization.
  • Solid grasp of networking, operating systems, and cloud infrastructure, with hands-on experience in system design and distributed computing.
  • Proficiency in several programming languages and runtimes, such as Node.js, Java, Python, Ruby, and Go, with a keen interest in learning and adapting to new technologies.
  • Experience with containerization technologies such as Docker and Kubernetes, along with expertise in infrastructure-as-code tools like Terraform and Pulumi.
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack, which are essential for maintaining system health and performance.
  • Knowledge of relational databases, including lower-level implementation details, with bonus points for experience with distributed SQL databases like Google Cloud Spanner or CockroachDB.
  • Proficiency with Git and GitHub for version control and collaboration, coupled with experience in continuous integration and deployment systems.
  • Strong problem-solving and troubleshooting skills, complemented by excellent communication and collaboration abilities.

Benefits

  • Competitive salary ranging from $100,000 to $130,000 annually, depending on experience and qualifications.
  • Comprehensive health insurance coverage, including medical, dental, and vision plans, to support your well-being and peace of mind.
  • Flexible work arrangements, including remote work options and flexible hours, promoting a healthy work-life balance.
  • Opportunities for professional development and career advancement through training programs and mentorship initiatives.
  • Collaborative and inclusive work culture, where your contributions are valued and your voice is heard.
  • Company-sponsored social events and team-building activities to foster camaraderie and promote a positive work environment.

Full Time

8 to 5

Remote
