Site Reliability Engineer at Tango
Mountain View, CA, US
Tango Site Reliability Engineers are the keystones that hold together the Tango platform and make sure that it is stable, scalable, and secure. They are subject-matter experts for Tango's product as well as everything that keeps Tango working. This includes the platform, systems, and network infrastructure that run 24x7 to keep Tango available all over the globe. To that end, SREs work directly with Engineering to design and support features on globally distributed systems that operate at significant scale.

Summary:

Tango is a leading mobile messaging service with more than 320 million registered members around the world and is transforming the way people communicate, discover, and share. Having evolved from its beginnings in 2009 as a cross-platform texting and calling app, Tango today combines free communication, social networking, and content in a single platform. Best of all, Tango gives members more fun and engaging ways to connect with those they care about through social networking, playing games, sharing music, news feed channels, and a whole lot more!

Join the Tango team, where we work together in a thriving, fast-paced start-up environment built on passion, trust, and drive. In between programming, designing, and serving our members, our team enjoys daily family-style meals and many other great perks that keep our employees happy.

Responsibilities:

Be an expert: acquire a deep understanding of the Tango platform and products
Be self-sufficient: diagnose and handle complex problems and solve scaling challenges
Be part of the team: get to resolution ASAP by coordinating with the right people. Build relationships with Engineering and other teams before you need them so that you know who the right people are. Be a mentor to new SREs and actively work to make them productive members of the team
Find bottlenecks: know SQL, know NoSQL. Know how to diagnose performance problems from the application stack down to the DB, OS, kernel, network, and hardware. Find information that shows when to work on application code, tune existing equipment, or get more hardware
Scale the system and the team: write tools and automation that eliminate repetitive tasks and allow for greater leverage
Think ahead: always be looking ahead for scalability challenges. Use or write tools to make sure we always know we have capacity. Support world domination by making everything redundant or fault tolerant
Communicate effectively: write excellent documentation to enable other team members and new hires to easily learn and support the Tango platform

Preferred Skills and Experience:

8+ years of proven production experience in a SaaS, hosted application, or other mission-critical systems environment
5+ years administering Linux systems in a mission-critical environment. Experience with production environments running newer kernel versions than those of CentOS and Red Hat is a plus
5+ years of proven experience writing scripts and automation tools in Python or Bash
Excellent network (TCP/IP) skills, CCNA/CCNP a plus
Experience developing automation tools against VMware ESXi a plus
Strong general technical, analytical, and problem-solving abilities
Grace under fire: accurate and precise, especially when responding to problems
Excellent written and oral communication skills
Experience with high-volume web environments handling millions of users and millions of transactions per day