This team currently consists of passionate engineers who strive to demonstrate excellence in the field of DevOps. We are responsible for the uptime of the various .COM websites and backend services, but a large portion of the job is also to innovate. Historically, once infrastructure is configured and working, it is mandated that no more changes be made to the system. Not here. We frequently and deliberately rebuild our entire system in an automated fashion. These activities not only help us discover pain points in our system but also give us the opportunity to improve continually. How do we guarantee geographic redundancy? How do we get the code to production faster? It takes 15 minutes to rebuild a system; how can we get it down to 5? Instead of Mongo, should we use Cassandra or DynamoDB? Do we even need a database for this application? How can we orchestrate a cloud failover from one provider to another?
Who are we looking for?
The Sr SRE Engineer is part of an innovative team on a continuous mission to build bulletproof, scalable, and secure private and public cloud environments for our customers and users.
If you think hard is fun and get bored easily when you aren’t challenged, this might be the place for you. We want someone who has an insatiable thirst for technology and a desire to learn and grow individually, with the team, and with the business. Someone who has a passion to lead, architect, design, document, and implement comprehensive platform solutions using security best practices. This is an extremely challenging position, but it would be the perfect fit for someone who wants to contribute and grow.
•The Sr SRE Engineer is responsible for any and all tasks related to the performance, stability, reliability, efficiency, and security of both the sites and the general team operations. Responsibility also extends to how incidents are managed.
•Proactive relationship building and communication are essential in this role. This includes engagements with SROs, clients, and third parties to ensure continuous improvements in system architecture, deployments, automation, and configuration management.
•Establish the service delivery culture for our business, building best-in-class service engineering capabilities in the SRE team.
•Work across the engineering team to influence software development to meet cloud needs, and influence product and cloud engineering to improve the manageability and supportability of the cloud products.
•Design and develop a complete end-to-end automation environment using configuration and auto-scaling tools.
•Define standards for configuration, monitoring, reliability, scalability, performance optimization, and capacity planning of new infrastructure, targeting 99.9%+ uptime.
•Respond to off-hours and weekend emergency alerts, alarms, and requests, in keeping with the team's on-call rotation schedule.
•Work closely with Architects, Security Engineers, Product Managers, SRO, and other clients and partners of the SRE team to meet the organization's needs and keep it competitive, from the infrastructure up to the highest level of applications.
•Strategize with the teams to develop new technology initiatives with a primary focus on availability, supportability, scalability, security, and performance.
•Configure and tune enterprise monitoring and instrumentation systems to efficiently detect existing issues and predict future issues based on trends.
•Stay up to date with technology. Continually advance your technical skill set.
•Continuously improve by taking justifiable risks and not being afraid to fail.
•Be flexible and at the same time push back respectfully to ensure we are doing what is best for the company in the long run.
•Hold vendors accountable and set the bar high, ensuring they deliver above expectations.
•Challenge the status quo by recommending and pushing for changes that improve reliability and velocity.
•5+ years of hands-on experience as an individual contributor in a systems administration/development or DevOps role working on highly scalable distributed systems.
•Experience supporting mission-critical platforms, in both physical and virtualized environments, using CentOS, Red Hat, and Ubuntu.
•Strong experience with configuration management systems such as Ansible (preferred), Puppet or Chef.
•Solid understanding of end-to-end technology stacks, including but not limited to OS, network, application, relational and non-relational databases, API interaction, and security (network and application).
•Experience designing, building, and managing large-scale infrastructure in AWS and Rackspace, including experience leveraging one or more coding languages for automation.
•Proficiency in high-level languages such as Python (preferred), Ruby, or Java, and experience working on software projects in a collaborative environment using tools such as Bitbucket or Git.
•Strong knowledge and experience in automation.
•Proven experience leading positive change, cultivating product technology visions and innovative solutions, and fostering effective engineering practices and culture.
•Experience in driving process improvements, with a strong focus on leveraging technology for the establishment of fluid interactions and interfaces between teams.
•Ability to communicate and transfer knowledge clearly and effectively in both technical and non-technical terms.
•Strong ability to prioritize and multi-task in a fast-paced environment.