Site Reliability Engineer (Product & Platform)
Kochava via Stack Overflow
Nov 15th 2018
Kochava builds real-time tracking and attribution analytics tools for connected devices, serving the world's top brands and apps. We analyze millions of requests every day and are ramping up at an extraordinary pace to serve billions of requests every day. The company is growing fast as we add new clients and services, and we are looking to add talented, dedicated and innovative people who will strengthen our core team.
Kochava is looking for enthusiastic engineers to join our Platform Site Reliability Engineering Team. As a member of this team, you will focus on software development and infrastructure design, building services to manage, scale and monitor our shared core infrastructure. The infrastructure and services that this team is responsible for include databases, message queues, monitoring solutions, security and networking in the cloud and in physical data centers. Engineers on this team will be challenged in a fast-paced environment and will steer the advancement of efficient, resilient and scalable shared resources used by many of our core production services.
- Streamline and enhance the day-to-day operational workflows of shared services in a 24x7x365 environment located in Google Cloud Platform, AWS, and physical data centers.
- Build tools to enhance performance, scalability and observability of resources shared between multiple projects in production.
- Utilize a wide variety of open source technologies to create fault-tolerant, scalable and secure high-performance services and pipelines on a global scale.
- Interact with other teams across the organization to define KPIs and evangelize the adoption of best practices in relation to performance and reliability.
- Continuously improve observability to ensure the uptime and reliability of our applications and infrastructure.
- Troubleshoot issues across the entire stack (hardware, software, application and network) within physical data center and cloud-based environments.
- Provide on-call support for shared services and infrastructure.
- Proven track record of designing, building, optimizing, and maintaining infrastructure on a large scale.
- Proficiency in high-level languages such as Go and Python.
- A deep understanding of the Linux operating system, from the console to the kernel.
- Ability to work as part of a distributed team.
- Experience with containers and container orchestration tools (Docker, Kubernetes and Spinnaker experience preferred).
- Experience working in the Google Cloud Platform environment.
- Software development experience using Go and Python.
- Experience with Kafka, MySQL, InfluxDB, Elasticsearch, Redis, and/or Memcached.