Senior Backend Engineer for Cloud Services
Scrapinghub via Stack Overflow
Jun 1st 2018
About the job:
We are looking for two Senior Backend Engineers to develop and grow our crawling and extraction services. Our automated service is used directly by our customers via an API, as well as by us for internal projects. Our extraction capabilities include automated product and article extraction from single pages or whole domains, using machine learning and custom-built components, and we plan to expand them to jobs and news. The service is still in the early stages of development and is serving its first customers.
As a professional services company, we are often required to build a custom crawling and extraction pipeline for a specific customer. This requires planning the crawl and extraction around the customer's needs, including estimating crawl time and allocating hardware. The volume is often very high, so solutions have to be carefully designed to deliver the required performance, reliability and maintainability.
Our platform consists of several components that communicate via Apache Kafka and use HBase as permanent storage. Most components are written in Python, while several crucial components are built with Scala and Kafka Streams. The current priorities are improving the reliability and scalability of the system, integrating it with other Scrapinghub services, implementing auto-scaling, and adding other features. This is going to be a challenging journey for any good Backend Engineer!
Job Responsibilities:
- Design and implementation of a large-scale web crawling and extraction service.
- Solution architecture for large-scale crawling and data extraction: design, hardware and development effort estimation, drafting proposals, and explaining and justifying the solution to customers.
- Implementation and troubleshooting of Apache Kafka applications: workers, HW estimation, performance tuning, debugging.
- Interaction with data science engineers and customers.
- Writing code carefully for critical production environments, backed by good communication and learning skills.
Requirements:
- Experience building at least one large-scale data processing system or high-load service, and an understanding of the CPU and memory cost of a given piece of code.
- Good knowledge of Python.
- Experience with any distributed messaging system (RabbitMQ, Kafka, ZeroMQ, etc.).
- Basic knowledge of Docker containers.
- Linux knowledge.
- Good communication skills in English.
- Understanding of different ways to solve a problem, and the ability to choose wisely between a quick hotfix, a long-term solution, or a design change.
Bonus points for:
- Kafka Streams and microservices based on Apache Kafka; understanding of Kafka message delivery semantics and how to achieve them in practice.
- HBase: data model, choosing access patterns, maintenance processes.
- Understanding how the web works: research on link structure, major components of link graphs.
- A background in algorithms and data structures.
- Experience with web data processing tasks: web crawling, finding similar items, mining data streams, link analysis, etc.
- Experience with microservices.
- Experience with the JVM.
- Open source activity.