Principal Data Engineer

USA TODAY NETWORK via Stack Overflow
Development

McLean, VA

Oct 26th 2018


The API Services team is responsible for engineering and delivering cutting-edge services that aid in content delivery to end customers. These services support 110 news brands and more than 110 million unique monthly visitors.

The Principal Data Engineer will play a key role in architecting, developing, and maintaining the data architecture for Gannett's new Content Platform, which supports the content production & delivery systems consumed by both our network of 3,000 journalists & our customer-facing products. You will be expected to design & consume large-scale, fault-tolerant, and highly available architectures. A large part of your role will be forward-looking, with an emphasis on optimizing content structures & relationships. If you have a passion for rapid development, automation, and learning, enjoy challenging and bettering your peers, and have a strong desire to operate in a full-stack environment, you'd probably fit in well here.

Responsibilities:

  • Collaborate with stakeholders & developers to identify data needs & ideal implementations.
  • Contribute to the architecture and vision of Gannett's content data pipeline.
  • Bring a track record of evolving complex data environments.
  • Continuously evaluate data usage patterns and identify areas of improvement.
  • Interface closely with data scientists and engineers to ensure the reliability and scalability of the data environment.
  • Drive future state technologies, designs and ideas across the organization.
  • Plan work for two-week sprints.
  • Provide day-to-day operational support for our applications.
  • Establish and improve best practices around our application and infrastructure monitoring.

Automate everything:

  • Containerizing applications with Docker.
  • Scripting new solutions/APIs/services to reduce toil.
  • Researching new tools to optimize cost, deployment speed, and resource usage.
  • Assist in improving our onboarding structure and documentation.

Responsibility Breakdown:

  • 30% - Data architecture design / review
  • 20% - Mentoring
  • 15% - Application Support
  • 15% - Planning / Documentation
  • 10% - Application design / recommendations / proofs of concept
  • 10% - New Technology Evaluation

Technologies:

Systems:

  • Linux
  • Couchbase
  • Elasticsearch
  • Solr
  • Neo4j
  • Other NoSQL Databases

Exciting things you get to do:

  • Engineering high-performance applications with an emphasis on concurrency
  • Agile
  • Amazon Web Services, Google Compute Engine
  • Google Cloud Datastore, Spanner, DynamoDB
  • Docker, Kubernetes
  • Database testing
  • GraphQL
  • Fastly
  • Terraform
  • Monitoring with New Relic

Minimum Qualifications:

  • Deep experience in ETL design, schema design and dimensional data modeling.
  • Ability to match business requirements to technical ETL design and data infrastructure needs.
  • Experience using search technologies like Elasticsearch and Solr, and designing the integration of search with a persistent data store.
  • Deep understanding of data normalization methodologies.
  • Deep understanding of both Relational and NoSQL databases.
  • Experience with data solutions such as Hadoop, Teradata, and Oracle.
  • Proven expertise with query languages such as SQL, T-SQL, NRQL, and Solr query syntax.
  • Self-starter who can operate in a remote-friendly environment.
  • Experience with Agile (Scrum), test-driven development, continuous integration, and version control (Git).
  • Experience deploying to cloud compute or container hosting platforms.
  • Experience working with data modeling tools.
  • Basic understanding of REST APIs, SDKs and CLI toolsets.
  • Understanding of web technologies.
  • Experience with data in the media industry is a plus.