Senior Data Engineer - Research Integrity team

Springer Nature opens the doors to discovery for researchers, educators, clinicians and other professionals. Every day, around the globe, our imprints, books, journals, platforms and technology solutions reach millions of people. For over 180 years our brands and imprints have been a trusted source of knowledge to these communities and today, more than ever, we see it as our responsibility to ensure that fundamental knowledge can be found, verified, understood and used by our communities – enabling them to improve outcomes, make progress, and benefit the generations that follow.
Visit group.springernature.com and follow @SpringerNature / @SpringerNatureGroup

Job Role: Senior Data Engineer - CA Research Integrity

Location: Kharadi, Pune

Springer Nature is one of the world’s leading global research, educational and professional publishers. It is home to an array of respected and trusted brands and imprints, with more than 170 years of combined history behind them, providing quality content through a range of innovative products and services. Every day, around the globe, our imprints, books, journals and resources reach millions of people, helping researchers and scientists to discover, students to learn and professionals to achieve their goals and ambitions. The company has almost 13,000 staff in over 50 countries.

About Us

We’re looking for a Data Engineer to join Content Acquisition within Springer Nature Technology. Springer Nature is a leading publisher of scientific books, journals and magazines, with over 3,000 journal titles and one of the world’s largest corpora of peer-reviewed scientific text data. You would be joining the team responsible for evolving a cross-platform view (Data-as-a-Product) of our submission data. This view drives workflow management, customer experience and business reporting in up to 30 teams.

We are committed to growing and nurturing our people for the long term. We spend 10% of our time working on our own projects to promote learning and innovation, and we hold regular lunch 'n' learn sessions to share knowledge.

We offer hybrid remote/office working, with up to two or three days per week working from home. You'll be part of our wider community of developers located in India, Portugal, Germany and the UK.

You will be joining a cross-functional team with different nationalities, backgrounds and experience levels. All team members collaborate to deliver solutions that best satisfy the needs of researchers and other readers.

Roles and Responsibilities:

We are looking for an experienced GCP Data Engineer with over 5 years of hands-on experience. The ideal candidate will work with Google Cloud Platform (GCP), BigQuery, Python, DBT, Terraform, and Git.

Design, develop, and maintain scalable data pipelines on GCP.
Implement and optimize data storage solutions with BigQuery for large-scale processing.
Develop, test, and deploy data transformation workflows using Python.
Collaborate with data scientists, analysts, and stakeholders to meet data requirements.
Ensure data quality and integrity throughout the data lifecycle.
Implement CI/CD pipelines for data applications and manage infrastructure using Terraform.
Utilize DBT for data modelling, testing, and documentation.
Use Git for version control and code collaboration.
Monitor and troubleshoot data pipelines to ensure reliability and performance.
Stay updated with industry trends and best practices in data engineering and GCP services.

Within 3 months you will:

Get familiar with our emerging technology stack and data landscape.
Align yourself with the work of the data platform team and understand the data requirements and issues facing our users.
Collaborate effectively with each discipline on the team.
Actively participate in technical discussions and share ideas.
Work with architects and other data engineers in the organization to align the data processing architecture.

By 3-6 months you will:

Have an understanding of the team’s context within the wider organization.
Be a supportive member of the team, developing the platform by using the appropriate technology solutions to solve the problem at hand.
Triage support queries and diagnose issues in our live applications.
Identify new sources of data across the organization and build relationships with data providers to gain access.
Understand the processes by which data is acquired and any resulting limitations or bias and communicate this to the team.
Develop and maintain automated, repeatable data pipelines that load data into systems like BigQuery and analyze, clean and join datasets.
Ensure that data is stored securely and in compliance with GDPR.
Work with data owners to understand how we can allow them to self-serve their data using tools we develop.

By 6-12 months you will:

Develop processes and tools to monitor feeds, test data integrity and completeness, and alert users when a problem occurs.
Understand our customers’ needs, both internal and external, and how your work affects their experience.
Be able to gauge the complexity or scope of a piece of work, breaking it into smaller pieces when appropriate.
Give and receive constructive feedback within your team.
Mentor other members of the team in the principles of data engineering and promote best practice.
Promote and advocate the use of data across Springer Nature.
If you have an interest in data science, there may be opportunities to apply machine learning techniques to these datasets to assist in the work of domain teams.

As part of an Agile product team, day-to-day you will:

Take part in our daily stand-ups.
Contribute to ceremonies like steering, story writing, collaborative design and retrospectives.
Develop new features and improve code quality by pair programming with other team members.
Take part in the support and monitoring of our services.
Interact with various stakeholders where required to deliver quality products.

About You

Over 5 years of experience in data/software engineering on a cloud platform (AWS/GCP/Azure) using tools such as DBT and programming languages such as Python, Scala or Java.
You have strong SQL and data problem-solving skills.
You have experience with data modelling and transformation tools like DBT.
You have a solid understanding of modern data engineering practices.
You factor in non-functional aspects of data pipeline development, including quality checks, cost-effectiveness, sensitive data handling, usage monitoring, and observability of data pipelines and data quality.
You promote working in a cross-functional, collaborative team where there is collective code ownership.
You understand how your team’s work can impact interdependent teams and design accordingly.
You are comfortable carrying out large-scale refactoring of a codebase.
You can facilitate and guide technical discussions to a workable outcome.
You enjoy mentoring team members and act as a role model on the team.
You understand distributed systems concepts and are familiar with the pros and cons of common data architectures, including data meshes.

Good to Have:

Expertise in GCP & BigQuery and large-scale data processing.
Strong Python programming skills.
Experience with data modelling and transformation tools like DBT.
Familiarity with infrastructure-as-code tools like Terraform.
Proficiency with Git for version control.
Strong problem-solving skills and attention to detail.
Excellent communication and teamwork abilities.
GCP Certification (e.g., Professional Data Engineer).
Experience with other GCP services (e.g., Cloud Storage, Cloud Composer, Dataflow).
Knowledge of data governance and security best practices.

At Springer Nature, we value and celebrate the diversity of our people. We recognize the many benefits of a diverse workforce and strive for an inclusive workplace that empowers all our colleagues to thrive. Our search for the best talent fully encompasses and embraces these values and principles.

