Session: Supercharge your Dynamic Data Transformations at Scale with Apache Spark
This talk outlines how we built a scalable data transformation pipeline for both batch and streaming workloads. We'll discuss the challenges we encountered and the solutions we adopted while scaling to hundreds of terabytes per day, including the shift from row-level transformations in Apache Spark to Catalyst Expressions for complex, nested column transformations and DSL-based custom functions.
Bio
Engineering leader at Adobe with extensive experience leading and architecting scalable web applications and big data systems. In my 16+ years of experience, I have successfully delivered multiple projects from concept to customers. I am passionate about solving complex problems, designing data models, and providing robust, scalable software solutions with a customer focus. I am deeply committed to mentoring and actively participate in women-in-tech forums at various universities. Empowering and uplifting women in the tech industry is a cause close to my heart.