DataOps is trending, but what is it?

Automatic Summary

Understanding Data Ops and Opportunities in Tech

Welcome to a discussion of data operations, otherwise known as Data Ops. This session tours an underrepresented area of tech, covering everything from what it entails to the opportunities it offers.

About…

First, a quick glimpse into my life. I began my journey into the tech world at a young age, dabbling in coding and eventually earning degrees in computer science and digital intelligence analytics. Today, I am lucky enough to work as a technical manager within the Business Intelligence and Analytics spectrum at Watchfinder & Co., part of the luxury group Richemont. As a woman, I recognise the underrepresentation of women in the tech industry and am passionate about advocating for more inclusion.

Why is Data Ops Trending?

Data Ops is not a new concept. It is, however, one of the fastest-growing areas of tech. It encompasses an array of skills, and it offers a world of opportunities for data engineers, data analysts, and data scientists. It's important for us to encourage more women into this sector.

Gender Gap and Salary

The gender gap in tech positions is, unfortunately, still significant. There has been some progress in female representation in roles such as data scientist, where the talk cites a figure of 25% women, but there is still room for improvement. Salaries in this area are also promising, with junior analysts starting at around £40K and seniors reaching about £70K, which adds to the attractiveness of the field.

Unpacking Data Ops

Data Ops, an abbreviation for data operations, is not to be confused with DevOps. DevOps focusses primarily on the software engineering spectrum whereas Data Ops incorporates principles of DevOps alongside agile, lean, and quality aspects. It aims to bridge the gap between data collection teams and the subsequent analysis and application of these findings.

The Conception and Evolution of Data Ops

Despite being a trending topic, Data Ops has been around for some time. A blog post in 2014 on IBM’s hub spoke about why it was essential for big data success. Fast forward to today, and data continues to be the corporate world's "black gold", a vital catalyst for success and growth.

Understanding the Best Practices of Data Ops

Data Ops encourages collaborative environments and emphasises guidelines, metrics, and clearly defined roles. It promotes automation and fast delivery while ensuring efficiency and quality.
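To make the link between automation and quality a little more concrete, here is a minimal sketch of the kind of check that could run automatically every time new data arrives. The `Order` record shape, the `find_quality_issues` function and the three rules are assumptions made up for this illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record shape, used only for this illustration.
    order_id: int
    customer: str
    amount_gbp: float


def find_quality_issues(orders):
    """Return a list of human-readable problems found in a batch of orders."""
    issues = []
    seen_ids = set()
    for order in orders:
        # Rule 1: order ids must be unique within the batch.
        if order.order_id in seen_ids:
            issues.append(f"duplicate order_id {order.order_id}")
        seen_ids.add(order.order_id)
        # Rule 2: every order needs a non-empty customer name.
        if not order.customer.strip():
            issues.append(f"order {order.order_id} has no customer")
        # Rule 3: amounts cannot be negative.
        if order.amount_gbp < 0:
            issues.append(f"order {order.order_id} has a negative amount")
    return issues


batch = [
    Order(1, "Ada", 120.0),
    Order(2, "", 75.5),
    Order(2, "Grace", -10.0),
]
issues = find_quality_issues(batch)
for issue in issues:
    print(issue)
```

In an automated pipeline, a non-empty `issues` list would typically stop the load or raise an alert, so that quality problems are caught before they reach a dashboard.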

The Various Roles within Data Ops

Data Ops comprises four key roles, each with its own set of responsibilities. These include:

  • DevOps engineer: Handles infrastructure, is familiar with cloud environments, and facilitates automation.
  • Data engineer: Designs data structures, writes advanced SQL, and understands cloud infrastructure.
  • Data scientist: Builds machine learning models, masters advanced mathematics, and codes in languages like Python.
  • Data analyst: Handles data visualization, develops dashboards and reports, maintains advanced SQL skills, and understands statistics and machine learning basics.

It's important to note that it is possible to transition from one role to another, which supports growth and development within the field.
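One data engineering concept the talk singles out is the slowly changing dimension. As a rough sketch, the example below keeps history in the common "type 2" style: when an attribute changes, the current row is closed and a new row is added. The `dim_customer` table, its columns and the `upsert_customer` helper are hypothetical names chosen for this illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,   -- surrogate key
        customer_id  TEXT,                  -- business key from the source system
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                  -- NULL while the row is current
        is_current   INTEGER
    )
    """
)


def upsert_customer(conn, customer_id, city, change_date):
    """Type 2 update: close the current row if the city changed, then add a new one."""
    current = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # nothing changed, keep the existing row
    if current is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_key = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, change_date),
    )


upsert_customer(conn, "C1", "London", "2020-01-01")
upsert_customer(conn, "C1", "London", "2020-06-01")  # no change, no new row
upsert_customer(conn, "C1", "Paris", "2021-03-01")   # change, history is kept

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer "
    "ORDER BY customer_key"
).fetchall()
for row in history:
    print(row)
```

The point of the design is that reports about the past can still see the customer in London, while reports about today see Paris.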

The Data Ops Infrastructure

A Data Ops ecosystem might comprise data sources like databases and APIs, ingestion into a data lake or data warehouse using cloud services such as AWS or Google, analysis tools, data presentation tools, repositories like GitHub for version management, and data governance. The infrastructure you set up, and the tools within it, are determined by your use case.
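The flow from source to ingestion to analysis to presentation can be sketched in a few lines. This is only a toy under stated assumptions: the records, table names and function names are invented for the example, an in-memory SQLite database stands in for a data lake or warehouse, and a real platform would use dedicated tools at each stage.

```python
import sqlite3

# Hypothetical records standing in for a data source such as an API response.
SOURCE_RECORDS = [
    {"sku": "W-001", "brand": "Alpha", "price_gbp": "4500"},
    {"sku": "W-002", "brand": "Beta", "price_gbp": "3200"},
    {"sku": "W-003", "brand": "Alpha", "price_gbp": "5100"},
]


def ingest(conn, records):
    """Ingestion: land the raw records unchanged in a staging table."""
    conn.execute("CREATE TABLE staging_sales (sku TEXT, brand TEXT, price_gbp TEXT)")
    conn.executemany(
        "INSERT INTO staging_sales VALUES (:sku, :brand, :price_gbp)", records
    )


def transform(conn):
    """Analysis: cast types and aggregate into a table ready for reporting."""
    conn.execute(
        """
        CREATE TABLE sales_by_brand AS
        SELECT brand,
               COUNT(*) AS items,
               SUM(CAST(price_gbp AS REAL)) AS revenue_gbp
        FROM staging_sales
        GROUP BY brand
        """
    )


def present(conn):
    """Presentation: return rows that a dashboard or report could display."""
    return conn.execute(
        "SELECT brand, items, revenue_gbp FROM sales_by_brand ORDER BY brand"
    ).fetchall()


conn = sqlite3.connect(":memory:")
ingest(conn, SOURCE_RECORDS)
transform(conn)
report = present(conn)
for row in report:
    print(row)
```

Keeping each stage in its own function mirrors the way the stages are usually owned by different tools, and it makes each one easy to test and version in a repository.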

The Future of Data Ops

Data Ops continued to grow in popularity in 2021, and it is projected to keep growing well beyond that. Companies are investing heavily in their data infrastructure, recognizing the value data holds for their success and growth.

For those who want to venture into this field, a few steps you could take are brushing up on SQL, learning a coding language like Python, trying open source tools such as dbt and Airflow, understanding databases and modelling, and revisiting statistics.
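On the databases side, the talk also recommends understanding the difference between OLTP and OLAP. The sketch below contrasts the two styles of query on one small table. In practice they usually run on differently designed systems; SQLite and the `orders` table here are assumptions used purely to show the shape of each workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount_gbp REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount_gbp) VALUES (?, ?)",
    [("London", 100.0), ("Paris", 250.0), ("London", 50.0)],
)

# OLTP-style work: read or change one record at a time, inside a transaction.
with conn:
    conn.execute("UPDATE orders SET amount_gbp = 120.0 WHERE order_id = 1")
single_order = conn.execute(
    "SELECT region, amount_gbp FROM orders WHERE order_id = 1"
).fetchone()

# OLAP-style work: scan many records and summarise them for analysis.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount_gbp) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)
print(totals_by_region)
```

The first query touches a single row and must be fast and safe for the application; the second reads the whole table to answer a business question, which is the pattern data warehouses are built for.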

So, there you have it: a crash course on Data Ops, what it entails and the opportunities it offers. Thank you for joining this session. Feel free to reach out if you have more questions about Data Ops and how to venture into this lucrative yet underrepresented field.


Video Transcription

OK. Hello, welcome, and thank you for attending the session today. Hopefully you'll get some insights. Before I get into what DataOps is, I'm going to give you some context on my background. So, who am I? I started coding when I was 10. I still kind of code, but I'm not a hands-on coder in my day-to-day job. I followed the traditional path, and I always say to individuals that everyone's journey into tech is their own individual journey. But for me, I did a computer science degree, followed by a number of years of working, and I also got a master's in BI Digital Intelligence and Analytics. I moved from being a developer, as in just coding with different programming languages, to BI, and I now work as a technical manager within the BI and analytics spectrum. That's really been my passion. I currently work at Watchfinder & Co., which is part of the Richemont group. I'm sure many of you have never heard of them, but if I say brands like Cartier, Chloé and so forth. So it's a luxury brand.

So we have about 50 Maisons and I work for Watchfinder. I head up the data team there. In my spare time, I am very much about inclusion. I was a founding director of a credit union here in London, and I also volunteer for Need to be.org, which is an organization for advocacy for women in tech, and that could be from entry level, even before that in schools, right through to your working life. And then a final point about me: I am from London, I was born here, but my parents are from the Caribbean, and hence I love Caribbean carnival. So that's just a little snippet about me. Right, so DataOps is trending. DataOps is not really a new concept; it's just a new spin on how you work with data. The one thing I will say is I'm not going to go into too much detail with the slides. I'm not going to paraphrase what's on there, but just to give you some insight: there's been a huge demand, and it's one of the fastest-growing areas, I would say, in tech today.

It encompasses a variety of different skill sets and so forth. But for me, I think it's one of the areas where women are highly underrepresented, and that's one of the reasons why I felt the need to talk about it, so that people can understand there are different aspects that you can get into. You see a lot of roles for data engineers. You may see roles for data analysts and data scientists. They've all got different skills, and, well, not easily, but you can get into being a data analyst and then move to data engineer or data scientist, and eventually get into the management side as well, because the number of women who are actually managers in this area is not high enough for me.

So that's why I talk about it. Before I get into it, I just wanted to highlight the gender gap in terms of how many women make up these positions. Even though, here, you've got 25% of data scientists being women, it's still not good enough. It's definitely an area that needs more women coming into it. I also wanted to highlight the salary, so you know what it actually looks like. Juniors can start off at 40K, and senior analysts move up to 70K. This is in pounds sterling. As you can see, the demand has risen year on year. Again, we don't have enough individuals who are budget holders. And again, you can see the salary; I'm not going to go into too much detail on that because of time. Then again with the data scientists: this is one of the highest-paid areas that you tend to find, because of the specialisms and the expectation that you should be able to do machine learning, AI and stuff like that.

So I'm going to kick into it now: what is DataOps? It's an abbreviation for data operations. It's not to be confused with DevOps. I know that people say it's the same, but DevOps is very much around development in the software engineering spectrum, whereas DataOps takes some of the principles of DevOps but also includes agile, lean and quality aspects, alongside understanding how we can develop insight with an agile approach.

So it kind of bridges the gap between those who collect the data and those who analyze it and put the findings to good use. Just to give you the history of it: as I said, this is not a new concept. As you can see, it started in 2014; there was actually a blog post on IBM's hub talking about why it is essential for big data success. Big data is something that's been around for a good number of years, and it's continued to be, I would say, the highlight of what organizations talk about, along with understanding how you can leverage your data, because data is known as the black gold of any organization.

And it's really important that it's seen as being vital in order for companies to succeed. Definitely through the pandemic, what you tend to see is that the number of roles associated with data has increased tremendously over the last year. So to get an understanding of exactly what it is, first off, understand the best practices. It's very much a collaborative environment: encouraging stakeholders to engage more and being aware of the business requirements. Processes need to be well defined; you have well-defined roles and guidelines, and metrics are known. The tools and the technology out there are very much fast paced.

It's very much about continuous integration and development, so it's about automation and having a quick level of delivery. As I mentioned before, some of the principles are based on DevOps. But again, the really important reason why DataOps has evolved is that it's an agile approach. It's very much about self-organizing teams and self-organizing individuals, having the business involved throughout the whole process, and making sure that you develop the minimum viable product during each iteration of what you're developing.

So that's instead of how it was if you looked at data products of the past, where somebody would go off to develop something and disappear, and you wouldn't even know what they were doing. Then again, it's having the concept of lean thinking: focusing on efficiencies and creating value. Again, it kind of relates to the MVP aspect, minimum viable products. It's putting controls in place, encompassing code repositories, then taking a lot of the learnings that have come out of DevOps around automation, setting up dev and prod environments, and encompassing CI/CD processes, but at the same time ensuring that it's continuously tested and monitored so that you have good quality. It's better to have quality than quantity, which is a really key area to focus on. That then moves into the roles. Now, occasionally you will see different takes on the naming convention of the roles, but essentially there are four kinds of roles that fit into this area. The first is the DevOps engineer, which is very much focused on understanding the infrastructure and being aware of the different cloud environments.

You know, AWS, GCP, Azure, the list goes on. And being experienced in DevOps, because, as I said, a lot of the principles from DevOps carry over into DataOps, very much around automation and so forth. The next role is the data engineer, and I think it's really important to understand what that means. One of the key areas that I always focus on with the data engineers that I manage is being able to create data models and understanding schema designs. That relates to the historical models, which come from, I would say, the older concepts of data warehousing and so forth, around dimensional fact models. Understanding how facts and dimensions work, and slowly changing dimensions, is really important, as is having advanced SQL skills, which means that you can write your own functions, and advanced coding skills. The most popular languages, I would say, are Python and Java, and Scala has been on the rise as well.

And Scala is very similar to Java. Then there's having a deep understanding of the cloud infrastructure. I would say data engineers are not expected to be the person that manages the infrastructure, but they should have an awareness of how it works in terms of the scalability of anything they develop. Then data cleansing, data warehousing again, and data lakes. So then we move on to the data scientists. There tends to be a blurred line between data scientists and data analysts. Now, for me, data scientists develop the algorithms; data analysts may manipulate the output from the algorithms. Data scientists develop, I would say, analytical models. They are domain subject matter experts, they have advanced mathematical skills, they can develop their machine learning models, and they have knowledge of coding languages.

And the most common is Python, which is associated with Jupyter notebooks, for instance, and a lot of data scientists leverage that to build their models. Then finally, I get on to the data analyst. If we had looked at this role maybe five to seven years ago, it was very much focused on being able to manipulate data in terms of delivering insight, and having a good understanding of visualization and so forth. Modern tools nowadays are things like Looker, Tableau and Power BI. It's having the ability to develop dashboards and reports, and understanding the difference between a dashboard and a report.

Advanced SQL skills are definitely a must nowadays, as is having knowledge of coding languages such as Python. I've put SQL in there because, as I've mentioned before, you can develop your own functions and procedures. And slowly, I would say over the last four years or so, it's having an understanding of statistics and machine learning, but not to the level that would be expected of data scientists, and data cleaning, which is really important. The thing that I would really like to emphasize is that you can move from data analyst to data scientist, but you do need to ensure that you can actually code your own models. Yes, there are loads of libraries out there. But one of the things that always resonates in my mind is from when I was studying for my master's in analytics, where one of the key areas that I did was around machine learning and AI. The professor pointed out that yes, you can leverage these libraries, but a lot of these things are open source, so how do you know that they're actually accurate? If it fails, you won't be able to fix it, because you cannot understand the back end of the code. So it's really important that you can understand code. Sorry, I moved back to the wrong screen. I'm just trying to navigate back. Oh, right. Right.

So I'm coming to the end, and I'm just going to give you a basic overview of what a DataOps infrastructure could look like. Now, I've put in quite a lot of different tools, but it doesn't mean you use all of these. It's just to give you an idea of things that you can look at in order to build your skill set up, because most of these things now are open source tools, where they offer community-based open source access alongside enterprise access. So it's giving you a way in. I've split it into six key areas. The first is the data source. You get multiple types of data sources, such as databases and APIs; you can be leveraging data from applications like Salesforce, or it could be Google Analytics, for instance. Then you've got the ingestion, and this is how you pour your data into your platform, or into whatever you use in terms of your data lake or data warehouse. As you can see underneath, I've illustrated the different sources available. A few of these are proprietary, but where you don't really get, I would say 12. But, for instance, you can sign up to AWS and have your own account to get S3, or Google.

You can sign up and get your own account. So then, alongside that, there's the analysis, and the analysis side links back to what I said about data engineers being able to create historical models, which are commonly known as fact tables, dimensional fact tables, et cetera, where you would stage your data, transform it and produce the data models.
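As a rough illustration of staging data, transforming it and producing fact and dimension tables, here is a minimal sketch. The table names (`stg_sales`, `dim_product`, `fact_sales`) and the sample rows are hypothetical, and SQLite is used only as a convenient stand-in for a data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_sales (sale_date TEXT, product_name TEXT, quantity INTEGER, amount_gbp REAL);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT UNIQUE);
    CREATE TABLE fact_sales (
        sale_date   TEXT,
        product_key INTEGER REFERENCES dim_product (product_key),
        quantity    INTEGER,
        amount_gbp  REAL
    );
    """
)

# Stage: land the raw rows as they arrive (hypothetical sample data).
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?, ?)",
    [
        ("2021-05-01", "Watch A", 1, 4000.0),
        ("2021-05-01", "Watch B", 2, 6000.0),
        ("2021-05-02", "Watch A", 1, 4100.0),
    ],
)

# Transform, step 1: one dimension row per distinct product.
conn.execute(
    "INSERT INTO dim_product (product_name) "
    "SELECT DISTINCT product_name FROM stg_sales ORDER BY product_name"
)

# Transform, step 2: facts hold the measures plus a key into the dimension.
conn.execute(
    """
    INSERT INTO fact_sales (sale_date, product_key, quantity, amount_gbp)
    SELECT s.sale_date, d.product_key, s.quantity, s.amount_gbp
    FROM stg_sales AS s
    JOIN dim_product AS d ON d.product_name = s.product_name
    """
)

# A typical analytical query joins the fact table back to its dimension.
revenue_by_product = conn.execute(
    """
    SELECT d.product_name, SUM(f.quantity), SUM(f.amount_gbp)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.product_name
    ORDER BY d.product_name
    """
).fetchall()
print(revenue_by_product)
```

The descriptive attributes live once in the dimension, while the fact table stays narrow and numeric, which is what makes this layout convenient for reporting tools.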

You can also look at how you can develop your own machine learning models. As you can see, these are just examples; this is not a standard layout, and you may not use all of these things. It's just to give you some insight into what's available. Then, how would you present it to your business?

Now, I know there's always this talk of, you know, "I'm just going to use one tool." Sometimes you use multiple tools; it's always based on your use case. Any ecosystem for a data platform will always have Excel; it will never disappear. But you might use Looker, you might use R, you might use Power BI, it might be Tableau, it could be any tool. It's just understanding that. Then, underpinning this, I've put GitHub here, but it could be any repository. This is how you would manage your versions, and it allows for remote development, so it's really important. At the same time, it also allows you to manage and do the governance. So you've got the data management, the metadata aspects, security, and ensuring quality and privacy. OK, so just to wrap up: what's next in terms of DataOps opportunities? As I said, it's continued to grow in 2021, and it will continue to grow beyond that. Companies tend to invest more in their data, and it may not necessarily be called a DataOps team, but it's a data team regardless. And some tips.

So if you're interested in getting into this field, I would say brush up on your SQL and learn a coding language. Python is the easiest one if you've never coded before; I would suggest that. Look at open source tools like dbt and Airflow, gain knowledge of databases, understand modeling, and understand the difference between OLAP and OLTP. Brush up on your statistics and operational research.

I prefer that side to the machine learning aspects, for me, because that tends to do with forecasting, the things you can do, because you can even do forecasting in Excel. I'm sure a lot of people are not aware of that. But hopefully it's given you some insight into what the area encompasses. And if you've ever got any questions, you can always contact me via LinkedIn or on Twitter. That is the end of my presentation, and I would love it if anyone's got any questions. I can see that people have put in requests to connect on LinkedIn. Thank you. I believe that I've got one minute left. And hi, Alexandra. Thank you, nice to see you here. Hello. OK, right. So, thank you, everyone. I hope you got some insight. I wanted to do it really quickly, but I'm more than happy to go into more detail about it, because I manage data analysts, data scientists and also data engineers. So, thank you. I will definitely connect with those who've asked me to connect, and I'm happy to have conversations with you. Thank you.