Data Modernization: Decoupling code from data
A Guide to Driving Effective Data Modernization in Software Engineering
By: Marie McCormack, Veteran Software Engineering Leader
In today's digital landscape, data is the lifeblood of all operations, and its management is crucial for the success of any business. As technologies evolve, enterprises need to update their data management strategies to remain relevant and efficient. Here are some insights into successful data modernization journeys collected over my 25 years of leading software engineering teams, especially at JP Morgan Chase.
Understanding the Importance of Data Use Cases
Data use cases provide the foundation for all data modernization efforts. Understanding your data and how it will be used dictates the platforms and solutions you deploy for your software engineering team. Be it real-time time-series data used for stock tickers or small configuration data—each requires a unique approach and potentially different platforms.
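As a minimal sketch of that idea (the `DataUseCase` fields and the heuristic thresholds are hypothetical, not from any specific framework), writing down a use case's characteristics explicitly makes the platform decision an output of analysis rather than habit:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataUseCase:
    """Characteristics of a data use case that tend to drive platform choice."""
    name: str
    writes_per_min: int     # rows or events written per minute
    latency_budget_ms: int  # acceptable time to serve a read
    time_series: bool       # is the data naturally ordered by time?

def suggest_platform(uc: DataUseCase) -> str:
    """Toy heuristic mapping characteristics to a class of platform."""
    if uc.time_series and uc.writes_per_min > 10_000:
        return "time-series database"
    if uc.writes_per_min < 10:
        return "simple file or config store"
    return "general-purpose relational database"

tickers = DataUseCase("stock tickers", writes_per_min=100_000,
                      latency_budget_ms=5, time_series=True)
config = DataUseCase("app configuration", writes_per_min=1,
                     latency_budget_ms=500, time_series=False)
print(suggest_platform(tickers))  # time-series database
print(suggest_platform(config))   # simple file or config store
```

The real decision matrix will be richer (consistency, retention, regulation), but making the inputs explicit is the point.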
With the ever-advancing landscape of technologies, data use, consumption, and publication have become more complex. The option of one-size-fits-all solutions no longer exists. The key is understanding the outcome you want to achieve with your data. Before making any architectural decisions, you must first understand the types of data you plan to consume.
The Role of Specialization and Team Skills
Modern software teams need a breadth of skills. One of the most important of these is domain-driven design—an approach to software design that provides a model for organizing domain logic and tying it to business operations.
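To make the domain-driven design point concrete, here is a minimal, hypothetical sketch (the `Account` and `Money` names are illustrative, not from any real system) of two of its core building blocks, entities and value objects, with a business rule kept inside the domain model rather than in a service layer:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass(frozen=True)
class Money:
    """Value object: defined entirely by its attributes, with no identity."""
    amount_cents: int
    currency: str

@dataclass
class Account:
    """Entity: has an identity that persists while its state changes."""
    account_id: UUID = field(default_factory=uuid4)
    balance: Money = Money(0, "USD")

    def deposit(self, m: Money) -> None:
        # The domain rule lives with the domain object, tying logic to the business.
        if m.currency != self.balance.currency:
            raise ValueError("currency mismatch")
        self.balance = Money(self.balance.amount_cents + m.amount_cents,
                             m.currency)

acct = Account()
acct.deposit(Money(500, "USD"))
print(acct.balance)  # Money(amount_cents=500, currency='USD')
```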
Good data architecture begins with data modeling. Ensuring your team is adequately equipped with the right tools and training is essential in achieving a successful data modernization journey.
Governance and Standards in Data Management
Effective data modernization requires a careful balance between agile development and the need for standards and governance. Not all data use is equal, and your data architecture should reflect this. Understanding, managing, and storing data based on its unique characteristics and use can simplify your data environment.
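One lightweight way to reflect "not all data use is equal" in practice is a small classification catalog. This sketch is hypothetical (the categories and retention periods are invented for illustration), but it shows the shape such governance metadata can take:

```python
from dataclasses import dataclass
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PERSONAL = "personal"  # in scope for privacy regulation

@dataclass(frozen=True)
class DataClass:
    """Governance metadata attached to a category of data, not to code."""
    name: str
    sensitivity: Sensitivity
    retention_days: int

CATALOG = [
    DataClass("static web assets", Sensitivity.PUBLIC, retention_days=0),
    DataClass("usage analytics", Sensitivity.INTERNAL, retention_days=90),
    DataClass("customer profile", Sensitivity.PERSONAL, retention_days=2555),
]

def needs_privacy_review(dc: DataClass) -> bool:
    """A simple governance gate a platform team might automate."""
    return dc.sensitivity is Sensitivity.PERSONAL

flagged = [dc.name for dc in CATALOG if needs_privacy_review(dc)]
print(flagged)  # ['customer profile']
```

Teams can still move quickly within a class; the standardization lives in the catalog, not in per-team rules.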
Investing in learning and discovery is another crucial factor. It’s okay for teams to experiment, run trials, and sometimes, even fail.
Microservice Patterns and Data Management
Microservice architectures bring their own set of challenges to data management. With each microservice potentially having its own unique data store, there can be risks of data inconsistencies due to eventual consistency and Saga patterns.
In such architectures, it's advisable to only store data specific to each service locally. Furthermore, while you may need various data platforms in your ecosystem, efforts should be made to standardize them as much as possible.
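The consistency risk mentioned above is usually managed with compensating actions. Here is a minimal, hypothetical orchestrated-saga sketch (the order-flow step names are invented) showing how a failed step triggers compensation of the steps already completed, in reverse order:

```python
from typing import Callable, List, Tuple

# Each step pairs an action with a compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):
                comp()
            return False
    return True

# Hypothetical order flow: the second step fails, so the first is undone.
log: List[str] = []

def reserve_stock() -> None: log.append("stock reserved")
def release_stock() -> None: log.append("stock released")
def charge_card() -> None: raise RuntimeError("card declined")
def refund_card() -> None: log.append("charge refunded")

ok = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
print(ok, log)  # False ['stock reserved', 'stock released']
```

A production saga additionally needs durable state and idempotent compensations, which is exactly why these patterns are "often not done all that well."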
Learning from Past Mistakes and Planning for the Future
Just as in any other area of software development, data modernization requires regular feedback, retrospection, and adjustments. As you continually invest in data modernization, it's crucial to revisit past decisions and validate their effectiveness in your current context.
In conclusion, the most successful data modernization journeys are driven by a clear understanding of desired outcomes. Mastering data use cases, deploying the right team skill sets, applying governance and standardization, and continually revisiting past decisions will set you on the right path. Always remember: never neglect data. The secret to any successful data journey is a deep understanding of your data, its use cases, and its continuous management. Happy data modernizing!
Video Transcription
So I'm Marie McCormack. I have 25 years of experience leading software engineering teams in a variety of, I guess, mostly large enterprise environments. I had a brief dalliance with a smaller organization during the dot-com boom, but I spent a lot of my career working for JP Morgan Chase.
I really consider myself a software engineer, but these days it's a problem if I'm writing code, so my team tolerate my opinions about software pretty nicely. I'm going to talk to you a little bit about data modernization as a journey, and about what I've learned from my career so far about how to enable that journey for your software engineering teams, in the hope that it's useful for you, your teams, your organizations and yourselves. There is a useful statistic that I will share with you all, which is that you are likely to be 100% more successful on your data modernization journey if you know what you're trying to achieve. Clearly I made that statistic up for effect, but that doesn't make it any less true.
So I'm going to start this conversation by spending a few minutes talking about how data use cases, and, if you like, the work that's required to understand the journeys you plan to use data to support, are the foundation of all data modernization efforts. I would say that if you went back even 10 years, most software architectures and applications, large or small, required some quite basic decisions about what data platforms and solutions you needed to use. Netflix is the great example: you made one decision about a tool you were going to use, because it was possible at that point to have one solution which was good enough to meet all of your data consumption and publication needs. You may, for example, have data for your platform which is small and trivial that you're using for configuration; you may have data which is large and complicated, real-time time-series data (tickers for stocks, for example), and everything in between. And really, even 10 years ago, it was still possible to make just one decision, so that you could avoid local optimization.
You could enable your organization to have a solid foundation, a shared understanding, transferable skills that would be useful across an ecosystem at scale: trading, settlement, billing, account management. That picture has changed; the world has moved on. If you look at what's driving the decisions that we make, for me the foundation of all of that is understanding what it is you're trying to use the data for, what outcome that data will drive. And a lot of software engineering teams in enterprise environments haven't really got enough discipline in their agile processes around doing the design work that that requires. So I'm going to take a moment just to illustrate that with an example. Look at the way we use data on web applications: when my team were making decisions about what solutions and platforms we needed to host their data, we had to know how we were going to cache our data and what requirements we had for data responsiveness. The Fastly outage gives us all some insight into why CDNs are important and how you distribute data at the edge to web applications, for performance and the mobile experience, for example. And that's the start. If you want to make good decisions about your data architecture and how to modernize your ecosystem, you have to start by understanding the types of data that you plan to consume. Put most foundationally, you will make very different decisions about different kinds of data.
For example, how you hold customer preference data, or the configuration you might use for application settings, versus how you would hold billing data or stock data. And I think that's the first opportunity for software engineering teams. To do a good job of that introduces two things that are important to understand. One is the concept of specialization: we talk a lot about building multidisciplinary teams, and about understanding what that means in terms of breadth and depth. Perhaps 5 or 10 years ago we were coming to understand the business analyst and the role they played in software teams, but now we need to understand what the model is for data skills. And that is two things in particular: one is the ability to do domain-driven design and, more foundational than that, design thinking. So there's an effort around understanding the skills and expertise that your team will need to acquire. I'm going to talk a little bit about what that means.
So when we talk about journeys across data: suppose that in your design thinking you've mapped a journey that requires you to serve up real-time account information to consumers at the point that they log in, and you need to understand the requirement for speed.
What's the longest you can take to serve that data? In some cases that will actually drive decision making about whether you need to build a cache for that data. One of the really important aspects of this is that we've moved, I think, away from initial microservice architectures, where early microservice design teams were building locally optimized solutions, to understanding that actually we have to get rid of the one-each model.
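That latency-budget decision can be sketched as a read-through cache with a time-to-live. This is an illustrative Python toy (the `TTLCache` class and the account loader are invented for this example), not a production cache:

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Minimal read-through cache: serve stored values until they expire."""
    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fast path: stays within the latency budget
        value = load()             # slow path: go to the system of record
        self._store[key] = (now, value)
        return value

calls = 0
def load_account() -> dict:
    """Stand-in for a slow call to the account service."""
    global calls
    calls += 1
    return {"balance": 100}

cache = TTLCache(ttl_seconds=60)
cache.get("acct:1", load_account)
cache.get("acct:1", load_account)  # served from cache; loader not called again
print(calls)  # 1
```

A shared, globally distributed cache (as discussed next) replaces this per-process dictionary with shared infrastructure, but the TTL trade-off between freshness and responsiveness is the same.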
If you need to cache account data, it's highly likely that your ecosystem needs a real-time, globally distributed cache of account data, for example. And that's a really important point to note, because if you want to optimize your organization's data modernization journey, the best way to do that is to avoid lots of teams doing the same work repeatedly. In order to do that, there are two or three key areas that I think are particularly important. One is that not all data use is equal, and to understand that you need some structure around problem domains. There's a great slide on the Thoughtworks website that talks about problem and assumption: understanding what the problem is, how you might solve it, and, importantly, what assumptions are being made. And then finally, how will you test those assumptions?
I mean, that's really the foundation of all software design and architecture: start by defining what matters, what the problem or outcome is that you're trying to solve for, and then understand what steps you're going to take to test that hypothesis. So for example: your data is being published at 100,000 rows a minute, you want to consume some or all of that information, and you need to be able to present some of it lazy-loaded on a website. That's a good example. How are you going to test that? Do you actually understand the user experience that you're trying to enable? Really importantly, in a regulated industry you may also have to understand privacy requirements for the data, as well as retention, reporting and management requirements.
That means particularly understanding in-country regulation, and that's increasingly becoming something that's important for me: as you work along a data modernization path, build into your architecture the types of labels and tags for data, and the lineage information, that you need to be able to make those decisions.
I think there's a kind of four-word pattern that would be useful for everyone to share: model, standardize, govern, and repeat. So the beginning of understanding that not all data is equal is: do you have a data model? Do your teams know how to do data modeling? Have they got the tools to do it? Have they been educated and supported on that journey? I think that's the foundation of your data architecture. And there are some inherent challenges with that: you may have agile thinking, which is wonderful, and teams may prefer not to do all of their work up front. But equally, it's really important to understand that if you don't introduce data modeling, some standardization and governance, you'll essentially always end up with an increasingly complex estate to manage, particularly where you have a microservice architecture, which already introduces segmentation, expansion and more problem surface for you to take care of.

The next part of that journey around not all data being equal is understanding that the choices you make for data management are really driven by the skills that your team has. So I touched on domain modeling, and that whole piece of domain-driven design is a skill that quite a lot of teams have now. It's very important that you understand how they will communicate that to each other: how does that translate into the contracts between the components in your system that are being built? That requires a level of precision which sometimes doesn't feel very agile, and one way I've found it advantageous to convey that to people is that it avoids us banging into these problems later. There's no way to evade the need to have useful conversations about inter-system contracts; you should do it up front and you should write them down. Investment in learning and discovery is important, and it's actually quite difficult sometimes, particularly as database products, for example, evolve at pace.
A team may make a decision based on research that suggests a particular product or solution might be the right one, and when they actually go to test their use case, they may discover some unexpected limits to it. I think one really good way to handle that is to give teams time to do discovery work and proof-of-concept spikes on the platforms they're choosing, particularly because, as I said earlier, you're always making compromises to some extent. Maybe you're choosing a product that's easy to manage, that your team has operational experience with, and that maybe you can consume without having to operate it yourself.
And you have to figure out whether that compromise is worth making if, in fact, that product isn't performing, or you're not able to make the configuration changes you need for your particular use case. Oftentimes you cannot know all the answers up front; you have to let your teams experiment to find things out, and occasionally fail. So I would definitely suggest that, like all good transformations, data modernization requires you to give people enough time to do it properly.
And that's particularly true if you're building the foundations of a new platform or solution for your organization. I touched earlier on the need to avoid local optimization, so you need to break the one-each habit for teams. Agile engineering teams, and some of the best engineers, like to get on quickly; they may feel that they're being impeded if you force standards and governance upon them. But the reality is that if you allow a certain degree of latitude, you may pay for it later when you discover that you have a highly diversified ecosystem that people find very hard to manage. And that's not trivial when you bear in mind that regulatory data requirements may need you to be able to pull out data and report upon it across all of the stores in which your data is resident. The more places you put data, the more places you have to manage it, report on it, check it for personal information and, importantly, keep it secure. It's not cheap to test your architectures out, so it really is worth, I think, making some happy compromises around the level of governance that you're going to put in place. So I'm going to move on to talk a little bit about patterns.
Patterns matter in microservice architectures, and interestingly there are some pros and cons of that being, I guess, now the norm for the back ends of our systems. Think of some of the anti-patterns for data access, like the singleton pattern, which in effect gave you one path to all of your data.
Whilst that was great, it introduced a breaking change right across your ecosystem when you needed to make changes to that one God object for access to data. So in microservice patterns, you make conscious choices about which data needs to be kept local to each microservice.
Quite often, it's a pattern for each microservice to have its own data store, and that's absolutely fine; there's no problem with that. Two things really matter, though. One: do you really want to let people choose a different platform for a data store for every microservice?
I'm going to suggest: definitely not. That's really quite important, and again, back to my earlier point: everywhere you're storing data, you may be required to manage that data, archive it, back it up and restore it, and it's really not helpful if you have a million tool sets that you need to use to do that. The other really quite important piece is about data consistency. I think most developers of a certain age, and I'll include myself in that category, understood transaction semantics very well. They understood transaction boundaries; they knew what atomic meant and why it mattered for writing data to stores. In microservice architectures we talk much more about eventual consistency and Saga patterns, which are actually much, much more complex and, if I'm honest, often not done all that well. So there are two things with that. One is: only allow microservices to store locally the data that only they need. That's a really important point which I think is sometimes misunderstood by engineering teams. The goal of localized data for a service is not that you have lots and lots of copies of the same data; it's quite the opposite: data that's only required by that service is kept local to it.
And I think that's an important point, and one you should be very thoughtful about when you need data that's shared, surfaced, cached and presented in a performant way across your architecture: don't build one for each microservice. The other thing is understanding the different data platforms that will need to exist in your ecosystem. I touched on configuration, for example. It can be the default now to have that backed by a repository, but you can also have file backing and other options. Some of that data you don't actually need to store in a complicated data system; files might be fine for it. And there are reasons why that matters: keeping your non-business-related data isolated from customer or commercial data that needs to be protected can be quite useful. It lets you do different things with it, and it liberates you from having commingled concerns around security.
But the solution that you need for config will be very different from the one you need for large data sets like stock market tickers, which will in turn be different from web content, where you might need to deliver static files, icons, images, that kind of stuff, quickly, which again will be very different from reporting.
So the reality is that all applications of any significance or complexity will now have multiple different data tools in the mix. As part of that, I think it's really important to understand that RESTful semantics are not all that for data; they don't play very well with caching. And whilst we're using GraphQL much more in industry, I think, and it's become more of a standard for data aggregation, and it also decouples us from RESTful data semantics, building caches for it requires some work too. So I would also suggest that as you modernize your data, you make sure your teams understand use cases for caching and have good technical foundations that help them make good decisions, particularly because caching solutions can themselves be very complex. Finally, just to take a moment to talk about what happens if you don't do these things well: there's a great meme out there featuring Data from Star Trek, and never neglect data is really the point I'd like to make. Set aside time to do good analytics, and make sure that your knowledge of the data sets that your platforms, solutions and systems rely upon is great inside your teams.
And that's really the basics of a data modernization journey that enables software engineering teams. Then, finally, accept that you should regularly go back and inspect the decisions that you've made; the fewer platforms you have in the mix, the easier that will be to do.
But recognize that you are required to continually invest in data modernization, even once you've done what you consider to be the heavy lifting for your platforms. So I'm going to wind up there and say that what has enlightened a lot of my teams is adopting an outcome-driven approach to data. Hopefully that will enable all of your teams too. Thank you very much for listening today.