Leveraging High Performance Object Storage for today's AI, Data Analytics and HPC applications
Madhu Thorat
Software Architect
We live in an era where our digital memories and communications are mostly stored in unseen, yet omnipresent cloud storage systems. Ever wondered how the nightly backup of your WhatsApp messages or the photo reminders from past years on your phone are handled? Behind them are high-performance object storage solutions. As Madhu Thorat, software architect for the IBM Storage Scale high-performance object storage solution, observes, the need for this technology keeps growing with the ever-expanding pool of data generated by AI, analytics, and HPC applications.
The Digital Transformation
Compare the digital landscape of 2005 with that of 2013. The talk illustrates the change with two photos taken at St. Peter's Basilica in Vatican City: in the 2013 picture, most people in the crowd are holding up smartphones or tablets to take pictures or videos. In just eight years, we witnessed what seemed like a lifetime of technological change. This transformation has fueled new applications in AI and analytics, resulting in a massive increase in data production. The International Data Corporation (IDC) estimates that by 2025 the world's data will reach roughly 175 zettabytes. Most of this data is expected to be unstructured: data such as images, messages, and audio files that does not fit into traditional database schemas.
Why High-Performance Object Storage?
This growth, especially in unstructured data, creates a storage challenge. We need systems capable of not only holding vast amounts of unstructured data but also providing fast access to it. High-performance object storage addresses both needs: it stores unstructured data as objects, serves that data at high speed, and is generally available at low cost.
Key Features of High-Performance Object Storage Solutions
A typical high-performance object storage (HPO) solution is packed with features designed to address the needs of contemporary AI, analytics, and HPC applications.
- Object Storage Support: Ability to save data as objects.
- High Performance: Quick data access speeds.
- Cost-Effectiveness: More storage capacity at lower cost.
- AWS S3 Access: Support for the widely used S3 protocol, which is becoming the de facto standard for accessing objects in object storage.
- Scalability: Effortless expansion to accommodate growing data volumes.
- Multi-Protocol Data Access, Tiering, and Protection: Access the same data through different protocols (for example NFS or HDFS), categorize data into tiers, and safeguard it.
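To make the first and fourth features more concrete, the following is a minimal sketch of the object model that S3-style access is built around: data is stored as whole objects in a bucket, addressed by a key in a flat namespace, with metadata attached to each object. The `ToyObjectStore` class, its method names, and the bucket and key names are hypothetical and exist only for this illustration; this is an in-memory toy, not IBM's implementation and not the AWS SDK.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: opaque bytes plus free-form metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, illustration only)."""

    def __init__(self):
        # bucket name -> {key -> StoredObject}; keys live in a flat namespace
        self._buckets = {}

    def put_object(self, bucket, key, data, metadata=None):
        # Store (or overwrite) a whole object under bucket/key.
        self._buckets.setdefault(bucket, {})[key] = StoredObject(data, dict(metadata or {}))

    def get_object(self, bucket, key):
        # Retrieve the whole object by its key.
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))


store = ToyObjectStore()
jpeg = {"content-type": "image/jpeg"}
store.put_object("photos", "2013/vatican/img001.jpg", b"\xff\xd8...", jpeg)
store.put_object("photos", "2013/vatican/img002.jpg", b"\xff\xd8...", jpeg)
store.put_object("photos", "2005/vatican/img001.jpg", b"\xff\xd8...", jpeg)

print(store.list_keys("photos", prefix="2013/"))
print(store.get_object("photos", "2005/vatican/img001.jpg").metadata["content-type"])
```

Because there is no directory tree to traverse or lock, this kind of flat key space is one reason object stores tend to scale out easily; a real deployment would distribute the buckets across many nodes rather than hold them in one dictionary.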
IBM’s Data Access Services Solution
IBM's offering in this space is the Data Access Services solution, a component of the IBM Storage Scale family. The solution is founded on the Global Data Platform, a framework built around IBM's well-proven product, IBM Storage Scale. IBM's HPO solution presents the following layered services:
- Data Access Services: Offers high-performance object storage support and S3 access for unstructured data.
- Data Caching Services: Provides data access independent of where the data resides, without creating copies of the data.
- Data Management Services: Provides visibility, control, and automation for data orchestration, and supports data life cycle management.
Furthermore, the speaker presents performance figures for the solution and describes them as among the best currently available for object access.
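In the talk, the speaker notes that the Data Access Services layer can be used alongside file protocols such as NFS, so one application may write data through a file interface while another reads the same data over S3. The sketch below illustrates that idea under a simplifying assumption: each bucket is a directory and each object key is a relative path beneath it. The `SharedNamespace` class, its methods, and the bucket and file names are hypothetical; a local temporary directory stands in for the shared file system, and no real NFS or S3 traffic is involved.

```python
import tempfile
from pathlib import Path


class SharedNamespace:
    """Hypothetical mapping: each bucket is a directory, each key a relative path."""

    def __init__(self, root):
        self.root = Path(root)

    def path_for(self, bucket, key):
        # The same location is reachable as a file path or as bucket + key.
        return self.root / bucket / key

    def write_as_file(self, bucket, relative_path, data):
        # File-protocol side (stand-in for an application writing over NFS).
        target = self.path_for(bucket, relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get_object(self, bucket, key):
        # Object side (stand-in for an S3 GET): same bytes, addressed by key.
        return self.path_for(bucket, key).read_bytes()


with tempfile.TemporaryDirectory() as root:
    ns = SharedNamespace(root)
    # One application writes a file...
    ns.write_as_file("sensor-data", "run1/readings.csv", b"t,temp\n0,21.5\n")
    # ...and another reads the same data as an object, with no copy in between.
    fetched = ns.get_object("sensor-data", "run1/readings.csv")

print(fetched)
```

The point of the design is that there is a single copy of the data: the two access paths differ only in how the location is named, which is what lets file-based and S3-based applications share one data set.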
The Relevance of High-Performance Object Storage Today
As we generate more and more data, the significance of high-performance object storage becomes apparent. These systems matter not just for their capacity but also for their scalability, cost-effectiveness, and support for the AWS S3 protocol. The result is a storage system that is efficient and that makes applications suitable for the modern, cloud-native world.
Madhu Thorat of IBM concludes that high-performance object storage is becoming necessary for handling the data deluge of our digital world, making it an indispensable tool for enterprises and individuals alike.
Conclusion
In closing, the transition to high-performance object storage systems is not just a trend, but a necessary step in managing the data explosion from today's thriving AI, analytics, and HPC workloads. The future is here, and with IBM's Data Access Services Solution, we are well-equipped to journey through it, knowing our data is effectively stored, accessed, and protected.
Open for Questions
With this comprehensive exploration of high-performance object storage, would you like to know more? Feel free to raise your queries and delve deeper into the world of efficient, scalable, and powerful data storage solutions.
Thank you for staying updated with the latest in data storage solutions. For further inquiries or discussions, do not hesitate to reach out in the comments or through our contact channels.
Video Transcription
I think I can get started. So, hello everyone. Good morning. Good afternoon. My name is Madhu and I am currently the software architect for the IBM Storage Scale high-performance object storage solution, which is part of IBM's Storage Scale product. And today I will be talking about leveraging high-performance object storage for today's AI, analytics, and HPC applications. Now, before I go into details, let me ask you a question. So how many of you have seen a notification on your phone early in the morning saying "Rediscover this day"? Many of us have seen it, right? And then when you click on that notification, you start seeing a series of photos or sometimes even videos, and thus you revive your old memories. Have you wondered about where all these photos are getting stored? Have you thought about where your WhatsApp messages are getting backed up every night? So today in this session, we are going to talk about the kind of storage where your data is usually getting stored. So let us have a quick look at the agenda. First, we will see why there is a need for high-performance object storage and how it's relevant in today's world. Then we will have a look at some of the features that are provided by a typical high-performance object storage solution.
After that, we will have a quick look at IBM's Data Access Services solution. And towards the end, if we have some time, we will have a quick Q&A session. So let us get started. The world is changing. This is a picture taken at St. Peter's Basilica in Vatican City in 2005. And here is a second picture taken at the same place in 2013. Notice the difference here: in the second picture, most people have smartphones or tablets in their hands and they are taking pictures or videos. So these two pictures show how rapidly the technology landscape has changed in just a few years. In fact, it looks like there has been a lifetime of technology landscape changes in just eight years. And this has resulted in the emergence of new applications, or apps as we call them, in the domains of AI and analytics. And all of these applications that we use generate a tremendous amount of data. Now, for example, when you use an application to capture a picture, the image gets stored somewhere, right? So for this tremendous data, we need big storage which can save your data. Also, the recent buzz around ChatGPT clearly shows the effects of AI in almost every sector and in our day-to-day life.
Now, if you read or watch the news, or if you just look around yourself, you are likely to see many day-to-day examples where AI is getting used. And there are some highly visible examples of industry sectors where AI is getting used. For example, within the financial market, AI is critically important for making market predictions, forecasting, fraud detection, and so on. And similarly in the retail market. Now, for example, if you open a shopping app, you will notice that the retailers are taking heavy advantage of AI to maximize the customer experience, doing things like target-based advertising or location-based advertising. So overall, if you observe around yourself, every industry has some applications for which AI has become critical, and all these industries, while using AI, are generating a tremendous amount of data. In fact, the IDC, that is the International Data Corporation, estimates that by 2025 the data in the world will be in the order of 175 zettabytes. That is quite a lot, that is tremendous. And it is expected that the amount of data that will be generated in the next three years will be far more than the data that was created in the past 30 years, and most of the data will be unstructured.
Now, unstructured data means the kind of data which cannot be stored in your traditional schemas or in databases. For example, when you're using Facebook or Twitter, or when you post messages or capture images or save audio files, these are examples of unstructured data generation. And it is estimated that enterprises or corporations are going to have a significant amount of unstructured data in the next few years.
So now that we understand how much data is being generated on a day-to-day basis, and what the prediction for the future is, we need to think about storing this tremendous data, right? So what we need is storage systems which can solve two problems. First, the storage system should be able to save a large amount of unstructured data. And second, the storage system should be able to provide access to that data at high speeds. So what is the solution? The answer is high-performance object storage systems, which can store your unstructured data as objects and provide you access to that data at high speeds, delivering high performance. And one of the major advantages of object storage systems is that they are available at low cost.
So you store your data at cheaper prices. So till now, we have understood why we need a high-performance object storage solution. Now let us have a look at the typical features a high-performance object storage solution, or an HPO solution, should provide so that it can be used in the domains of AI, analytics, and HPC. So the first one is object storage support. That means this solution should definitely have the ability to save your unstructured or semi-structured data as objects. This solution should also be able to deliver high performance, meaning it should allow you to access your data at a faster pace. Then this kind of HPO solution should be available at low cost. It should be cheaper to use, and it should support AWS S3 access as well, because today the S3 protocol is becoming the de facto standard for accessing objects in object storage. And moreover, this kind of HPO solution should be scalable, so that tomorrow, if you want to expand your storage, you should be able to add more compute nodes. And in addition to all of these features, the HPO solution should support other features like multi-protocol data access, tiering, and data protection, so that you can access the same data using different protocols like NFS, [unclear], HDFS.
And you should also be able to categorize your data and safeguard your data. Now let us have a look at a sample use case where a data scientist uses AWS S3 to access AI application data using HPO. The image here shows such a scenario, where in the middle is a distributed storage system environment in a data center which supports HPO. And this kind of distributed storage system environment may get data from anywhere. For example, the data may be coming from devices like a camera or a sensor, or the data may even come from I/O-intensive applications like AI applications or HPC applications. And then the data gets stored in the storage system environment. After this, a scientist may then use AWS S3 to access that data at higher speeds. And this is where the HPO solution comes into the picture. So the data scientists would access the objects, that is your data, using the S3 protocol. And after processing or analyzing the data for a few weeks or months, when it's not so useful anymore, the system administrator of this storage system can move the data to a backup storage system, which may be based on tape or object. So this is one example, or one use case, which shows how the data flows from left to right. From the left side, the data is coming in and getting ingested; in the middle is the storage system where the data is getting stored.
And the data scientists would be accessing the storage to use the data. And then, after the usage of the data is done, the data gets moved to a backup storage. So while this was one example of how HPO storage is useful, there can be many such scenarios where HPO can be used. Now, I would like to talk a bit about the IBM Storage Scale HPO solution, which is also officially known as the Data Access Services solution. IBM has built a framework called the Global Data Platform, based on IBM's well-proven product called IBM Storage Scale. And this platform consists of a set of core data services that will help to solve a number of client application problems. These services are as shown here, marked as 1, 2, and 3. So with the first layer, that is the Data Access Services, high-performance object storage support is provided, which gives S3 access. Thus, it supports storing unstructured data as objects and provides high performance.
And this high-performance object storage solution can be used along with other protocols like NFS, Samba, HDFS, CSI, and so on. So this ensures that applications can access the same data using different protocols. For example, one application may write data with NFS and another application may access the same data using S3. Second is the Data Caching Services, which provide data access independent of where the data resides, without creating copies of the data. And third is the Data Management Services, which provide visibility, better control, and automation, which facilitates data orchestration. And it also supports data life cycle management. And lastly, this Global Data Platform provides security and cyber resiliency for effective protection against and prevention of cyber security attacks.
And it will also help to recover your data in case of an attack. And towards the right of this slide, what we see are the performance numbers for the HPO solution from IBM. Please note that these are among the best performance numbers provided by a [unclear] solution for object access, and very few solutions in the world today give such good performance. So to conclude, the IBM Storage Scale high-performance object storage solution, or Data Access Services solution, meets the demands of storing unstructured data as objects and providing access to data at high speed and low latency. So to conclude, what we saw today is that a large amount of data is getting generated worldwide, and it is predicted that the data that will be generated over the next three years will be far more than the data that has been generated in the past 30 years. And most of the data that will be generated would be unstructured data. So in today's situation, high-performance object storage solutions become important because they provide the ability to store your unstructured data as objects, they can meet the demands of accessing the data at high speed, and they provide a scalable solution. And high-performance object storage solutions support AWS S3, thus making your application suitable for the cloud-native world.
Also, the advantage here is that object storage solutions are in general available at low cost. So using a high-performance object storage solution in today's world, for many applications and for the data that is getting generated worldwide, is becoming very necessary. Thank you. So with this, I open up the session for Q&A. Are there any questions in the audience?