How we built a Personalized Ecosystem to Create Pure Joy for 100+ Million PlayStation Users
Video Transcription
Thank you so much for joining me for this talk today. I'll start with my presentation, but before that, I'll talk about the title. I'm going to talk about how we built a personalized ecosystem to create joy for the 100 million plus PlayStation users. Before I start this session, I would like to give you a brief introduction about myself. So my name is. I am currently working at Sony Interactive Entertainment (SIE) as a senior software engineer. I've been with this company for more than 5 years now; this is my 6th year at SIE. I was one of the earliest engineers on the personalization team. This team was founded a couple of years before the PS5 launch, and I was one of the early engineers who joined it.
We then worked on developing features specifically related to personalization for PS5 consoles, and also on making them backward compatible with PS4. On a day-to-day basis, the focus of my role is back-end engineering and data engineering. I work on building microservices and data analytics platforms, which can be real time, and sometimes I also work on building batch processing systems. So that's all about me. Now I'll go over the agenda of my presentation today. First of all, I will talk about what personalization is and why we personalize. What is the goal? Then we'll go over some of the real use cases where you see personalized content today on the PlayStation consoles. Then I'll go into the separate components of what goes into this. At a high level, it looks like just machine learning models.
But when you go through the actual flow, you'll see that we look at a lot of other factors, like the user's privacy settings and the segments. We also gather data about users, which can be collected in real time or gathered and aggregated in our batch processing systems.
Then, of course, we have machine learning models, and on top of that, we have experimentation platforms. I'll go into each of these components. Finally, I'll talk a little bit about the challenges of personalization and what we are doing now that we have built this, and then I'll summarize my talk. Before I begin, I would encourage you all to put your questions in the chat. I would love to take them up and discuss them with you during the conference, or even afterwards; I would love to connect with you. Alright, let's begin. So what is personalization? Basically, the goal of personalization is to tailor content for each and every user based on the user's behavior, or even the user's past interactions with our platform.
Then there are some other factors, like demographics, which drive the decisions on what we should actually show to the user. Ideally, we are providing user interactions that resonate with the user's specific preferences. But I've been using this word "content," and now I want to get into what this content is. Basically, this content can be as simple as advertisements, or recommendations like the ones we see on so many platforms, such as platforms for streaming music or streaming the latest shows. Similarly, we also have recommendations on gaming consoles. Apart from that, we also have content which can be promotions. For example, if there is a new game coming up and this game wants to run a promotion targeted at some specific users, that's another type of content we display on the consoles.
All this content can be personalized based on the requirements, and that is where we apply the concept of personalization. Next, moving on to the goal: why do we personalize? What is the end goal that we're trying to accomplish here? Ideally, personalization is supposed to improve the user experience. When a user logs into the console, we want them to spend most of their time getting the most out of the console. If they want to play new games, they should get great recommendations on what they can play next or what they can buy next, so that they don't have to spend too much time thinking or browsing through thousands of game titles.
So we try to personalize so as to improve the user experience and, of course, increase the engagement of our users and their interaction with our platform, overall giving them relevant content. The end result is that it leads to higher retention. For example, if the user logs into the console today and they have a great experience, they will definitely come back again. That's how we try to create joy for our users and retain them, and that is the goal of personalization. Next, I'll move on to some actual examples of where you see this personalized content. For example, in this screenshot, you can see there are some recommendations, and the title of the top row says "Because you played Hitman 3."
So based on what you played in the past, we are generating a special list of suggestions for you, and this is completely personalized. If you check some other person's console, they might be getting completely different recommendations in their first row, based on the games they have played. Then there are some other factors, like the games that your friends played, or some games which are specifically for you because you have some active subscriptions. For example, you might be a member of the PS Plus subscription, which can provide you some free games or some great games at better prices. We will provide you recommendations according to your particular profile. Next, moving on to this "What's Hot" tab, you again see some tiles, some boxes with content, and there is one item marked as sponsored.
That is the Fortnite game. This is another place where we personalize the user experience: we can run campaigns to give personalized advertisements or promotions to users based on certain factors, which I'll talk about later. Another factor which we need to look into is the ranking of content. Here in the screenshot, under "Because you played The Last of Us Part II," there is a bunch of recommendations. The other aspect of the recommendations that you see here is that they are ranked, and the ranking has been performed by the machine learning models behind the scenes.
So if there are, say, 1,000 game titles which I can recommend to you, I have first narrowed them down to probably 500 titles because you played The Last of Us Part II. But out of those 500 titles, if I can only display 5 on the screen, on the top row, what should those first 5 be? Those are the things which are computed by the machine learning models. They rank these recommendations and generate the most relevant ones, which we eventually display to the user. So that's basically all that you see in terms of personalization. There's another kind of personalization which is very interesting and which you cannot really see on the screen. It is another feature that we have been providing to users of PS5 consoles in particular: the personalization of game patch updates.
In this personalization feature, we are sending game patch updates to your PS5 console, specifically for those users who have enabled the internet connection in rest mode. We don't want to send you game patches while you're playing games and become an obstacle in your playing experience. For that reason, we want to send you game patch updates when you're not playing games, but your system is in rest mode and your internet connection is enabled. This is the feature which helps us accomplish this kind of personalization for sending updates. So we have now looked into many use cases where this whole personalization factor comes into play. Now I'll go a bit deeper into what goes on behind the scenes to generate this. There's a bunch of things I've mentioned here. There are privacy settings.
There is segmentation, analytics and processing systems, machine learning models, and experimentation. First, I'll get into the user's privacy settings. This is a screenshot, again from the PS5 console, where you have a tab called Privacy under Users and Accounts. The very first thing we take into consideration, before we even think about personalizing your experience, is privacy. If you go to this privacy setting and disable it because you don't want to see personalized recommendations, then we will not personalize your experience at all. You will get a generic set of recommendations, possibly specific to your country or your region, and sometimes it could be very generic.
It may not have anything to do even with your country. So I would say the very first filter which drives what you see on the console is definitely your privacy setting, because we want to align with all the compliance requirements like GDPR, which are very important for making sure that users feel safe and secure in our environment.
The next level of filtering is audience segmentation. Segmentation is a marketing strategy. It's a concept in which you identify different subgroups within the target audience and try to deliver more aligned messages or content to these specific groups. Basically, if I think of my audience as all the people who have played games on a PlayStation console, there will be millions of people in that audience, but I can narrow it down. For example, the easiest segments I can think of are segments based on gender, such as males and females, and segments based on age, such as users in the age group 21 to 25 or users in the age group 25 to 45. These are some examples of simple segments, but we can construct very complex segments as well.
Segments are another filter which helps us decide whether some content should be sent to some user or not. For example, if we are specifically targeting users aged 21 to 25 with some particular game or promotion, then only those users who fall in this segment will get to see that content, and others will not see it on their screen. Another big part of segmentation, which is very useful for our use case in PlayStation, is understanding what you already own. For example, if you already own a game like The Last of Us Part II and I still give you a recommendation for that game, there's no way you are going to buy it again. So this ownership of a game can also be thought of as a segment: users who already own The Last of Us Part II and users who do not.
So I would definitely want to use the opportunity to give you a better recommendation, which can be games that are similar to The Last of Us Part II, or maybe the next version of that game if there is one. Instead of recommending games that you already own, I would try to personalize this experience and give you some other recommendations.
So ownership of games can also be thought of as an audience segment, and there can be thousands and thousands of audience segments accordingly. Before I move on to this particular slide, I would also like to mention another filter, which is age. There are certain games that we cannot show to users below a certain age. That is also one very important consideration that we make before deciding what to show you in the end. If there are certain games which cannot be shown to users below 21, 22, or whatever the minimum age requirement for a game is, we make sure that we apply that filter. Next, moving on to data analytics.
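Before the data side, the filters covered so far (privacy, audience segments, game ownership, and age) can be pictured as a short chain of checks. The following is only a minimal sketch to illustrate the idea; the `User` and `Content` classes, the segment names, and the choice to keep applying the age check to opted-out users are all assumptions made for this example, not the actual PlayStation implementation.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    age: int
    personalization_enabled: bool            # the privacy setting
    segments: set = field(default_factory=set)
    owned_titles: set = field(default_factory=set)


@dataclass
class Content:
    title: str
    min_age: int = 0                          # minimum age requirement
    target_segments: set = field(default_factory=set)  # empty means "everyone"


def eligible_content(user, personalized, generic):
    """Apply the privacy, age, segment, and ownership filters in turn."""
    # Privacy comes first: opted-out users only ever see the generic list.
    pool = personalized if user.personalization_enabled else generic
    result = []
    for item in pool:
        # Age restriction (assumed here to apply to every user).
        if user.age < item.min_age:
            continue
        if user.personalization_enabled:
            # Segment filter: targeted content needs at least one matching segment.
            if item.target_segments and not (item.target_segments & user.segments):
                continue
            # Ownership filter: do not recommend what the user already owns.
            if item.title in user.owned_titles:
                continue
        result.append(item)
    return result


player = User(age=23, personalization_enabled=True,
              segments={"age_21_25"}, owned_titles={"Game A"})
catalog = [
    Content("Game A"),                                  # already owned
    Content("Game B", target_segments={"age_21_25"}),   # matching segment
    Content("Game C", target_segments={"age_25_45"}),   # different segment
    Content("Game D", min_age=30),                      # age restricted
]
print([c.title for c in eligible_content(player, catalog, generic=[])])
# ['Game B']
```

Checking privacy first mirrors the order described in the talk: once personalization is disabled, none of the user-specific filters (segments, ownership) are consulted at all.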
While all these things are going on when you switch on your console, and we are checking the request that comes in for content and applying all these filters of privacy, segmentation, and game ownership, there's one more thing that we do in parallel, which is to gather some data from your request.
This data can be gathered in real time so as to provide some insights to the users in real time. For example, some of this metadata is used by other services to make decisions in real time, such as whether a campaign should be shown to you or not. Similarly, we also have systems to aggregate data and put it into other databases for historical purposes. Sometimes this data is used by machine learning models for training. And sometimes we also run batch processes to precompute some of the things which do not need to be computed in real time while the transaction is happening.
Those are the things that we accomplish with batch processing systems. Here, I've put just a high-level overview of the kind of technologies we use, just a simple flow. Once the request comes in, we have continuous data coming in, and we consume these streams of events through Apache Kafka, which is very commonly used as a messaging system.
Then we have frameworks like Apache Flink, which are used to do further preprocessing of this data, maybe adding some additional metadata to it. Eventually, we can put it into different databases. Here, for example, I've given an example of an analytics database, which is basically a time-series database. It can be on AWS, or it can even be Apache Druid. The idea is that we consume data in real time and then put it into the analytics database. From there, it can be consumed by different components or other systems. It also helps us to power our real-time analytics dashboards.
That way we can see in real time whether we are delivering the right content or not and how well this content is performing. As far as the batch processing systems are concerned, we are basically aggregating data hourly, or sometimes daily, or sometimes weekly, and then putting it into our data lake, which is basically S3 buckets. This data can then provide us a lot of insights. For example, if I want to understand how a campaign performed this month versus last month, or this week versus last week, that's a big example of where we like to use the historical data which we are saving in our S3 buckets. Moving on to the next component, of course, the very important part of personalization, which is machine learning. After all these things are done, the filtering and the gathering of meaningful data from the user request, we then pass it on to our machine learning models.
Our models are doing some really important work: basically, they're ranking these recommendations. Like I said, if there is a list of 500 recommendations and I want to know what the 20 most relevant recommendations are that I can show to user A or user B, that's where the machine learning models come in. They rank these recommendations, and the highest-ranked recommendations are then shown to the user. Behind the scenes, all this massive amount of data that we are collecting for each user, whether it's metrics like their clicks and views or their purchase transactions, is what we use for training our models and understanding what is best for the user. Of course, we use a variety of machine learning techniques.
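The ranking step itself (score every candidate, keep only the few that fit on screen) can be sketched in a few lines. This is a toy illustration under stated assumptions: the titles and the `predicted_relevance` scores are made up, and in a real system the scores would come from a trained model rather than a hard-coded dictionary.

```python
import heapq


def top_n(candidates, score, n):
    """Return the n highest-scoring candidates, best first."""
    return heapq.nlargest(n, candidates, key=score)


# Hypothetical model output: one relevance score per candidate title.
predicted_relevance = {
    "Title A": 0.42,
    "Title B": 0.91,
    "Title C": 0.17,
    "Title D": 0.66,
}

top_row = top_n(list(predicted_relevance), predicted_relevance.get, n=2)
print(top_row)  # ['Title B', 'Title D']
```

Using `heapq.nlargest` avoids sorting the full candidate list when only a handful of slots are available on the row.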
We have collaborative filtering and random forest models, for example, which are used to build certain important machine learning models in the company. The overall idea of our machine learning models is that we want to optimize the user experience. At the same time, there are many challenges that the ML models have to overcome, such as minimizing bias and ensuring that we stay fair and that customer confidence is maintained.
We are not breaking the customer's trust, and there's nothing hidden from the customer. Another big factor is new users, because when new users join the platform, we don't have a lot of data about them. How to generate relevant recommendations for new users, so that they also feel engaged and feel like coming back to the console, is another problem that we're continuously working on. That's pretty much what we have in machine learning models. Now moving on to the experimentation side, which is what I call A/B experiments here. We have a dedicated experimentation platform. But before I get into experimentation, I want to talk a little bit about metrics. All this work is going on where we think that we are providing joy to our customers and trying to increase their engagement, but how do we know all these things?
Of course, the key part of any back-end system has to be the metrics. Whether it's very simple metrics like the clicks or the views we are getting on some content, or the purchases that are actually happening for the games that we are recommending or advertising, all these metrics are collected and then analyzed in different systems. Another aspect of this entire flow that you saw here is experimentation, which also helps us to measure certain things. In simple words, experimentation helps us to identify whether one alternative is better than the other, and whether we should go with a new campaign or experience or not. Let me go into a little bit of detail and explain, in simple words, what we are trying to measure in experimentation platforms.
In this picture, you see there is an audience, and this audience has been divided into 2 groups of 50% each. So 50% of the audience is in group A, which is called the control group, and 50% of the audience, on my right side, is in group B, which is called the treatment group. This is basically what the experimentation looks like. The control group gets the usual experience, or the usual content, that you're already getting. The treatment group is where I have provided the new experience to the other 50% of users. What we are going to compare between these two groups, group A and group B, is something called conversion.
If you see here, for group A, the ones who got the usual experience, the conversion was 30%, while for group B, which got the new treatment, the new experience, the conversion was 50%. By conversion, we mean that we achieved the desired result. A conversion is a measure which basically checks the impact of a campaign on increasing a desired action. For example, if I show you a sign-up page and you actually end up signing up, we call it a conversion. If I want you to buy a game and you actually end up buying that game because of my recommendation, we call it a conversion, and that conversion is attributed to the recommendation that I gave you.
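To make the 50/50 split and the conversion comparison concrete, here is a small sketch. Everything in it is an assumption for illustration: hashing the user ID to assign groups, the experiment name, the group sizes of 1,000 users, and the two-proportion z-score used as a rough significance check are common practices, not a description of the experimentation platform discussed in this talk.

```python
import hashlib
import math


def assign_group(user_id: str, experiment: str) -> str:
    """Deterministically place a user in control or treatment (about 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "treatment"


def conversion_rate(conversions: int, users: int) -> float:
    """Share of users in a group who performed the desired action."""
    return conversions / users


def z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-score for treatment (b) versus control (a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical counts matching the 30% versus 50% example above.
control_rate = conversion_rate(300, 1000)
treatment_rate = conversion_rate(500, 1000)
print(control_rate, treatment_rate)  # 0.3 0.5

# A z-score well above roughly 1.96 suggests the difference is not just noise.
print(z_score(300, 1000, 500, 1000) > 1.96)  # True
```

Hashing the experiment name together with the user ID keeps a user in the same group for the whole experiment while giving different splits across experiments.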
The higher the conversion, the better it is. In this case, since group B got a higher conversion, we would most likely go with the new experience. But if it was reversed, and the new experience was not generating positive results, we would continue supporting the older experience. That's the general concept of experimentation. In our systems, we enable experimentation at different layers. One of the examples I've put here is an incremental lift study, where we are trying to understand the difference between conversions that happened because of remarketing campaigns and those that would have happened anyway. So we are enabling experiments at the campaign level, trying to see how one campaign performs versus another. At the same time, we are also enabling experimentation at the machine learning level, because we are developing so many newer machine learning models.
But how do we know whether the newer machine learning models are better than the previous ones or not? We enable experimentation even at the model level to understand the performance of model A versus the previous model. So experimentation definitely plays a very important role in our entire ecosystem in terms of generating metrics and understanding what is better and what is not. It basically helps us to drive this process of continuous learning about the users. Next, this is the high-level flow of all the things I explained here. As soon as the request comes in, the back end does a bunch of things, like authenticating that request, checking the privacy settings of the account, checking for user-based segments, whether you own any games, and any age restrictions. Then it passes the request on to different systems, which can power recommendations or ads.
Behind the scenes, for personalization, all these systems talk to machine learning models, get the ranked results, and pass them back to the client. Now moving on to the section on the challenges in personalization. There are a bunch of challenges. Of course, it looks like a mature field and a very mature system, because so many other companies have been working in this field for so many years. But we still always have ongoing challenges. First, as you can see, it's a very data-driven ecosystem. So the quality of your data, the accuracy of your data, and whether you have enough data about a user or not are very important. Quality and accuracy are important for the entire system, for our analytics system and also for our machine learning system.
Another thing is how easy it is to get the data frequently. For example, if you're buying something on our digital platform, we will get to know about your transaction. But there is another section of people who buy physical discs, and that kind of transaction does not make it into our systems immediately. So at any given point in time, there might be some lag in us getting your most accurate data. Those are the kinds of things which sometimes affect the results of what we are recommending to you. The other big problem is the cold start problem, which I talked about earlier: we have very limited data on new users. That's one area where we are continuously experimenting with new models.
We look at what can be generated that is diverse and can also engage new users in a better manner. The next challenge is evaluation and optimization. Of course, we are doing so many computations behind the scenes. If you think about it, this entire flow of the request, from the moment you switch on your console, going all the way to the models and then coming back, will take some time. But if you notice a lag as soon as you switch on the console, and it takes a few minutes to load and show you the content, it will be a very bad experience for you even if the content is personalized. So the latency of the system has to be very low. The response time has to be super quick, so quick that it is not noticeable to the human eye.
For that purpose, we have to continuously work on improving the performance of these systems. Every time we add a new layer or a new component to this complex ecosystem, we have to make sure that the latencies don't get affected negatively, and that the system still responds in such a manner that the user does not notice a lag or a gap in getting the content. Another aspect is, of course, scalability. We have millions of users. During the active seasons, like the holiday season, there are a lot of people playing games, and there are also a lot of people purchasing consoles during, say, Thanksgiving. We definitely see a lot of spikes in traffic. Making sure that we scale accordingly while providing content to the users, and making sure that users can make their purchases and transactions smoothly, without any obstacles or blockages, is of utmost importance for us.
So we are continuously measuring our latencies and making sure that we are improving the performance of this entire system. Some other factors are still big challenges for us, for example algorithmic bias and the filter bubble. A filter bubble arises when you base your recommendations only on the user's past information: you can end up exposing users to content which reinforces their existing beliefs, so the diversity of information provided to users could be affected. The other side is that when you train your model with the same kind of data again and again, you can generate bias in your algorithms. So making sure that we provide a diverse set of recommendations and that there is minimal bias in those recommendations is another factor that we are continuously working on, learning from these metrics.
These are some of the challenges. Now I would like to summarize everything that we've discussed so far. Basically, personalization helps users to discover and enjoy new games which they might not have found otherwise, or which they might have taken a very long time to find. The whole idea is that when the user comes to play games or visits the console, we want to make sure that the user gets the most out of their time on the console instead of getting frustrated trying to find what to play next or what to do next. So the discoverability of new games is made easier with this mechanism. Then, of course, personalization is not just for one individual. There are multiplayer games and gaming communities, so we provide features and recommendations based on the shared interests of the community. For example, I talked about providing recommendations based on what your friends have been playing.
It's a way for us to keep the community connected and also to make sure we generate recommendations for the shared interests of people. Next, of course, we are trying to attract more people, but for the people who are already on the platform, it's very important for us to keep them engaged, keep them happy, and retain them so that they come back. Lastly, we are promoting relevant content, purchases, and subscriptions based on user preferences. So we are basically trying to enhance user engagement through this whole concept of personalization. That's all that I have for this presentation. I hope you enjoyed it.