Media Workflows Leveraging Cloud by Anvita Jain

Automatic Summary

Understanding Media Workflows Leveraging Cloud

Welcome! I am a senior software engineer at Netflix, and in this article I look at media workflows that leverage the cloud. It covers why workflows and pipelines matter, how media workflows are implemented architecturally, and the role they play in content creation, particularly in the film industry.

A Glimpse of Media Production

Everyone enjoys their favorite show or movie on Netflix. But have you ever wondered about the lifecycle of that content, and the processes it goes through before it reaches your screen?

Content creation operates in sequential phases, starting with a pitch before progressing to negotiations, contracts, and eventual production. From there, the content is launched on the service.

Geographic distribution is also a considerable factor. Production systems are globally distributed, so a single title can have professionals from around the world working on it. This is where media workflows come into play.

What is a Media Workflow

A media workflow represents the exchange of media between different departments or teams. It spans pre-production, production, and post-production activities.

Pre-production covers tasks such as script writing, production is the actual shooting, and post-production includes visual effects, editing, and more. Post-production is what we typically associate with media workflows.

Breaking Down Media Workflows

In a globally distributed setup, media workflows aid in smooth collaboration between different teams. They involve several procedures, including:

  1. Offloading Media: Media is first offloaded from the cameras and sound recorders on set and passed to different departments. This used to be a manual process in which someone physically carried the recordings; it is now automated.
  2. Editorial: Only a small portion of the recorded footage is relevant, so during the editorial phase editors decide which parts of each shot matter. Filtering out the rest allows for a clean, seamless movie or show.
  3. Visual Effects: Movies and shows typically involve visual effects to enhance their appeal. Related picture finishing work, such as adjusting colors and brightness, can drastically improve content quality.

Handling such varied operations would be enormously complicated without automated, efficient systems. This is where media workflows bring immense value. Built with SQL, Python, Java, and other modern technologies, they automate the exchange of media between departments.

Platform Composition

For effective media workflows, several platforms are implemented, each composed of multiple microservices. The platforms include:

  • Media workflow platforms: These platforms encompass a resource manager, state machine, and a unified user interface that allows users to have a keen insight into the workflow processes.
  • Asset management platforms: They help coordinate assets based on specified schemas, ensuring they are always available and ready to trigger other workflows.
  • Global storage service: This service is responsible for secure data storage, storage validation, and ensuring efficient data delivery.
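
The storage validation mentioned in the last bullet can be sketched as a checksum comparison. This is only a minimal illustration, not the actual storage service: the function names `sha256_of` and `delivery_is_intact` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large media file never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delivery_is_intact(source: Path, delivered: Path) -> bool:
    """Return True when the delivered copy hashes to the same value as the source."""
    return sha256_of(source) == sha256_of(delivered)
```

In a real transfer the source checksum would typically be computed before upload and compared against one computed on the stored object, so that a mismatch can trigger a redelivery.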

Conclusion

In the vast realm of content creation, efficient collaboration and management of media are integral. Tools built on AWS, Python, Java, JavaScript, and other modern technologies have allowed companies to operate media workflows seamlessly. In the future, these platforms will continue to advance, improving effectiveness and productivity in the media world.

Please feel free to reach out with any questions, or connect with me on LinkedIn for further discussions on this topic.


Video Transcription

Hello, everyone. Thanks for joining my session. I'm Anvita Jain, and I'm going to present media workflows leveraging cloud, and you'll see what this is actually going to cover. So first of all, a little background about me, then I'll talk about what the workflows and the pipelines are, and then deep dive into the architecture, how we are implementing it, and why we are doing what we are doing. And then at the end, I'll probably leave some time to take some questions. So I'm a senior software engineer working at Netflix. I work in the studio technology org. We typically enable all the content creators with technology so that they can save time and use their resources efficiently, and so that they can clearly focus on their creative work. The main technologies that I've used are Kotlin, Java, Python, a lot of SQL, and AWS products. We use Spinnaker for our deployments, and a couple of other in-house tools. Now, let's talk about the life cycle of content. Whatever favorite show you see on Netflix, it's not something that's just there like magic, right? A lot of processing goes on behind it. A couple of shows even take years to reach the state where they are.

So how does the whole process come into play? It starts with a pitch. Suppose tomorrow you have an idea that you think could be converted into a show or a series or a movie; someone would come to Netflix and pitch their idea. After that, our development team takes a look at it. They do some kind of business negotiations and finally come up with a contract saying, OK, this can be formulated into a series of, say, 15 episodes. So those kinds of negotiations happen, and then comes the actual production stage. And after that, it is finally launched on the service. Now, if I zoom in a bit into the production stage, there are a couple of steps in it: what is called pre-production, then production, and then post-production. Pre-production involves tasks like writing a script or finding the locations where you want to do the shooting. Actual production means you are out there on the sets with the camera, recording your scenes and music and dialogues. And then comes the whole post-production architecture. So this is where my team comes into play.

So once everything is shot, making it stream on the service requires a lot of post-production processes, and that is what I'm going to talk about today. So now, what is a media workflow? A media workflow is a representation of a media exchange between different teams. For a movie to come together, there could be a VFX artist who needs to get some visual effects done. For example, a VFX shot of a Demogorgon carrying Eleven in Stranger Things. There is nothing called a real Demogorgon, right? It's all visual effects. So lights, camera and media: how does it all happen? For a global company like Netflix, even our production systems are globally distributed. So it can happen that your on-set locations are somewhere in New Zealand, and then there could be a location in, say, Mexico. But your editor is sitting in Toronto, and your sound artists and picture finishing people are somewhere in India, followed by the VFX artists in India. So all these people have to collaborate, while our Netflix creative team is sitting in Los Angeles to figure out that, OK, I like the shot, this is the final cut, and it is a go. So what happens in these kinds of scenarios? These are all live-action productions.

So first of all, media is offloaded from the cameras and the sound recorders which are on the set; someone would copy over the files and take them to different departments. Earlier, it used to be a manual process. There would actually be a person carrying huge tapes of recordings from, say, one country to another or one place to another. We wanted to automate that whole process. And in this entire process, terabytes of data are produced, and different partners across the globe interact with them. So we wanted to come up with globally distributed processing using pipelines. Don't get scared by this diagram. I know it's quite intense, but we'll zoom into each one of the tasks. So when I talk about media workflows, there are several things that happen. The first few things happen at the on-set location or near set. When I say near set, just imagine Narcos is being shot in the jungles of Mexico: there might not be enough bandwidth or internet or manpower to download the information from the camera and upload it somewhere into the cloud, and then no one can work. So either they would do it exactly on set, or they would have a nearby location where they do that kind of work. Second comes the editorial.

Now, on the editorial side, what we see is given that say one hour of tape out of that one hour, only a small portion is what editors should be looking at because there could be someone clapping the window, there could be some, you know, spot boys running around and that was recorded.

So those kinds of things are done on the editorial side. After that, there are a lot of visual effects that are applied, and there could be some sound and music work needed, followed by picture finishing. Sometimes the light on the set is not sufficient, and you want to enhance some colors or add more brightness. All that happens at a picture finishing facility. And finally, it comes back again to Netflix, where someone would do localization in different languages, with dubbing, subtitling, et cetera, and finally make it stream and also archive it. So now I'll get into one of these: what happens when we have to collect media from the camera? For that, we built a UI as well as some APIs. A lot of the companies who are actually doing these kinds of productions are very technology friendly. They would have a technical department who would write certain scripts or certain programs running on Linux machines. For them, we have provided external APIs through which they can connect and start sending their data or files into our ecosystem. The second way is a UI tool we created called Content Hub.

If you just Google for it, you would be able to see it as well, and using that tool, they can import different types of media. Now, the second part is the editorial part. So we have the media; that is the first step. The second step is how to conform it to the editorial workflow. Typically there is an EDL, which is called an edit decision list. It's a timeline file, so it will have timecodes saying that at time 000.1 the cut starts and it runs for the next 20 seconds, after that there is a 10-minute break, and then another shot comes in, and that kind of information. It is guided by an open standard, the OpenTimelineIO standard. We don't produce the edit decision list ourselves; usually the vendor gives it to us. Once that is done, we have processes in place to match those timecodes with the exact frames that we have got. And when I say frames, it could be any kind of file that we got: a plain image sequence, or a video such as MOV or MP4.
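
To make the timecode-to-frame matching concrete, here is a minimal sketch in plain Python. It does not use OpenTimelineIO or any Netflix code: the `Cut` class and `timecode_to_frame` function are hypothetical names, and it assumes non-drop-frame `HH:MM:SS:FF` timecodes at a fixed 24 frames per second.

```python
from dataclasses import dataclass


def timecode_to_frame(timecode: str, fps: int = 24) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to an absolute frame index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


@dataclass(frozen=True)
class Cut:
    """One EDL-style event: the source frames the editor wants (hypothetical shape)."""
    source_in: str   # timecode of the first frame in the cut
    source_out: str  # timecode of the frame after the last one (exclusive)

    def frame_range(self, fps: int = 24) -> range:
        return range(timecode_to_frame(self.source_in, fps),
                     timecode_to_frame(self.source_out, fps))


cut = Cut(source_in="01:00:00:00", source_out="01:00:20:00")  # a 20-second cut
frames = cut.frame_range(fps=24)
print(frames.start, frames.stop, len(frames))  # 86400 86880 480
```

With an image sequence, the resulting frame range maps directly onto numbered files; with a video container, it would be turned into seek offsets instead.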

And once that is all done, we would also have to look at some kind of transcoding, because to interact between different systems, some transcoding is needed. And then finally, a VFX facility will actually get all the VFX plates, for example, if there are any involved. The same kind of process could span different kinds of assets: it could be VFX assets, it could be just raw camera footage, et cetera. A similar process happens for the conform pull. When I say the term conform pull, it means that once you extract a video out of a camera, you match and trim it across the entire tape, so that you match exact timecodes and you send only the relevant information to the next component in the pipeline, which could be your editor, who would be interested only in those shots.

And on top of that, they will start doing whatever cutting or additional review they want to do. Once this is all trimmed, they also send it to a picture finishing facility for review, which will say, oh, the colors are not good, or I want to increase the brightness, or someone would say the color of the actor's dress is not bright enough, let's change the color. So all these kinds of things start happening here. Once these things happen, there's a heavy need for visual effects. I'll give you a very basic example of when visual effects come into the picture. Suppose you're watching a scene where people are communicating over chat on WhatsApp or on their phone, and you see those chat bubbles popping up all over the screen. Those things are basically visual effects. Now, to produce visual effects, we have several workflows.

One workflow is to get a request from an editorial department, iterate over everything that is produced on VFX shots, do certain kinds of reviews, deliver the finished product after the review is done through a VFX shot delivery process, and then maybe archive it, because suppose you have a scene in season one that might end up being used again in season four.

So there is a heavy need that everything that is shot is also archived, and every visual effect that is worked upon is also archived. Now, once that is done, we have a few other workflows that come into the picture. One is called media review, for example. In media review, we work with several different vendors. Just to create the Demogorgon for Stranger Things, we could have worked with, say, three different vendors and asked them to submit different versions of the Demogorgon to us. And someone from the Netflix creative team would confirm that, OK, this is the final version that we want to go with. These kinds of things happen with heavy integration with a lot of third-party creative tools such as ShotGrid, which is a product by Autodesk and is heavily used in creative review processes. I covered the VFX shot delivery and also the studio archival a bit in my previous slide. So now let me talk about the picture finishing part. How does picture finishing actually happen? You need certain ingredients or some recipe to generate a particular look. Suppose there is a movie which is entirely shot at night, or, you know, one night corresponds to the entire movie. Typically they won't end up shooting everything in the night.

They would be doing these kinds of shoots at stages or at certain set locations, but we have a flight and then there could be certain filters. It's almost similar to putting a picture up on Instagram and then wanting to put a filter on it, say Paris or LA.

So these are the kinds of different filters that you can put, right? Once that is done, the timeline media is verified and exact points are extracted, so that out of, say, one hour of video, only five minutes' worth of video needs some kind of color correction, for example. These kinds of transitions happen on the picture finishing workflow side of things. And then there could be VFX plate related deliveries. Just imagine, especially during COVID times, they could not shoot on real locations.

So it would all be on set. Nowadays there is also a huge demand for virtual production. What happens is you will have a big screen, say showing background traffic, and an artist would just stand in front of it and do some acting, or the artist might even just sit at home and do some acting, and later someone will merge the two. So this background plate, which was just giving you a scene of a busy street, is one plate on which you will overlay an artist.

So all those kinds of things could happen through our pipelines. Similarly, there are things that would happen for sound: generating audio stems separately, generating some mixes, doing dubbing. There's a lot of machine learning involved to generate dubs and subtitles before a person would go and do a manual QC. So all these processes have been starting to get more and more automated in the industry now. That was a lot to take in on the media side, right? But technically, this all cannot be done by one single monolithic system. So we came up with different systems, and behind each of these systems there are microservices in place. To start with, take our first exchange, about how we get the media. It could be through our partner API, or it could be using a UI tool called Content Hub. Behind the scenes, we use GraphQL services to interact with our other downstream processes and microservices that can do things like authentication, about who is allowed to access what or what kinds of processes are allowed on certain types of assets.

Beneath this, we have created a platform called the media workflow platform, or pipeline as a platform, and people can just come, write their pipeline, and start running it. They don't have to worry about scale, about scheduling, or about resource management; getting an instance in AWS and running on it is all managed by our platform. Beneath that, we also interact a lot with our encoding and inspection teams. What happens basically is once a shot is shot, it needs to be encoded so that the quality remains correct on, say, your iPhone, or on a small Android phone in an area with low-bandwidth internet connectivity. You might have noticed that the quality of Netflix content always goes up and down with the kind of device you are on, but that quality is still good. This happens because we generate several encodes for different kinds of devices. We don't try to use an encode for a 4K TV on a very bare-minimum smartphone. And then we also have an asset management platform. Basically, it takes all our files, and we have certain schemas, so we create assets against a certain schema.

So tomorrow, if I have to go look up footage and trigger some other workflow based on the availability of footage, I can tie it all together using my asset management platform. And underlying all this, we have a huge global storage service that takes care of what data gets pushed into AWS, how to secure an S3 bucket, how to store things, and how to make sure you do a checksum and validate that what we delivered in AWS is exactly what we were supposed to deliver. And then the other teams with which we interact are Netflix's internal platform tools and the data science platform, to run a lot of analytics on top of it. Now, how is the whole pipeline composed together? Every pipeline is in a manner a state machine running on top of a process orchestrator, and it has certain resources that are being managed on it. When I say a state machine, it means that when I trigger a pipeline or a workflow, it will first of all be in, say, a draft state. Someone would submit it, giving all the inputs that are supposed to go in. Once that happens, we turn the state to, say, in progress, and we start processing each of the nodes in that workflow. As part of that process, sometimes the pipeline could fail, and we can mark the state as failed.

So on the UI, the user would know that the pipeline is in a failed state. Similarly, we can say that, hey, the data that you provided was not sufficient, please redeliver. So we can change the state to redelivery, and all these states are visible to the end user via our UI too. At the same time, we have a resource manager. For a pipeline, there could be, say, five VFX shots that are attached as resources, and the state of a resource could change throughout the pipeline process. At some point, it could be just a simple file; after a couple of minutes, it could turn into an asset; after a couple more minutes, it could turn into an external file being delivered to someone else. So we keep track of all those stages as part of our resource manager process. And to run everything, of course, we need an orchestrator. An orchestrator is a process which makes sure that everything gets scheduled and processed, calling different services and things like that.
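
The pipeline states described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Pipeline` class is hypothetical, the `COMPLETED` state is not named in the talk, and the set of allowed transitions is a guess at a reasonable shape rather than the platform's actual rules.

```python
from enum import Enum


class PipelineState(Enum):
    DRAFT = "draft"
    IN_PROGRESS = "in_progress"
    REDELIVERY = "redelivery"
    FAILED = "failed"
    COMPLETED = "completed"  # assumed terminal state, not named in the talk


# Allowed transitions; the exact edges are an assumption for illustration.
TRANSITIONS = {
    PipelineState.DRAFT: {PipelineState.IN_PROGRESS},
    PipelineState.IN_PROGRESS: {
        PipelineState.FAILED,
        PipelineState.REDELIVERY,
        PipelineState.COMPLETED,
    },
    PipelineState.REDELIVERY: {PipelineState.IN_PROGRESS},
    PipelineState.FAILED: {PipelineState.IN_PROGRESS},
    PipelineState.COMPLETED: set(),
}


class Pipeline:
    """Tracks the current state and the full history, which a UI could display."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = PipelineState.DRAFT
        self.history = [self.state]

    def transition(self, new_state: PipelineState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)


demo = Pipeline("vfx_shot_delivery")
demo.transition(PipelineState.IN_PROGRESS)  # inputs submitted
demo.transition(PipelineState.REDELIVERY)   # delivered data was not sufficient
demo.transition(PipelineState.IN_PROGRESS)  # vendor redelivered
print([state.value for state in demo.history])
```

Keeping the transitions in one table makes it easy to reject an invalid jump, such as moving a completed pipeline back to draft, before it reaches the user interface.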

For orchestration, we use Conductor. Conductor is Netflix's open source orchestration system. It allows you to come up with different kinds of workflow definitions, it gives you certain built-in tasks that you can use, and then it takes care of scheduling and queuing everything. So what are the properties of a media workflow or a pipeline? It is its own unique and persistent entity. It can be started manually, or automatically as a trigger based on another pipeline. Certain pipelines run for minutes or seconds, and some can even run for months or years, depending on the kind of work they are supposed to do. All the execution happens as part of a predefined set of nodes, which could be sequential; at certain stages, there could be some parallel computing involved as well. And if you look at it at a high level, it would look like a graph, with the multiple nodes forming a directed graph. It can interact with the other systems that I covered, the ones our entire ecosystem is made of. We can have certain await tasks in it: I can wait for the generation of one kind of asset, and until that happens, my pipeline is in a paused state.
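
The idea of a pipeline as a directed graph of nodes, some sequential and some parallel, can be sketched with Python's standard `graphlib` module. This is not how Conductor or the pipeline platform schedules work; the node names and the `run_node` function are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical node names; each node maps to the set of nodes it depends on.
PIPELINE = {
    "ingest": set(),
    "transcode": {"ingest"},
    "checksum": {"ingest"},
    "create_asset": {"transcode", "checksum"},
}


def run_node(name: str) -> str:
    """Stand-in for real work such as calling a microservice."""
    return f"{name}:done"


def run_pipeline(graph: dict) -> list:
    """Run nodes in dependency order, running independent nodes in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    completed = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # nodes whose dependencies have all finished
            # pool.map runs the ready nodes concurrently and yields results in order.
            for name, _result in zip(ready, pool.map(run_node, ready)):
                completed.append(name)
                sorter.done(name)
    return completed


print(run_pipeline(PIPELINE))
```

Here `transcode` and `checksum` become ready at the same time once `ingest` finishes, while `create_asset` waits for both, which mirrors the mix of sequential and parallel stages described in the talk.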

As soon as that asset is generated, it releases a lock, or it somehow informs my pipeline, saying I'm ready for you to review. A pipeline also has its own state and data, which can be queried at any point, and it publishes several events as well. Now I'll dig into how our process orchestrator, Conductor, is composed and how we use it. At a high level, of course, it needs some kind of storage to store all the definitions. But on top of that, it gives you certain services. One of the services is called the workflow service. A workflow is a composition of several tasks. Using the workflow service, you can create a workflow, start a workflow, stop a workflow, retry or fail a workflow, define the workflow definitions, and things like that. Similarly, there are task services.

Using a task service, you can define what the retry strategy on a task would be. Suppose a task fails on the first attempt: do you retry a few times, or do you restart after a couple of minutes? What is the exponential backoff policy that you would use before retrying it? Those kinds of things. Then it is accompanied by a decider service and a queuing service. Using the decider service and queuing service, you can also define certain priorities, and then Conductor will queue your tasks accordingly. So end users like us don't have to worry about, oh, how would my hundreds of instances be used? How do all my tasks get allocated, and how do they run? Basically, as an end user, you don't have to worry about that kind of granularity in how your distributed computing is going to happen. At a high level, a workflow can be defined using a JSON blueprint, which can be interacted with using a workflow API, with a metadata API to define the blueprints. You have the ability to resume and restart processes, and there is a user interface involved as well.
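
A JSON blueprint of the kind mentioned above might look roughly like the sketch below, built here as Python dictionaries. The field names (`retryCount`, `retryLogic`, `retryDelaySeconds`, `taskReferenceName`, `inputParameters`) follow Conductor's documented definition schema as best I recall it, so treat them as assumptions to verify against the Conductor documentation. The task and workflow names are invented, and the doubling rule in `backoff_delays` is a common convention, not necessarily Conductor's exact formula.

```python
import json

# Task definition with a retry policy; values are illustrative only.
task_def = {
    "name": "transcode_plate",
    "retryCount": 3,
    "retryLogic": "EXPONENTIAL_BACKOFF",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 3600,
}

# Workflow definition: an ordered list of tasks wired together by references.
workflow_def = {
    "name": "vfx_plate_delivery",
    "version": 1,
    "schemaVersion": 2,
    "tasks": [
        {
            "name": "transcode_plate",
            "taskReferenceName": "transcode",
            "type": "SIMPLE",
            "inputParameters": {"source": "${workflow.input.source}"},
        },
        {
            "name": "notify_vendor",
            "taskReferenceName": "notify",
            "type": "SIMPLE",
            "inputParameters": {"plate": "${transcode.output.plate}"},
        },
    ],
}


def backoff_delays(task: dict) -> list:
    """Delay before each retry, doubling every attempt (assumed convention)."""
    return [task["retryDelaySeconds"] * 2 ** attempt
            for attempt in range(task["retryCount"])]


print(json.dumps(workflow_def, indent=2))
print(backoff_delays(task_def))  # [10, 20, 40]
```

The point of the blueprint approach is that the retry policy and task wiring live in data rather than code, so the orchestrator can apply them without the pipeline author writing any scheduling logic.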

So you can always see the progress of your workflow. It clearly shows what the inputs are, what the outputs are, and what the logs are. If it fails on the first attempt, it will tell you that the first attempt failed and what the logs for the first attempt were. If it is retrying, you can also see that information on the UI. You can process things synchronously or asynchronously, which obviously is a great deal, because sometimes you want your workflow to kick off a certain process without waiting for it, and you can just catch that result later at some point in the workflow.
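
The pattern of kicking off a process and catching its result later can be sketched with the standard library's futures. This is a local, single-process analogy only, not how Conductor implements asynchronous tasks; `generate_proxy` and `verify_metadata` are hypothetical steps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_proxy(shot_id: str) -> str:
    """Stand-in for a slow step, such as transcoding a proxy file."""
    time.sleep(0.1)
    return f"{shot_id}-proxy.mov"


def verify_metadata(shot_id: str) -> str:
    """Stand-in for a quick step that does not depend on the proxy."""
    return f"{shot_id}: metadata ok"


def run_workflow(shot_id: str) -> tuple:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the slow step without waiting for it.
        proxy_future = pool.submit(generate_proxy, shot_id)
        # Carry on with other work in the meantime.
        metadata = verify_metadata(shot_id)
        # Catch the result later, at the point where it is actually needed.
        proxy = proxy_future.result(timeout=5)
    return metadata, proxy


print(run_workflow("shot_010"))
```

In a distributed orchestrator the future is replaced by a queued task and an event or callback, but the shape of the workflow is the same: start, continue, then join.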

So all that kind of event handling can be done using a combination of the event handler that we built as part of the pipeline platform and Conductor's ability to process things asynchronously. We have been running millions of workflows to date, a lot of them concurrently, and it's been a seamless experience for the end user. It's all backed by a queuing service, so everything is abstracted from the client, and we don't have to worry about how a task gets scheduled. One thing, of course, is that it then becomes indeterminate behavior for the client. If I want my task to get picked up right away, but for some reason there's a huge load going on, I might have to wait. And the best thing is it allows a lot of communication over HTTP or gRPC. A lot of our new services are now moving to gRPC, so our interactions still remain seamless. So now, what are the technologies that we have been using? Of course Conductor, which I've talked about; we use Python; we use Kotlin; we use Aspera Connect from IBM for doing a lot of data transfers.

So when someone has to upload a file, it creates a session for you and then takes care of getting the uploads and the downloads done. We use a lot of Java, and our UI uses JavaScript. Every service in our system is a microservice. We do not have any kind of monolithic architecture, and we try to keep everything as separate as possible. And we use Spinnaker for interacting with our AWS instances and taking care of our deployments and the build cycles. As part of this presentation, I would definitely like to give credit to my entire team and all the partner teams that we have worked with: the video engineering team, our media workflow team, and the studio technology org. And any animation that I used in this is from Storyset and slidesgo.com. And that is it. If there are any questions, I would be happy to take them now, or feel free to connect with me over LinkedIn, and I would also be happy to chat with you if you are interested in anything more. Thank you, everyone who joined the session.