Unlocking ML Experimentation by Providing the “Accessible Luxury” of Data by Zhichun Li

Automatic Summary

Unleash the Power of Quality Training Data with Scale Rapid

Hello, I'm Zhichun Li, General Manager of Scale Rapid, our self-serve labeling product, which aims to democratize access to high-quality training data for companies of all sizes. With Scale Rapid, ML engineers at any stage of the ML life cycle can obtain the data they need without a large financial commitment.

The Need for Quality Training Data

While working at Scale, we noticed a gap in the market: researchers in early-stage startups had no good options for acquiring training data. The only choices were solutions with large minimum commitments or options that required building a lot of internal tooling. The problem is prevalent within enterprise companies too, where research teams with spurts of data to label often stall for lack of quality data.

Many teams wait weeks or even months for high-quality labeled data before they can start building their models. The barrier to ML remains high: teams are blocked by slow feedback and by difficulty iterating on their labeling instructions, which leads to arbitrarily labeled edge cases and poor model performance.

Introducing the Solution: Scale Rapid

Enter Scale Rapid. It is built to remove these blockers and give you more insight into and control over the labeling workflow. Our core belief is that cutting the wait for labels from weeks or months to hours or days accelerates AI development. Scale Rapid is the fastest way to production-quality data, with no minimums and no commitments.

What Does Scale Rapid Offer?

  • Annotation Flexibility: Scale Rapid supports varied annotation formats: image, text, audio, video, and 3D LiDAR.
  • High-Quality Labels: Labeling runs in two phases: a calibration phase to refine your instructions, and an edge-case detection phase that covers production labeling and flags edge cases the instructions do not cover.
  • Insights: Get granular insight into the health of your pipeline and other metrics with our user-friendly interface.
  • Ease of Use: With just a credit card and a few simple steps, get high-quality training data fast.

Scale Rapid in Action

To illustrate what Scale Rapid can do, here is an example task done by Adobe Research. Given two images, A and B, the task was to write a command that transforms image A into image B, either in Photoshop terms or in generic terms such as "saturate the image" or "make one corner blue." Because such commands have no single objective answer, they are hard to grade. Scale Rapid's 'freeform pipeline' handles both the completion of the task and the grading by pairing a human attempter with a reviewer.

Supporting Startups and Enterprises Alike

Scale Rapid has also supported early-stage startups such as Tempor, which needed lumber logs labeled during the early days of COVID. We completed the labeling within three days and cut the human labor required on their end by 50%.

Final Thoughts

In summary, Scale Rapid is just getting started. Our mission is to democratize access to high-quality data for startups and companies of all sizes, and to speed up your AI development and ML experimentation. So say goodbye to waiting for your labels and welcome fast, quality data with Scale Rapid.


Video Transcription

My name is Zhichun Li. People call me Z. I am the GM of Scale Rapid, which is our self-serve labeling product that aims to provide high-quality training data to companies of all sizes and democratize access to quality data. Essentially, Rapid is an experimentation platform that provides customers with the quickest turnaround time to quality annotations. Given that Scale was built with one mission in mind, accelerating the development of AI at every stage of the ML life cycle, we thought it was very important to build a solution for every single ML engineer out there, so that they can access quality data without having to commit or spend a lot of money.

While I was working at Scale, we realized that there wasn't a very good option in the market at the time. Now we think Scale Rapid is that option, but at the time, researchers in early-stage startups didn't really have good options: they either had to commit a large sum because of minimum commitments for companies, or they had to go for options where they had to build a lot of internal tooling to support it.

And this is still prevalent for enterprises and early-stage companies alike. Even within enterprise companies, support for ML experimentation by research teams, where you have spurts of data that you need to label, is still absent in a lot of ways. So people are still waiting weeks or months for quality data before they can start building out their models, and the barrier to ML for these teams is still very high.

Cool. Further on the other challenges today: teams are often blocked by an inability to receive feedback and quickly iterate on their set of instructions, which is super, super important to getting high-quality data and also determines, in a lot of ways, the feature set that you're training against. Also, without clear instructions, labelers will sometimes label edge cases in arbitrary ways, and this can lead to more problems downstream when it comes to modeling: you figure out that because you never covered a certain edge case, people are labeling arbitrarily and your model performs really poorly on those cases. And then, in terms of scaling to production, it can be delayed, or there's no seamless API interface that can help you do continuous labeling and fine-tuning in production.

So the promise of Rapid was to remove this blocker and give you more insight and control over the labeling workflow. The core belief here is that by reducing time from weeks or months to hours and days, you reduce the amount of time people need to wait for their labels and therefore accelerate the development of AI. That's the fundamental belief, and Rapid, as promised, is the fastest way to production-quality data. We have no minimums, no commitments; just get started immediately.

So what does Rapid provide? For one, we provide flexibility with annotations and labor.

It supports various annotation formats, from image to text to audio to video to 3D, 3D being LiDAR. High-quality labels are done in two phases. There is the calibration phase, which involves calibrating and refining your instructions. Then there's the edge-case detection phase, which involves production labeling and also flagging edge cases not covered by the instructions. We also provide granular insights on quality metrics and the health of your pipeline, through features such as the issues queue and other metrics. It's also super, super easy to use.

It's just six steps and a credit card to get started, and we're reducing that to four. Just for context, how fast is fast? With Scale Rapid, ML teams can get an initial set of high-quality training data in a matter of hours, and we can turn around 10K-plus images in a matter of a few days, usually two days, with no commitments required.

So today, we're enabling companies of all sizes, research teams, and startups. In the last month alone, we've seen a 66% increase in usage. We've also added support for long videos, which enables ML teams to break down longer-form tasks through stitching, and we also support a variety of languages and a robust list of NLP use cases.

So let me talk through an interesting example of a type of task that shows the kind of innovations on Rapid. The task was done by Adobe Research. We were sent two images, image A and image B. The task at hand was: can you issue a command that transforms image A into image B, either in Photoshop terms or in generic terms such as "saturate the image" or "make the corner of the image slightly blue"?

So in this case, you could say the right command to issue is "grayscale the image" or "make the image black and white." That's one potential case. For people who have a little bit of context on these types of tasks, or have used other options before, it's really hard to grade against this. In the case where we have an objective answer you can grade against, that's very trivial, because you just grade and assign a score. In this case, there's no easy way to grade the attempt, i.e. the person who's submitting the command, right?

So what we built out is this thing called a free-form pipeline that acts similarly to GANs. The formulation is that the human attempter is the generator and the reviewer is the discriminator. The attempter always submits an attempt at writing a command, and the reviewer looks at it and decides either that it's good enough, and then we can send it to the customer, or that it's not, and it goes back to the same attempter. After a few times, it goes to another attempter. The cool thing here is that you can actually easily gauge how well the reviewer is performing, because you know which commands should be accepted and which commands should be rejected.
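The attempter/reviewer loop described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Scale Rapid's implementation or API: the function names, the `max_tries_per_attempter` parameter, the toy keyword-based reviewer, and the idea of an audit set of commands with known verdicts are all hypothetical stand-ins for the human roles in the talk.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types, not a real API.
# An attempter gets the task and how many times it has been rejected so far.
Attempter = Callable[[str, int], str]
# A reviewer gets the task and a proposed command and accepts or rejects it.
Reviewer = Callable[[str, str], bool]


def run_freeform_task(
    task: str,
    attempters: List[Attempter],
    reviewer: Reviewer,
    max_tries_per_attempter: int = 3,  # assumed value for "a few times"
) -> Tuple[Optional[str], int]:
    """Return (accepted command or None, total attempts made)."""
    attempts = 0
    for attempter in attempters:
        # A rejected task goes back to the same attempter a few times...
        for rejections in range(max_tries_per_attempter):
            attempts += 1
            command = attempter(task, rejections)
            if reviewer(task, command):
                return command, attempts  # good enough to send to the customer
        # ...and then moves on to another attempter.
    return None, attempts


def reviewer_accuracy(
    reviewer: Reviewer, audit_set: List[Tuple[str, str, bool]]
) -> float:
    """Fraction of audit items (task, command, should_accept) judged correctly."""
    if not audit_set:
        raise ValueError("audit_set must not be empty")
    correct = sum(
        reviewer(task, command) == should_accept
        for task, command, should_accept in audit_set
    )
    return correct / len(audit_set)


# Toy stand-ins for the human roles (illustrative only).
def vague_attempter(task: str, rejections: int) -> str:
    return "edit the image"


def careful_attempter(task: str, rejections: int) -> str:
    return "convert the image to grayscale"


def keyword_reviewer(task: str, command: str) -> bool:
    return "grayscale" in command or "black and white" in command


# Commands whose correct verdict is known in advance (assumed to exist).
AUDIT_SET = [
    ("A->B", "make the image black and white", True),
    ("A->B", "make the corner blue", False),
    ("A->B", "desaturate fully", True),  # the toy reviewer wrongly rejects this
]

if __name__ == "__main__":
    print(run_freeform_task("A->B", [vague_attempter, careful_attempter],
                            keyword_reviewer, max_tries_per_attempter=2))
    print(reviewer_accuracy(keyword_reviewer, AUDIT_SET))
```

In this toy run the first attempter is rejected twice, the task is handed to the second attempter, and the reviewer scores two out of three on the audit set, which is the kind of signal that lets you grade the reviewer even though the attempts themselves have no single right answer.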

So effectively, you build out a quality mechanism on top of the reviewer layer.

We've also supported other smaller-stage startups such as Temper, which was a startup focusing on labeling SIMS, segmenting logs in the early days of COVID. As people can tell, lumber was becoming very important, and they were having a hard time labeling all of this by themselves. So we really expedited the whole process, finished labeling everything in three days, and reduced the amount of human labor required on their end by 50%.

So in summary, I'm very excited to talk to you about Scale Rapid. We're still very early on in the product's life cycle. The mission is to democratize access to quality data for startups and companies of all sizes. We want to help bootstrap ML efforts by quickly providing production-quality labels. So stop waiting for your labels and speed up your experimentation. And that is all from me.