Unlock Your Trapped Data Using AI and Machine Learning by Winn Oo
Unlocking the Power of AI and Machine Learning in Application and Data Processing
Today, I would like to share a story that illustrates our approach to leveraging cutting-edge technologies, like Artificial Intelligence (AI) and Machine Learning (ML), to create bespoke application processes and data management systems. This journey encompasses a broad spectrum of experiences, from enterprise application development to data transformation and migration processes.
Our Experiences and Objectives
Experience spanning over 20 years has allowed us to carve a niche in product development, with a focus on data ingestion and transformation. Combining this with the cutting-edge technology of ML and AI, we aim to develop effective and rewarding solutions.
How the Tech World Responded to the Covid-19 Challenge
The devastation of COVID-19 presented us with a unique challenge. We built a platform from the ground up, using AI and ML, to sift through the large volume of data submitted for disaster relief funding. The platform assists with compliance checks, reviews application statuses, and expedites the release of funds to impacted households and individuals, reflecting our commitment to harnessing technology for the betterment of society.
Incorporating Technology in Disaster Relief
Dealing with Document Challenges
One of the challenges we faced during this journey was the diverse range of document types submitted by users. Some were easy-to-read PDFs or other electronic files, while others were varied forms of invoices, payment receipts, and even photos of physical documents taken on phones. The crucial task was to create a robust platform that could handle and analyze the entire spectrum of document types.
Our Solution: An Intelligent Document Processing Platform
Addressing this issue led to the creation of a simple but comprehensive platform. We used custom machine learning models to process these diverse documents. The platform could ingest any document, run it through our OCR engine to read it, and use our AI model to classify it. We then extracted the necessary data points from each document, standardized the information, and applied the required rules and checks to the transformed data, so that it was ready to act on.
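The flow described above (OCR text in, then classification, extraction, standardization, and rule checks) can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: the keyword-based `classify` function stands in for the custom ML model, OCR is assumed to have already produced plain text, and the function names, field labels, and date window are all assumptions made for the example.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical keyword lists; a real system would use a trained classifier.
KEYWORDS = {
    "w2": ["wage and tax statement", "form w-2"],
    "pay_stub": ["pay period", "net pay"],
    "lease": ["lease agreement", "landlord", "tenant"],
    "utility_bill": ["amount due", "service address", "kwh"],
}

# Illustrative eligibility window (an assumption for this example).
WINDOW_START = date(2020, 3, 1)
WINDOW_END = date(2021, 7, 31)


def classify(ocr_text: str, min_hits: int = 2) -> Optional[str]:
    """Return a document type, or None when the text is too ambiguous to guess."""
    text = ocr_text.lower()
    scores = {t: sum(kw in text for kw in kws) for t, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


def normalize_address(raw: str) -> str:
    """Standardize an address: lowercase, drop punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", raw.lower()).split())


def extract_lease_fields(ocr_text: str) -> dict:
    """Pull data points out of lease text using assumed field labels."""
    start = re.search(r"start date:\s*(\d{4}-\d{2}-\d{2})", ocr_text, re.I)
    addr = re.search(r"property address:\s*(.+)", ocr_text, re.I)
    return {
        "start_date": date.fromisoformat(start.group(1)) if start else None,
        "address": normalize_address(addr.group(1)) if addr else None,
    }


def check_lease(fields: dict, applicant_address: str) -> list:
    """Apply simplified rules to the standardized fields; return any issues."""
    issues = []
    if fields["start_date"] is None:
        issues.append("missing lease start date")
    elif not (WINDOW_START <= fields["start_date"] <= WINDOW_END):
        issues.append("lease start date outside covered window")
    if fields["address"] != normalize_address(applicant_address):
        issues.append("lease address does not match application")
    return issues


def process(ocr_text: str, applicant_address: str) -> dict:
    """Classify, extract, and check one document's OCR text."""
    doc_type = classify(ocr_text)
    if doc_type is None:
        # Do not guess: ask the applicant to resubmit instead.
        return {"status": "needs_resubmission", "doc_type": None, "issues": []}
    if doc_type != "lease":
        # Only the lease path is sketched here.
        return {"status": "classified", "doc_type": doc_type, "issues": []}
    issues = check_lease(extract_lease_fields(ocr_text), applicant_address)
    status = "flagged" if issues else "ok"
    return {"status": status, "doc_type": doc_type, "issues": issues}
```

Returning `None` from `classify` rather than the best-scoring guess keeps unreadable documents out of the automated path, so they can be sent back to the applicant or routed to a reviewer.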
Applied AI and ML in Document Processing
Employing AI and ML in document processing allowed us to deliver rapid service, maintain transparency in our processes, increase the accuracy of compliance reviews, and significantly reduce manual data reviews.
- Our system flagged potential fraud cases. Every flagged case went through a secondary review, which confirmed more than 80% of them as fraud, resulting in savings of about $51 million.
- We were also able to detect duplicate applications, saving reviewers precious time and resources.
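The talk does not say how duplicates were detected. One common approach, sketched here purely as an assumption, is to normalize a few identifying fields and compare a fingerprint of the result, so that resubmissions with different capitalization or punctuation still match. The field names (`id`, `name`, `date_of_birth`, `address`) are hypothetical.

```python
import hashlib
import re


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", value.lower()).split())


def fingerprint(app: dict) -> str:
    """Hash the normalized identifying fields of one application."""
    parts = [app["name"], app["date_of_birth"], app["address"]]
    joined = "|".join(normalize(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_duplicates(applications: list) -> list:
    """Return (duplicate_id, original_id) pairs in submission order."""
    first_seen = {}
    duplicates = []
    for app in applications:
        key = fingerprint(app)
        if key in first_seen:
            duplicates.append((app["id"], first_seen[key]))
        else:
            first_seen[key] = app["id"]
    return duplicates
```

Exact matching on normalized fields only catches resubmissions that differ in case, spacing, or punctuation; misspelled names or reworded addresses would need fuzzy matching on top of this.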
Conclusion
By using automation with AI and ML, we were able to effectively process an extensive variety of documents, emphasizing the innate potential of these technologies in problem-solving and process efficiency. As we continue our journey in the tech industry, these experiences inspire us to stay ahead and adapt to the ever-evolving landscape of technology.
Video Transcription
Thank you, everyone, for joining. I know it's later in the day for some of you and super early for others, depending on what time zone you're in. My name is Winn, and today I'm here to share a story about how we've developed applications and processes that use AI and ML to unlock the data. Before we dive in, I want to give you some background on me, as well as on who CohnReznick is, so that you can relate a little better to the story I'll be sharing with you. Most of my background has been in application development, system integration, and implementation. Currently, I'm a director here at CohnReznick on the technology plus team, and I lead the product development team focusing on data ingestion and transformation as well as emerging technology. I've been in the industry for a little over 20 years now, about 22 or 23, something like that.
With the journey I've been on, I see myself, at this intersection of my career, as someone who can marry these two paths, if you will: my experience with enterprise application development and my knowledge of data, data transformation, and data migration processes, applying the cutting-edge technology of ML and AI to create effective, useful, rewarding solutions.
CohnReznick is an advisory, assurance, and tax firm, and I am in the advisory service. What does that mean? It means we offer advisory services, starting with strategy and system selection, through technology system implementation, all the way down to managed services. The firm also has assurance and tax arms, which provide audit, compliance review, and all sorts of tax services at both the local and the national level. We are across the US, with most of our representation on the East Coast, the South, and the West Coast. I sit here in Colorado. Basically, we're really easy to reach.
Now that you know a little more about me and about what CohnReznick does, the story I want to walk through with you, on how we apply technology to use cases, has to do with COVID relief. We developed a platform from the ground up using AI and ML. It extracts all sorts of data from what is submitted to us for disaster relief funding, and it helps us perform our compliance checks, review application statuses, and release the funds to the impacted households and people.
As we all know, COVID is still here. It has been impacting us for over two years now. As we went through the COVID journey, we learned that COVID impacted our population disproportionately, not only from a medical standpoint but also from an economic standpoint. Congress approved multiple measures to help with that. The Emergency Rental Assistance program is one example of that package, and there's a small business program that they released as well. The Emergency Rental Assistance program in particular came in two waves: one in December 2020, with $25 billion in funding, and a second in March 2021, with $21 billion in funding. The idea is to help individuals and families impacted by the COVID-19 pandemic who are behind on rent, who are behind on essential bills, and who are on the brink of being evicted from their housing. So the goal of the program is to distribute the grant money to the people who need it the most.
The mandate of the program is to ask applicants for information showing that they are in need. That means they are asked to submit their pay stubs, their W-2, their tax return if they have one, the lease document, all the landlord information, and rental payment receipts. If they're behind on utilities, they need to submit how far behind they are, what is past due, et cetera. The utility coverage includes electric, water, sewer, and trash, so it is not only one type of utility; these essential utilities are the kind of area the program covers.
This program was pushed to all states and all populations at once, and very fast. So the states and localities created portals, with the vendors of their choosing, to collect all of this information. You have to provide ID and all the essential information needed to submit an application and request this funding.
That is all fine and good. But keep in mind that the rollout is to millions of people, and where CohnReznick comes in is after the information is collected: we need to review the information to make sure applicants check the boxes, that they are not fraudulent, et cetera. When you think of all the information that's going to be submitted, and the population that's going to be submitting it, it's going to vary widely. That's where the trouble, if you will, comes in: how are we going to analyze and collect the information we need from this type of data so that we can review it succinctly and distribute the funds expediently? Let me show you the spectrum of document types we're dealing with and what type of data we fall into. This is a chart from Gartner that categorizes documents from the most structured to the most unstructured. So think about the information that we collect.
Documents such as the W-2, the 1040, and tax transcripts come in very standardized formats. They're really easy to read and mostly available as PDFs, or electronically if somebody files electronically. But keep in mind that the population we serve is often not technically savvy, or doesn't have the means to have clean electronic forms available for submission. That's one part. The second part is the variety of documents we're getting. If you look at utility documents, depending on the state and whether or not it is federally regulated, you're dealing with multiple types of utility vendors, and you will receive many varying forms of invoices and late notices that you will need to read. If the documents are available in PDF format, of course, they're easy to submit. But the population that will be interacting and submitting will, the majority of the time, be submitting a picture taken with a phone or a scanned document.
It may be in low light, or it may be a skewed scan, rather than the electronic PDF copy of what you receive in the mail. So we need to create a platform that can handle the entire spectrum of document types and then analyze them. The idea that we came up with was to create a simple platform: regardless of the type of document, we can ingest it, transform it, apply validation, and then share it. As soon as the data is structured, we can deploy all sorts of models to detect whether it's fraud, whether it's compliant according to the rules, and whatnot. And this applies not only to COVID but everywhere. What I mean by that is, if you look at any organization or corporation, the majority of the organization's data sits in an unstructured format. In Gartner's terms, we are basically creating an intelligent document processing platform. That's easy to envision and visualize. But how are we really going to do it?
That's the crux of it. The best way I find to solve a problem, sometimes, is to walk backwards. The main idea is to have structured data and be able to share it, research it, report on it, analyze it, run the models, and figure out all the statistics and correlations. To achieve that, we need to figure out how we're going to transform it and what information we're going to collect. The information we collect is going to vary from document type to document type, so first we need to figure out what type of document is being submitted. The way we designed our platform is to accept all sorts of documents. It doesn't matter whether it's a PDF, an electronic file, a scanned copy, or a picture; whatever it may be, we will accept it. We created our own custom machine learning model to read them. Basically, we have an OCR engine to read the submitted documents, and then our model decides: this is a W-2, this is a pay stub, this is a lease, this is a utility bill, and so on.
And if we cannot read it, that's OK. Then we have to ask, because we cannot guess: these are important documents, and there are actual people being impacted. After the documents are properly classified, we know what data points we're going to read from each document, and once we can extract and read the data, we can apply rules to it. What I mean by that is, in order to qualify for the program, the lease, for example, needs to fall within the COVID period, say from March 2020 to July 2021, and the bill has to be for a covered utility, such as electricity or trash, et cetera.
We also need to figure out how much in late fees they have incurred, and whether the late fees combined with the current bill fall within the parameters the program has identified. So we need to apply all these checks and balances: make sure the address is valid, that the address they submitted matches the address on the lease, and so on and so forth. Once we are able to extract the data points and transform everything into a standardized format, we are able to start applying those types of rules. And when we have the rules applied and the data cleansed and formatted, then that data is golden: it is good data that we can act on. With this type of process, applying ML as well as AI and various data science models to our cleansed data, we were able to deliver the key asks: speedy delivery of funds to the impacted people, improved transparency so that we know where we are in the review process, increased accuracy in our compliance review, and a big reduction in manual data review. Our data science team was able to apply their algorithms and models to detect fraud.
Every fraud case that we flagged went through a secondary review, and we found a success rate of more than 80%: of the cases we flagged, more than 80% were truly fraud. As a result, we were able to save about $51 million. Not only that: because this system was deployed for the ERA program, submissions come from the mass population, pretty much across the United States. People get anxious, or think they haven't submitted, so there's a lot of duplication as well. We were able to detect the duplicate applications, which saves reviewers from having to keep reviewing duplicates. So with that, we can see the drastic impact of how automation, AI, and ML can be applied to all of these documents to extract the data effectively.