Data that puts the Chatbot in a Pickle by Tanaya Babtiwale
Exploring the Intricacies of Chatbots: From the Lens of a Machine Learning Engineer
Hello to all tech enthusiasts! Today, we have an exciting discussion coming straight from the experiences of Tanaya, the first woman machine learning engineer at Haptik, a leading conversational AI platform. Here, she uncovers the complexities surrounding chatbots and the hurdles they face, particularly from a data perspective, and boy, do we have a lot to delve into!
Chatbots: From Traditional Rule-based Systems to Conversational AI
Chatbots have been making the rounds in the industry for some time now. Initially, they were rule-based systems with clear boundaries on acceptable answers (here, rules mean predefined options such as buying a new policy, filing a claim, or reporting an accident or theft). However, these chatbots weren't particularly popular among customers due to their lack of flexibility in conversations.
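To make that rigidity concrete, here is a minimal sketch of a rule-based flow; the menu options and prompts are hypothetical and purely illustrative, not any vendor's actual implementation.

```python
# A toy rule-based chatbot flow: the user can only pick from predefined
# options, which is exactly the constraint described above.
FLOW = {
    "start": {
        "prompt": "What would you like to do? (new policy / file a claim)",
        "options": {"new policy": "new_policy", "file a claim": "claim_type"},
    },
    "claim_type": {
        "prompt": "Is this for an accident or a theft?",
        "options": {"accident": "end", "theft": "end"},
    },
    "new_policy": {"prompt": "Great, let's get you a quote.", "options": {}},
    "end": {"prompt": "Thanks, your claim has been registered.", "options": {}},
}

def run():
    state = "start"
    while FLOW[state]["options"]:
        reply = input(FLOW[state]["prompt"] + " ").strip().lower()
        # Anything outside the predefined options is simply rejected.
        state = FLOW[state]["options"].get(reply, state)
    print(FLOW[state]["prompt"])

if __name__ == "__main__":
    run()
```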
In response to the need for more human-like interaction, modern-day chatbots, or IVAs (intelligent virtual assistants), provide enhanced user experiences through Conversational AI. These bots can handle informal, unstructured conversations that include irrelevant details or omit information the speaker assumes the listener already has. As a result, conversational AI is gradually finding its way into various domains, including e-commerce, customer support, lead generation, and even COVID-related services.
- Traditional Chatbots: Rule-based systems with stringent conversational boundaries.
- Conversational AI: Unstructured, human-like dialogue systems for enriched user experience.
The Role of Dialogue Systems in Chatbots
Dialogue systems play a pivotal role in these human-focused tasks. Typically built to complete specific user tasks, these systems use sophisticated Natural Language Understanding (NLU) techniques: they discern the user's intent from an unstructured message, retain the conversational context, ask follow-up questions, and steer the user toward completing the desired task.
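As a rough illustration of the intent and slot steps, here is a minimal sketch; keyword rules stand in for a trained NLU model, and the intents, keywords, and slot pattern are made-up examples.

```python
# Toy NLU step: map an unstructured user message to an intent and slots.
# A production system would use a trained classifier; keyword rules are a stand-in.
import re

INTENT_KEYWORDS = {
    "file_claim": ["claim", "accident", "stolen", "theft"],
    "new_policy": ["new policy", "quote", "sign up"],
}

def detect_intent(message: str) -> str:
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "fallback"  # triggers a follow-up question in a real dialogue system

def extract_slots(message: str) -> dict:
    # Illustrative slot extraction: pull a policy number if one is mentioned.
    match = re.search(r"policy\s+(?:number\s+)?(\d+)", message.lower())
    return {"policy_number": match.group(1)} if match else {}

print(detect_intent("My car was stolen last night"))              # file_claim
print(extract_slots("It's under policy number 48213, I think"))   # {'policy_number': '48213'}
```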
Research Differences: Academia vs Industry
The research surrounding these dialogue systems varies significantly in the realms of academia and industry.
Academic research generally aims to create an impact through novelty, striving to solve original problems with fresh solutions. It's a constant race to outperform the state-of-the-art benchmarks. On the other hand, industry research focuses on issues faced by live, deployed systems. Influenced heavily by business decisions, industry research emphasizes fast throughput, optimized resources, and code maintainability.
- Academic Research: Focuses on novelty and beating state-of-the-art metrics.
- Industry Research: Targets issues in live systems, business decisions, and faster throughput.
Data Challenges in Dialogue Systems
Data forms the backbone of any machine learning model, but it presents its own set of complications, which can generally be categorized into three areas:
- Lack of Data: Data acquisition and pre-processing can be a long and meticulous process. Real-life user conversations, historical data, and academic datasets need to be annotated and validated before being used (see the sketch after this list). If data falls short, augmentation techniques are used to synthesize it.
- Need for Normalization: Elements like spellings, acronyms, colloquialisms, and code-switching need to be standardized for the model to understand.
- Context Handling: The system must understand previous dialogue turns to infer the context of a conversation, manage ambiguities, and identify sub-dialogues (short conversations within a larger discussion).
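Tying back to the first item, here is a hedged sketch of what a single annotated, validated training example might look like; the schema, labels, and field names are hypothetical, not any company's actual annotation format.

```python
# Hypothetical annotation schema for one user utterance.
# Annotators label the intent and entity spans; validators confirm the labels
# before the example enters the training set.
annotated_example = {
    "text": "my car got stolen yesterday, need to file a claim",
    "intent": "file_claim",
    "entities": [
        {"type": "incident", "value": "stolen", "start": 11, "end": 17},
        {"type": "date", "value": "yesterday", "start": 18, "end": 27},
    ],
    "validated": True,            # passed a second-pass review
    "source": "live_chat_logs",   # vs. "augmented" for synthesized data
}
```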
The multifaceted world of data challenges presents countless hurdles for dialogue systems. However, understanding these hurdles can help us appreciate the intricacies these chatbots battle, benefiting users and developers alike.
Feel free to reach out for further enquiries or clarifications. Happy learning!
Video Transcription
Hello everyone, I hope my slides are visible. Firstly, thank you, Women in Tech, and thanks for having me. It is always an absolute pleasure to see the energy, the response, and the inspiration throughout the event. I am Tanaya and I'm a machine learning engineer, the first woman machine learning engineer at Haptik, a globally leading conversational bot platform. A big thanks to all of you who joined for wanting to know more about chatbots and some of the fundamental data-specific problems that they face in the field. My work has centred on NLP and dialogue systems since my college days, so five years now, across both industry and academia, and I'm excited to share my insights and learnings from that with you today. Starting off: chatbot has been a buzzword in the industry for a while now, but what is it really? Traditionally, a chatbot is a rule-based, bounded system that users can chat with for answers, with the accepted answers having very clear boundaries. As we can see in the image, it has stringent boundaries on whether you can select a new policy or filing a claim, stolen versus accident. It paves the path for the conversation, constraining users to a pre-designed flow.
So while these chatbots are very popular in the market, they're not as popular among the consumers who have to use them, because of such strict boundaries on conversation. What users want is a more interactive solution that can talk like you and I do, so it feels like a personalized response from a human representative. That's where present-day research on chatbots and conversational AI comes in. With conversational AI, modern-day chatbots, or IVAs, enhance the user experience and provide the kind of conversations we are more accustomed to in our everyday speech. Our everyday speech is informal and unstructured: I can say whatever I want, and I can give a lot of information that is not relevant, like we see in the sample image. It may either be content-rich, full of information, or context-rich, which might leave out certain information because the speaker assumes that you already have it. Because of this familiarity and flexibility, these bots are reaching multiple domains.
So you have conversational e-commerce, customer support, lead generation, you name it. Another cool use case is the chatbot for COVID that Haptik released with the Government of India to help people with vaccination-related tasks and create awareness: booking appointments, downloading certificates, et cetera. These human-focused tasks are backed by dialogue systems.
So now we're coming to the technical part, which is these dialogue systems. They're often task-oriented, which means they want to get some task done for the user, and they make use of natural language understanding techniques to try and understand the intent behind what the user is saying from the user's unstructured message. They retain conversational context, they ask follow-up questions, they prompt the user in the correct direction for the task to be completed, and so on.
And as you can see on the screen, there's semantic analysis and translation and a lot more going on. Now on to the really technical stuff: the research surrounding these dialogue systems in academia versus in industry. Looking back at my university days and comparing them with my internship tenures at Haptik and the IBM Research labs, my perception of academia has been that of creating impact out of novelty, that is, new ideas. Academia is geared towards exploring interdisciplinary fields, from historical manuscripts to the latest publications, and producing fresh, original solutions to fresh, original problems. This novelty also brings out a sort of race to beat the state-of-the-art metrics: somebody else has 87%, so I will achieve 87.2% accuracy on this task, and that is what the race is about. So you might see a lot of discussion surrounding large deep learning models with an insane number of parameters, huge amounts of data, huge compute systems, everything. Furthermore, academia can give you the leeway to work on task-specific research, so there is a lot of optimization for singular tasks: slot filling, say, or intent detection, entity detection. These are specific tasks that a dialogue system has to go through, so the research can focus on a single one.
So there is a lot of optimization for singular tasks, and each task may have a separate state-of-the-art model. To compute the effectiveness of the state-of-the-art models on these tasks, academia has preset benchmarking datasets, like MultiWOZ, which is for end-to-end dialogue policy, and DSTC for dialogue state tracking tasks. While benchmarking makes sense, these datasets are created in a very idealistic setting and thus do not have the linguistic variety or the perturbations in data present in the outside world. Many times these datasets are created for a small number of domains, so you might have a restaurant booking system or an insurance use case. This prevents domain transfer across different domains and keeps the scope of the study very limited. On the other hand, speaking of industry and my time at Haptik, industry research is focused on the issues faced by live, working systems which are deployed, which you as users can access, and this is heavily influenced by business decisions: what is the competitor company doing, or what can get us the most sales? For example, industry research leans towards a faster throughput because they constantly need to put things out to get an edge over the competitors.
So, compared to academia, it is definitely faster, and thus it is unable to reach the kind of depth on a topic that academia is able to reach. But at the same time, industry research has to focus on a lot more things, like optimizing resources such as data storage or GPU compute for scalability purposes. It focuses on robustness, repeatability of models, and generalization, aiming to generalize over as many domains as possible so they can have as many clients and as many use cases as possible. Industry, therefore, is a lot more motivated towards research on generalization.
So, multi-task learning and things like that. And because this research also needs to go to production, maintainability becomes indispensable: code has to be modular, and experiments have to be documented such that tomorrow some other person can come in, reuse the same set of experiments, and perform the same set of things in the same environment.
So that is something industry focuses a lot on. But the most crucial difference between the two is data, and the problems that arise with it. So fasten your seatbelts and let's get into a quick ride through a lot of information now, because problems with data come three-fold. Let's look at them one by one. First, lack of data. By this data, I typically mean the data that is required for training these machine learning or deep learning models, the huge neural models present at the back end. These models require a lot of data, and it is usually not readily available. Getting this data is often a long process. It needs to be acquired: maybe historical data that you have gathered, which can be merged with datasets that academic researchers work on, but most importantly, real-life user conversations. It then goes through a series of preprocessing techniques depending on the task at hand.
It is sufficiently annotated as per your company-specific needs, and then it has to go through a series of validation tests to reaffirm that this is indeed the kind of data we will come across when everything goes to production. More often than not, the data we collect from outside is not enough, so we may use multiple data augmentation techniques to synthesize more of it. Here are a few popular methodologies that are followed for data augmentation: you consider a sentence, and the samples shown give an intuitive insight into how extra sentences are created to effectively increase the quantity of our data. Picking just one of them, say back translation: back translation is when you translate the sentence to another language and then translate it back to the original language. This does cause certain changes in the sentence; if you have played Chinese whispers, you know how it goes. That is one method among these; there are also negative augmentations to prevent overfitting, but that is getting too technical.
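As a concrete sketch of back translation, here is one way it could be wired up; this assumes the open-source Hugging Face transformers library (plus sentencepiece) and the public Helsinki-NLP MarianMT checkpoints, and it is not the specific pipeline used in any production system.

```python
# A minimal back-translation sketch: English -> pivot language -> English
# produces paraphrases that can be added to the training set.
from transformers import MarianMTModel, MarianTokenizer

def back_translate(sentences, pivot="fr"):
    en_to_pivot = f"Helsinki-NLP/opus-mt-en-{pivot}"
    pivot_to_en = f"Helsinki-NLP/opus-mt-{pivot}-en"

    def translate(texts, model_name):
        tokenizer = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    pivoted = translate(sentences, en_to_pivot)
    return translate(pivoted, pivot_to_en)

if __name__ == "__main__":
    originals = ["I want to file a claim for my stolen car."]
    # The round trip introduces small wording changes, like Chinese whispers.
    print(back_translate(originals))
```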
Next we come to the need for normalization. As opposed to the training data, when things get to production and real-life users are encountered, there may be certain parts of these users' messages that have to be normalized, or rather put in a form that the model understands. So let's gloss over what they are and how they can be handled, one by one. Colloquialisms: if I say she has 10,000 followers, you and I know we must be talking about followers on social media handles, Instagram, Twitter, even LinkedIn. But there is no way for the dialogue system to know that, since these are pretty new concepts that keep changing according to trends; the system, taking things literally, might even think that people as numerous as the population of Nauru are following her around. To solve this, a common directory of popular colloquialisms must always be maintained and kept updated. A more technologically forward solution is to understand the semantics of the sentence to mark these slang terms for what they really are. Acronyms are on a similar note: you may have acronyms that need to be resolved, and you could very well have a file storing just the mappings of these acronyms to their complete forms.
But what happens when you say something like IMO? In formal documents it may resolve to the International Maritime Organization; in a casual conversation where I'm texting my friend, it can mean "in my opinion"; depending on the context, it may even be the International Math Olympiad. You literally never know. So taking the context of the conversation, the conversation you've had previously, into consideration is just as important when resolving these acronyms. Then, spelling mistakes: who doesn't make mistakes, right? You might have accidentally clicked the wrong key while typing a word, or you may simply not know the spelling.
Sometimes incorrect spellings also become a norm when typing something out in the so-called SMS language. We definitely cannot store each and every one of these mistakes as a mapping, because where do you draw the line? When you're talking about spell check, if you just have 100 three-letter words, the number of variations you might have to store would be way too high, and you have way more words than just that. In this case, certain libraries can help: Haptik has open-sourced its Spello library just for instances like this, and it is just one of the many libraries that can help you get there. But this is a problem that does need to be addressed.
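To make these normalization ideas concrete, here is a hedged, toy sketch; the directories, vocabulary, and the simple edit-distance correction are illustrative only, not how Spello or any production pipeline actually works.

```python
# Toy normalization pass: expand colloquialisms/acronyms from small hand-kept
# directories and fix obvious misspellings against a known vocabulary.
# A real system keeps these directories updated and uses conversational
# context before expanding an ambiguous acronym.
import difflib

COLLOQUIALISMS = {"followers": "social media followers", "dm": "direct message"}
ACRONYMS = {"imo": "in my opinion", "asap": "as soon as possible"}
VOCABULARY = {"policy", "claim", "terminate", "appointment", "certificate"}

def normalize(message: str) -> str:
    tokens = []
    for token in message.lower().split():
        if token in ACRONYMS:
            tokens.append(ACRONYMS[token])
        elif token in COLLOQUIALISMS:
            tokens.append(COLLOQUIALISMS[token])
        elif token not in VOCABULARY:
            # Fall back to the closest known word, if any (very naive spell fix).
            close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
            tokens.append(close[0] if close else token)
        else:
            tokens.append(token)
    return " ".join(tokens)

print(normalize("I want to terminaet my polcy asap"))
# -> "i want to terminate my policy as soon as possible"
```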
Then, coming to multilinguality: we talk about inclusion and diversity, and while creating dialogue systems, we also have to realize that not all users are accustomed to English, and we need to include the ones that are not. There are many different scripts and many languages which can be mixed together in conversation. I might mix my mother tongue, Marathi, or Hindi, with English when I'm speaking, and there are instances of Spanglish and Hinglish too. This phenomenon is known as code-switching. Code-switching is typically present at the inter-sentential, intra-sentential, and even morphological levels, and it presents serious challenges for language technologies like parsing, machine translation, automatic speech recognition, information extraction, et cetera, and for dialogue systems.
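As a rough illustration of why code-switched text is hard, here is a toy word-level language tagger; the tiny Hindi and English lexicons are made up for the example, and real systems use trained language-identification models over subwords, scripts, and context.

```python
# Toy word-level code-switching tagger: label each token as Hindi, English,
# or unknown using tiny illustrative lexicons.
HINDI = {"mujhe", "chahiye", "nahi", "kal", "hai"}
ENGLISH = {"i", "want", "to", "cancel", "my", "plan", "appointment"}

def tag_languages(sentence):
    tags = []
    for token in sentence.lower().split():
        if token in HINDI:
            tags.append((token, "hi"))
        elif token in ENGLISH:
            tags.append((token, "en"))
        else:
            tags.append((token, "unk"))
    return tags

# A Hinglish (code-switched) request meaning "I need the appointment tomorrow":
print(tag_languages("mujhe appointment kal chahiye"))
# -> [('mujhe', 'hi'), ('appointment', 'en'), ('kal', 'hi'), ('chahiye', 'hi')]
```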
The dialogue systems at Haptik, and a lot of other advanced dialogue systems, support upwards of 30 languages, from European languages to Indian languages to Bahasa, Thai, Mandarin, what have you. Finally, about the context that we keep talking about. Here we can see an example where I am asking, "Tell me about Tanaya" (that's me). The bot asks what I want to know about Tanaya, and I say, "Where is she giving her talk today?" Who is "she" in this message? I go back to my previous message to check: okay, "she" is Tanaya. So where is she giving her talk? Okay, at the Women in Tech conference. And "works at"? Works at what? Who? So I go two messages back and see: okay, Tanaya works at Haptik. That is how context works. In task-oriented dialogue systems there is also something called probing, which, depending on the context, might ask you to select the particular task you want.
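As an illustrative sketch of that "look back to resolve the reference" behaviour, here is a toy recency heuristic; the names and pronoun list are assumptions, and a production system would use a trained coreference model rather than anything this naive.

```python
# Toy coreference-by-recency: resolve subject pronouns to the most recently
# mentioned person in the dialogue history. This only illustrates why the
# history has to be kept around at all.
KNOWN_PEOPLE = {"tanaya"}
PRONOUNS = {"she", "he", "they"}

def resolve(history, message):
    last_person = None
    for turn in history:
        for word in turn.lower().replace("?", " ").split():
            if word in KNOWN_PEOPLE:
                last_person = word
    resolved = [
        last_person if (w.lower() in PRONOUNS and last_person) else w
        for w in message.split()
    ]
    return " ".join(resolved)

history = ["Tell me about Tanaya", "What do you want to know about Tanaya?"]
print(resolve(history, "Where is she giving her talk today?"))
# -> "Where is tanaya giving her talk today?"
```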
Next, ambiguity. If I say something like "I want to terminate my current plan; I was wondering about the termination cost", there are so many questions that arise, because there is a lot of ambiguity. Which plan is the user on? Is the user looking to switch plans or to completely stop the service? Is the user looking to sign up, or do they have an existing plan? And going a level further, is the user looking for a discount because they want to switch, and should I be offering one? That is where ambiguity comes in, which is a very important thing to handle and a very difficult thing to handle, because even as humans, when we're having a conversation, these conversations are so ambiguous that we can't always resolve them ourselves.
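A minimal sketch of how a probing or clarification step might respond to that "terminate my plan" message follows; the candidate intents, their scores, and the margin threshold are all made up for illustration.

```python
# When more than one intent is plausible, ask a clarifying question instead
# of guessing. The intent names and scores below are hypothetical.
def respond(intent_scores, margin=0.15):
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_intent, top_score), (second_intent, second_score) = ranked[0], ranked[1]
    if top_score - second_score < margin:
        # Ambiguous: probe the user rather than acting on a weak guess.
        return (f"Just to confirm, would you like to '{top_intent}' "
                f"or '{second_intent}'?")
    return f"Okay, proceeding with '{top_intent}'."

# "I want to terminate my current plan" could plausibly mean either of these:
scores = {"cancel_service": 0.46, "switch_plan": 0.41, "billing_query": 0.13}
print(respond(scores))
# -> "Just to confirm, would you like to 'cancel_service' or 'switch_plan'?"
```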
There is also a certain amount of overlap with the need for normalization when it comes to handling linguistic phenomena in human speech. We can see some characteristics of human language, of which coreference, ellipsis, and negation we already saw in the context discussion. But then there are things like sub-dialogues: have you ever talked to a friend about one topic, say, telling them what your hairdresser told you the other day, and your friend suddenly asks, "Oh, by the way, which salon do you go to? I need a change." You respond, and then you continue with your story. These tiny snippets of context are known as sub-dialogues; they are so frequent in human conversations, but very difficult for machines to handle. So these are, broadly, all the challenges pertaining to data that a dialogue system has to defeat. The next time you're talking to a chatbot and hating on it for not working (I've done my fair share of that), you can probably point out why exactly the intelligent system failed. That was my talk, and that was my time as well. If you have any more questions or need any clarification, feel free to reach out. Thank you for listening.