Using transformer models for your own NLP task - building an NLP model end to end, by Ana-Maria Istrate

Automatic Summary

Introduction to Transformer Models for Natural Language Processing and Building an NLP Model

Hello everyone, my name is Ana-Maria Istrate. I am a senior research scientist at the Chan Zuckerberg Initiative, and I am thrilled to dive into the exciting world of transformer models for natural language processing (NLP) and demonstrate the process of building an end-to-end NLP model. My interests revolve around NLP and its applications in the scientific domain, specifically text mining of biomedical literature, predicting the impact of research outputs, knowledge graphs, and graph-based models.

Why are Transformer Models Crucial in NLP Tasks?

Transformer models have gained popularity mainly in natural language processing and computer vision. They are deep learning models grounded in the concept of self-attention, which builds the representation of each word from its context in the input sequence.
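As a rough illustration of the idea (a minimal sketch, not the full transformer architecture; all names here are our own, not from the talk), scaled dot-product self-attention can be written in a few lines of NumPy:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Minimal scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ V  # each output vector mixes in context from all tokens

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, *W).shape)  # (4, 8): one context-aware vector per token
```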

Transformer models were first introduced in 2017 and quickly became state-of-the-art for NLP tasks. They stand out from Recurrent Neural Networks (RNNs), the previous state-of-the-art NLP models, because they allow for far more parallelization and thus reduce training times.

Popular models like BERT, GPT, T5, RoBERTa, ELECTRA, ALBERT, and ERNIE, among others, have been developed on top of the transformer architecture, demonstrating high performance on a variety of tasks such as named entity recognition, question answering, summarization, and machine translation. A crucial element behind their success is transfer learning: fine-tuning pre-trained models to suit the task at hand.

Constructing a Transformer-Based NLP Model, End-To-End

In creating our transformer-based NLP model at the Chan Zuckerberg Initiative, our objective was to identify mentions of datasets and experimental methods in biomedical research articles using a technique called Named Entity Recognition (NER). Building this model revolves around three main stages (a brief sketch of the NER task itself follows the list):

  • Creating the training data set
  • Initiating the model development
  • Performing model evaluation
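To make the end goal concrete, here is what NER inference looks like with the Hugging Face transformers pipeline. The model name and sentence are placeholders for illustration, not the model described in this talk:

```python
from transformers import pipeline

# "dslim/bert-base-NER" is a publicly available general-purpose NER model,
# used here purely as a stand-in for a dataset/method tagger.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = "We profiled the samples with scRNA-seq and deposited the data in GEO."
for entity in ner(text):
    # Each entity comes back with its text span, predicted type, and confidence.
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))
```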

1. Creating the Training Data Set

Constructing our training dataset began with the question of what to include. We wrote down clear definitions of our terms, particularly datasets and experimental methods, which went on to shape the dataset itself. This step is crucial, especially in specialized fields or tasks where boundaries are not clearly defined.

When creating the dataset, we had two primary options: using an openly available dataset, or creating one from scratch. Because our task is specialized, we opted for the latter; though time-consuming and potentially expensive, it suited our needs perfectly, with assistance from our bio-curation team.
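The talk does not specify an annotation format, but one common convention is to store each annotated sentence as aligned token and tag sequences in the BIO scheme. The entity types and sentence below are hypothetical examples of ours:

```python
# One hypothetical annotated sentence in the common BIO tagging scheme:
# B- marks the beginning of an entity, I- its continuation, O a non-entity token.
example = {
    "tokens": ["We", "analyzed", "the", "data", "with", "single-cell",
               "RNA", "sequencing", "from", "the", "Tabula", "Muris", "dataset", "."],
    "tags":   ["O", "O", "O", "O", "O", "B-METHOD",
               "I-METHOD", "I-METHOD", "O", "O", "B-DATASET", "I-DATASET",
               "I-DATASET", "O"],
}
assert len(example["tokens"]) == len(example["tags"])  # one tag per token
```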

2. Initiating the Model Development

At this stage, we revisited transformer models and the role of transfer learning, which uses the weights of a pre-trained model as the initialization for a new model and fine-tunes them on the task at hand. For specialized domains like biomedicine, we spotlighted the importance of starting from models pre-trained on that specific domain.
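As a hedged sketch of this step, one might start from a biomedical checkpoint on the Hugging Face Hub, such as dmis-lab/biobert-v1.1 (our illustrative choice, not necessarily the model used at CZI), and attach a token-classification head for fine-tuning:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# A biomedical pre-trained checkpoint; a placeholder choice for illustration.
checkpoint = "dmis-lab/biobert-v1.1"
labels = ["O", "B-DATASET", "I-DATASET", "B-METHOD", "I-METHOD"]  # our toy label set

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# The encoder keeps its pre-trained, domain-adapted weights; only the
# token-classification head on top is freshly initialized and then learned
# during fine-tuning (e.g., with the transformers Trainer API).
```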

3. Performing Model Evaluation - Quantitative versus Qualitative Evaluation

In the evaluation stage for our NLP model we used both quantitative measures, including precision, recall, F1 score, and accuracy, and human evaluation, a crucial 'sanity check'. Collaborating with our biomedical curators provided vital feedback that significantly improved our model's performance.
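For entity-level precision, recall, and F1, a library such as seqeval is a common choice; below is a minimal sketch with toy gold and predicted tag sequences, not our actual results:

```python
from seqeval.metrics import classification_report, f1_score

# Toy gold and predicted BIO tag sequences, aligned token by token.
y_true = [["O", "B-DATASET", "I-DATASET", "O", "B-METHOD"]]
y_pred = [["O", "B-DATASET", "I-DATASET", "O", "O"]]  # the method mention is missed

print(classification_report(y_true, y_pred))  # per-entity-type precision/recall/F1
print("micro F1:", f1_score(y_true, y_pred))
```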

Conclusion

Dissecting transformer models and walking through the construction of a transformer-based NLP model from the ground up let us highlight the critical role of understanding and addressing the specific needs of the task at hand: defining the task in detail, choosing the appropriate dataset, understanding model development, and evaluating the model comprehensively.

To learn more about our work at the Chan Zuckerberg Initiative, feel free to explore our initiatives. Thank you for your time!

