Hands-on NLP with Hugging Face

Transform Your Knowledge in NLP with Hugging Face

Hello everyone, today I'm excited to share with you a hands-on workshop on "Natural Language Processing (NLP) with Hugging Face". I currently work as a Machine Learning Research Engineer focusing on NLP, AI robustness, and explainability, and I'm keen to give learners and NLP enthusiasts the right resources.

Demystifying Natural Language Processing (NLP)

These days, being able to assess the quality and trustworthiness of machine learning models is imperative. I also see a need for more NLP resources in underrepresented languages beyond English. With that in mind, I took the initiative to establish a community of Spanish-speaking NLP professionals called "NLP en Español", which translates to "NLP in Spanish".

Today, I'll be walking you through the process of training a Spanish language model called S-Beta using Hugging Face libraries. It's crucial to remember that irrespective of the complexity of the model you're working with, the quality of the data plays a major role in the model's effectiveness.

Getting Started: Hugging Face Libraries

  • Datasets: Our chosen dataset for this model is the Spanish Billion Words Corpus, an unannotated Spanish corpus of nearly 1.5 billion words. The best part? You can access it, along with a plethora of other datasets, with just two lines of code using the Hugging Face datasets library (see the first sketch after this list).
  • Tokenizer: The next step is tokenizing the text, which involves breaking it into words or sub-words and converting them into IDs. Hugging Face's tokenizers library ships with multiple tokenizer types, including Byte-Pair Encoding (BPE), Byte-Level BPE, WordPiece, and SentencePiece. For our model, we'll focus on Byte-Level BPE (second sketch below).
  • Transformers: The transformer architecture, with its amenability to parallelization and its strong general performance, is a popular choice among NLP and CV researchers. Its key innovations include positional encoding and multi-headed attention (see the model-initialization sketch below).
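
To make the first bullet concrete, here is a minimal sketch of loading a corpus with the datasets library. The dataset ID "spanish_billion_words" is my assumption of how the corpus is exposed; substitute whichever identifier your corpus actually uses.

```python
from datasets import load_dataset

# Load a corpus from the Hugging Face Hub in two lines.
# "spanish_billion_words" is an assumed dataset ID; swap in the
# identifier of the corpus you actually want to train on.
dataset = load_dataset("spanish_billion_words", split="train")

print(dataset[0])  # each example is a dict, e.g. {"text": "..."}
```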
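
Next, a sketch of training a Byte-Level BPE tokenizer with the tokenizers library. The file path, vocabulary size, and special tokens are illustrative choices, not values from the talk.

```python
from tokenizers import ByteLevelBPETokenizer

# Train a Byte-Level BPE tokenizer from scratch on raw text files.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus_es.txt"],  # placeholder path to the raw corpus
    vocab_size=52_000,        # illustrative vocabulary size
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Writes vocab.json and merges.txt to the given directory.
tokenizer.save_model("spanish_tokenizer")
```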
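
Finally, a sketch of initializing a transformer from a configuration. The talk doesn't pin down the exact architecture, so this assumes a small RoBERTa-style masked language model with illustrative hyperparameters.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Define a small RoBERTa-style architecture from scratch.
config = RobertaConfig(
    vocab_size=52_000,            # must match the tokenizer's vocabulary
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
)

# Randomly initialized model, ready for pre-training.
model = RobertaForMaskedLM(config)
print(f"{model.num_parameters():,} parameters")
```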

Training Process

The training process involves initializing a model from a configuration, training it via the Trainer class, and saving it post-training. Once trained, the model can be uploaded to Hugging Face's Model Hub, where others can pick it up for transfer learning or fine-tuning. A sketch of this loop follows.
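
Here is a minimal sketch of that loop, assuming the RoBERTa-style setup above and a dataset that has already been tokenized (names like `tokenized_dataset` and the output paths are placeholders):

```python
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Reload the trained tokenizer as a "fast" tokenizer for transformers.
tokenizer = RobertaTokenizerFast.from_pretrained("spanish_tokenizer")

# Dynamically mask 15% of tokens for the masked-language-modeling objective.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="spanish-mlm",        # placeholder output directory
    num_train_epochs=1,              # illustrative; real runs need more
    per_device_train_batch_size=16,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,                     # model from the configuration sketch above
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset, # placeholder: pre-tokenized dataset
)

trainer.train()
trainer.save_model("spanish-mlm")

# Optionally share the trained model on the Hub for others to fine-tune.
# trainer.push_to_hub()
```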

Online Resources and Books in NLP

Online platforms are a treasure trove of resources to enhance your knowledge of NLP. Websites like GitHub house substantial NLP-related content, including models, datasets, and other significant resources. Participating in active discussion groups and following NLP-related content on various platforms can do wonders for your NLP journey.


In conclusion, learning NLP and understanding its various nuances is a journey of its own. Aspiring NLP professionals can benefit from enrolling in relevant online courses, joining communities, working on real-life projects, and regularly interacting with experts in the field. Remember, there's nothing like learning by doing. So dive into the world of NLP and watch your knowledge and skills flourish!

