Named Entity Recognition (NER) - Methods and Pre-Trained Models Review


Named Entity Recognition

NER is the extraction of named entities from text and their classification into predefined categories such as location, organization and person name. A named entity is any real-world object denoted by a proper name. Recognizing these entities picks out the most informative terms in a document and helps explain its context.

The following pre-trained models can be used for NER:

  • NLTK
    • Algorithm: the text is tokenized > the tokens are passed through a part-of-speech (POS) tagger > a chunker groups the tagged tokens into named entities based on their POS tags.
| NLTK NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| NLTK (nltk.ne_chunk()) | ACE 2004 | newswire, broadcast news, telephone conversations | MaxEnt classifier | Organization, Person, Location, Date, Time, Money, Percent, Facility, GPE |
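
The final step of the pipeline above, chunking POS-tagged tokens into entities, can be sketched in plain Python. This is a deliberately simplified stand-in: it just groups runs of proper-noun tags (NNP, NNPS) into one chunk, whereas NLTK's ne_chunk uses a trained MaxEnt classifier with richer features and also assigns an entity type. The input is assumed to be already tokenized and tagged (in NLTK, the output of nltk.pos_tag(nltk.word_tokenize(text))).

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank POS tag)

# Simplifying assumption: only proper-noun tags start or extend an entity chunk.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def chunk_proper_nouns(tagged: List[TaggedToken]) -> List[str]:
    """Group consecutive proper-noun tokens into candidate named entities."""
    chunks: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in PROPER_NOUN_TAGS:
            current.append(word)  # extend the open chunk
        elif current:
            chunks.append(" ".join(current))  # any other tag closes the chunk
            current = []
    if current:  # flush a chunk that runs to the end of the sentence
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sentence = [("Angela", "NNP"), ("Merkel", "NNP"), ("visited", "VBD"),
                ("Google", "NNP"), ("in", "IN"), ("New", "NNP"), ("York", "NNP")]
    print(chunk_proper_nouns(sentence))  # prints ['Angela Merkel', 'Google', 'New York']
```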


  • Stanford NER
    • Stanford NER is also known as CRFClassifier.
    • Algorithm: a linear-chain Conditional Random Field (CRF), a conditional sequence model that represents the probability of a hidden state sequence (here, the entity labels) given a sequence of observations (the tokens).
    • This makes CRFs especially useful for modeling sequential data, such as text or time series, where dependencies between neighboring positions can take many different forms.
| Stanford NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| 3 class | CoNLL 2003 eng, MUC 6, MUC 7, ACE 2002 | newswire, broadcast news, telephone conversations | CRFClassifier | Location, Person, Organization |
| 4 class | CoNLL 2003 eng (1,393 English news articles) | news articles | CRFClassifier | Location, Person, Organization, Misc |
| 7 class | MUC 6 and MUC 7 (~318 news articles in MUC 6) | newswire | CRFClassifier | Location, Person, Organization, Money, Percent, Date, Time |
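
How a linear-chain CRF turns scores into the probability of a label sequence can be sketched with hand-set numbers. Assuming an emission score per (token, label) and a transition score per (previous label, next label), a sequence y is scored as the sum of its emission and transition scores, and P(y|x) = exp(score(x, y)) / Z(x), where the forward algorithm computes the normalizer Z(x) without enumerating every sequence. The scores below are made up; Stanford's CRFClassifier learns its weights from many hand-designed features.

```python
import itertools
import math
from typing import List

Matrix = List[List[float]]


def sequence_score(emissions: Matrix, transitions: Matrix, labels: List[int]) -> float:
    """Unnormalized score: each token's emission score plus transition scores between neighbours."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Forward algorithm: log Z(x) over all label sequences in O(T * K^2) time."""
    num_labels = len(emissions[0])
    alpha = list(emissions[0])  # alpha[j]: log-sum of scores of all prefixes ending in label j
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_labels)]) + emissions[t][j]
            for j in range(num_labels)
        ]
    return log_sum_exp(alpha)


def sequence_probability(emissions: Matrix, transitions: Matrix, labels: List[int]) -> float:
    """P(labels | x) = exp(score(x, labels)) / Z(x)."""
    return math.exp(sequence_score(emissions, transitions, labels) - log_partition(emissions, transitions))


if __name__ == "__main__":
    LABELS = ["O", "PER"]
    # Made-up scores for the tokens "John lives": rows are tokens, columns are labels.
    emissions = [[0.2, 2.0], [1.5, 0.1]]
    transitions = [[0.5, -0.3], [0.4, 0.1]]  # transitions[previous][next]
    for labels in itertools.product(range(len(LABELS)), repeat=len(emissions)):
        p = sequence_probability(emissions, transitions, list(labels))
        print([LABELS[i] for i in labels], round(p, 3))
```

In this toy setup ['PER', 'O'] receives the highest probability, and the probabilities of all four possible sequences sum to 1.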


  • spaCy
    • spaCy is an open-source library for advanced Natural Language Processing, written in Python and Cython.
    • Algorithm: convolutional layers with residual connections, layer normalization and maxout non-linearity, combined with a Bloom embedding strategy that uses subword features to support huge vocabularies in small tables.
| spaCy NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| en_core_web_sm | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN. Assigns context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
| en_core_web_md | OntoNotes (~1745k articles); vectors: 685k keys, 20k unique vectors (300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
| en_core_web_lg | OntoNotes (~1745k articles); vectors: 685k keys, 685k unique vectors (300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
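
The Bloom embedding idea can be sketched as a hashing trick: instead of one table row per word, each feature string is hashed with several seeds into a small table and the selected rows are summed, so two features only share a vector if all their hashes collide. This sketch assumes hashlib.blake2b with a per-seed salt as the hash, a random table, four seeds and simplified feature definitions; spaCy's own implementation (the HashEmbed layer from Thinc) uses MurmurHash and keeps a separate table for each lexical attribute.

```python
import hashlib
import random
from typing import List

NUM_ROWS = 1000  # deliberately small table: far fewer rows than distinct words
DIM = 8          # toy embedding width
NUM_SEEDS = 4    # each feature selects (and sums) this many rows

_rng = random.Random(0)
# One shared random table here; spaCy keeps a separate table per lexical attribute.
TABLE: List[List[float]] = [[_rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(NUM_ROWS)]


def row_indices(feature: str) -> List[int]:
    """Hash a feature string once per seed to pick rows of the small table."""
    indices = []
    for seed in range(NUM_SEEDS):
        digest = hashlib.blake2b(
            feature.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
        ).digest()
        indices.append(int.from_bytes(digest, "little") % NUM_ROWS)
    return indices


def bloom_embed(feature: str) -> List[float]:
    """Sum the rows selected by the feature's hashes."""
    vector = [0.0] * DIM
    for idx in row_indices(feature):
        for d in range(DIM):
            vector[d] += TABLE[idx][d]
    return vector


def word_features(word: str) -> List[str]:
    """Simplified normalized form, prefix, suffix and shape of a word."""
    shape = "".join(
        "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c for c in word
    )
    return ["norm=" + word.lower(), "prefix=" + word[:1], "suffix=" + word[-3:], "shape=" + shape]


def embed_word(word: str) -> List[float]:
    """Concatenate one Bloom-embedded vector per feature."""
    return [x for feature in word_features(word) for x in bloom_embed(feature)]
```

Because prefix, suffix and shape features are shared across words, a word that was rare or unseen in training still maps to informative rows of the same small table.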


  • GATE (General Architecture for Text Engineering)
    • ANNIE (A Nearly-New Information Extraction System) is a rule-based system that works on different layers of abstraction along the NLP pipeline.
    • Algorithm: ANNIE consists of a set of modules: a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named entity transducer and a coreference tagger.
    • ANNIE can be used as-is to provide basic information extraction functionality, or as a starting point for more specific tasks.
| GATE NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| ANNIE | - | - | Rule-based (finite state machine) | People, Location, Organization |
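
The gazetteer-plus-rules approach behind ANNIE can be sketched in a few lines. The gazetteer entries and the single title rule below are hypothetical examples; ANNIE itself ships much larger gazetteer lists and expresses its rules as JAPE grammars compiled into finite state transducers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer: lower-cased token sequences mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("london",): "Location",
    ("new", "york"): "Location",
    ("acme", "corp"): "Organization",
}
MAX_ENTRY_LEN = max(len(entry) for entry in GAZETTEER)
# Hypothetical rule: a title followed by a capitalised token marks that token as a Person.
TITLES = {"mr", "mrs", "ms", "dr"}

Annotation = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def annotate(tokens: List[str]) -> List[Annotation]:
    lowered = [t.lower() for t in tokens]
    annotations: List[Annotation] = []
    i = 0
    while i < len(tokens):
        match: Optional[Annotation] = None
        # 1) Gazetteer lookup, trying the longest possible entry first.
        for length in range(min(MAX_ENTRY_LEN, len(tokens) - i), 0, -1):
            entity_type = GAZETTEER.get(tuple(lowered[i:i + length]))
            if entity_type is not None:
                match = (i, i + length, entity_type)
                break
        # 2) Title rule, applied only where no gazetteer entry matched.
        if (match is None and lowered[i].rstrip(".") in TITLES
                and i + 1 < len(tokens) and tokens[i + 1][:1].isupper()):
            match = (i + 1, i + 2, "Person")
        if match is not None:
            annotations.append(match)
            i = match[1]  # continue after the annotated span
        else:
            i += 1
    return annotations
```

For example, annotate(["Dr", "Smith", "visited", "New", "York"]) returns [(1, 2, "Person"), (3, 5, "Location")].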


  • Flair
    • Flair is an openly available framework for a range of NLP tasks across different languages.
    • Algorithm: a sentence is fed as a character sequence into a pre-trained bidirectional character language model (LM). From this LM, each word's contextual embedding is retrieved by extracting the first and last character cell states (the forward LM's state after the word's last character and the backward LM's state before its first character).
    • This word embedding is then passed into a vanilla BiLSTM-CRF sequence labeler.
| Flair NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| ner | CoNLL 2003 eng (1,393 English news articles) | news articles | Contextual String Embeddings + BiLSTM-CRF | Location, Person, Organization, Misc |
| ner-ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Contextual String Embeddings + BiLSTM-CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
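
Which language-model states become a word's embedding can be sketched with toy stand-ins. The forward LM reads the sentence left to right and the state right after the word's last character is taken; the backward LM reads right to left and the state right after it has consumed the word's first character is taken. The two "language models" here are hypothetical placeholders that only count characters and sum their code points, so the sketch demonstrates the index bookkeeping, not a trained model (in Flair itself these embeddings are exposed through classes such as FlairEmbeddings).

```python
import re
from typing import List, Tuple

State = Tuple[int, int]  # toy "hidden state": (characters read so far, sum of their code points)


def forward_states(text: str) -> List[State]:
    """states[i]: toy forward-LM state after reading text[0], ..., text[i] left to right."""
    states: List[State] = []
    count, total = 0, 0
    for ch in text:
        count, total = count + 1, total + ord(ch)
        states.append((count, total))
    return states


def backward_states(text: str) -> List[State]:
    """states[i]: toy backward-LM state after reading text[-1], ..., text[i] right to left."""
    states: List[State] = [(0, 0)] * len(text)
    count, total = 0, 0
    for i in range(len(text) - 1, -1, -1):
        count, total = count + 1, total + ord(text[i])
        states[i] = (count, total)
    return states


def word_embeddings(text: str) -> List[Tuple[str, State, State]]:
    """Per word: forward state after its last character, backward state at its first character."""
    fwd, bwd = forward_states(text), backward_states(text)
    result = []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()  # the word occupies text[start:end]
        result.append((match.group(), fwd[end - 1], bwd[start]))
    return result
```

In a real model each state is a vector, and the concatenation of the two selected states is the word's contextual string embedding.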


  • DeepPavlov
    • Two main types of models are available: standard RNN-based and BERT-based.
| DeepPavlov NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| ner_ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
| ner_ontonotes_bert | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BERT+Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
| ner_conll2003 | CoNLL 2003 eng (1,393 English news articles) | news articles | Bi-LSTM+CRF | Location, Person, Organization, Misc |
| ner_conll2003_bert | CoNLL 2003 eng (1,393 English news articles) | news articles | BERT+Bi-LSTM+CRF | Location, Person, Organization, Misc |
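
Sequence labelers like the RNN-based and BERT-based models above typically emit one tag per token in a BIO scheme (B- begins an entity, I- continues it, O is outside); the exact tag format of a given DeepPavlov model is an assumption here. Turning such tags into entity spans can be sketched as:

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags such as ["B-PER", "I-PER", "O"] into entity spans."""
    spans: List[Span] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes an entity at the end
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))  # close the open entity
            current_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            # A stray I- tag with no matching open entity starts a new one (lenient decoding).
            current_type, start = entity_type, i
    return spans


if __name__ == "__main__":
    tokens = ["Angela", "Merkel", "visited", "Paris"]
    tags = ["B-PER", "I-PER", "O", "B-GPE"]
    for entity_type, s, e in bio_to_spans(tags):
        print(entity_type, " ".join(tokens[s:e]))
```

The usage example prints "PER Angela Merkel" and "GPE Paris". Treating a stray I- tag as the start of a new entity is one common lenient choice; a stricter decoder could discard it instead.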


  • AllenNLP
    • Fine-grained NER: BiLSTM-CRF + ELMo
| AllenNLP NER Model | Data | Data Source | Model Description | Entities |
|---|---|---|---|---|
| fine-grained NER | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BiLSTM-CRF+ELMo | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
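
An ELMo representation is a task-specific weighted combination of the bidirectional LM's layers: for token k, ELMo_k = γ · Σ_j s_j · h_(k,j), where the weights s_j are softmax-normalized and both s_j and the scalar γ are learned together with the tagger. A plain-Python sketch of that combination (the operation AllenNLP implements as ScalarMix), using made-up layer vectors:

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers: List[List[float]], raw_weights: List[float], gamma: float) -> List[float]:
    """ELMo-style mix for one token: gamma * sum_j softmax(raw_weights)[j] * layers[j]."""
    if len(layers) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")
    s = softmax(raw_weights)
    dim = len(layers[0])
    return [gamma * sum(s[j] * layers[j][d] for j in range(len(layers))) for d in range(dim)]


if __name__ == "__main__":
    # Made-up 2-dimensional vectors for one token: token layer plus two biLM layers.
    token_layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(scalar_mix(token_layers, [0.0, 0.0, 0.0], gamma=2.0))
```

With all raw weights equal, the mix is simply the average of the layers scaled by γ; training shifts the weights toward whichever layers help the NER task most.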


  • Polyglot NER
    • It is trained on huge unlabelled datasets (such as Wikipedia) with automatically inferred entity labels (derived from features such as hyperlinks).
    • The internal links embedded in Wikipedia articles are used to detect named entity mentions. When a link points to an article identified by Freebase as an entity article, the anchor text is taken as a positive training example.
| Polyglot NER Model | Data (vocabulary) | Data Source | Model Description | Entities |
|---|---|---|---|---|
| Polyglot NER | most frequent 100K words and the word | Wikipedia articles and Freebase | Classifier (feedforward neural network) | Person, Locations, Organizations |
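
The harvesting of positive examples from Wikipedia links described above can be sketched as follows. The link list and the article-to-type mapping are hypothetical stand-ins for Wikipedia's internal links and Freebase's entity types; the sketch only shows how positive examples are collected, not the rest of Polyglot's training pipeline.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge-base lookup: Wikipedia article title -> entity type (None if not an entity).
ARTICLE_TYPES: Dict[str, Optional[str]] = {
    "Barack_Obama": "PER",
    "Hawaii": "LOC",
    "Columbia_University": "ORG",
    "Law": None,  # a concept article, not an entity article
}

Link = Tuple[str, str]  # (anchor text, target article title)


def positive_examples(links: List[Link]) -> List[Tuple[str, str]]:
    """Keep anchors whose target is an entity article and label them with its type."""
    examples = []
    for anchor, target in links:
        entity_type = ARTICLE_TYPES.get(target)  # unknown targets also yield None
        if entity_type is not None:
            examples.append((anchor, entity_type))
    return examples


if __name__ == "__main__":
    links = [("Obama", "Barack_Obama"), ("law", "Law"), ("Columbia", "Columbia_University")]
    print(positive_examples(links))  # prints [('Obama', 'PER'), ('Columbia', 'ORG')]
```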


Overview: NER Model Performance

| NER Model | Data | Data Source | Model Description | Entities | F1 Score (Dataset) |
|---|---|---|---|---|---|
| NLTK | ACE 2004 | newswire, broadcast news, telephone conversations | MaxEnt classifier | Organization, Person, Location, Date, Time, Money, Percent, Facility, GPE | 89 ± 11% (CoNLL-2003) |
| Stanford NER (4 class) | CoNLL 2003 eng (1,393 English news articles) | news articles | CRFClassifier | Location, Person, Organization, Misc | 87.94% (CoNLL-2003) |
| Polyglot NER | most frequent 100K words and the word | Wikipedia articles and Freebase | Classifier (feedforward neural network) | Person, Locations, Organizations | 71.3% (CoNLL-2003) |
| spaCy en_core_web_lg | OntoNotes (~1745k articles); vectors: 685k keys, 685k unique vectors (300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 85.85% (OntoNotes 5) |
| Flair ner-fast | CoNLL 2003 eng (1,393 English news articles) | news articles | Contextual String Embeddings + BiLSTM-CRF | Location, Person, Organization, Misc | 93.09 ± 0.12% (CoNLL-2003) |
| Flair ner-ontonotes-fast | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Contextual String Embeddings + BiLSTM-CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 89.7% (OntoNotes 5) |
| DeepPavlov ner_ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 86.4% (OntoNotes 5) |
| DeepPavlov ner_ontonotes_bert | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BERT+Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 88.6% (OntoNotes 5) |
| DeepPavlov ner_conll2003 | CoNLL 2003 eng (1,393 English news articles) | news articles | Bi-LSTM+CRF | Location, Person, Organization, Misc | 89.9% (CoNLL-2003) |
| DeepPavlov ner_conll2003_bert | CoNLL 2003 eng (1,393 English news articles) | news articles | BERT+Bi-LSTM+CRF | Location, Person, Organization, Misc | 91.7% (CoNLL-2003) |
| AllenNLP fine-grained NER | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BiLSTM-CRF+ELMo | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 88.7% (OntoNotes 5) |
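
The scores above are F1 values, the harmonic mean of precision and recall. In CoNLL-style evaluation an entity counts as correct only when its type and both boundaries match the gold annotation exactly. A minimal sketch of that entity-level computation, assuming entities are given as (type, start, end) tuples; each library's published numbers come from its own evaluation setup.

```python
from typing import Set, Tuple

Entity = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def entity_prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1: an entity counts only if type and boundaries agree."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("PER", 0, 2), ("GPE", 3, 4), ("ORG", 6, 7)}
    # One entity gets the wrong type (LOC instead of ORG) and one prediction is spurious.
    predicted = {("PER", 0, 2), ("GPE", 3, 4), ("LOC", 6, 7), ("ORG", 8, 9)}
    print(entity_prf(gold, predicted))
```

Here 2 of 4 predictions are correct (precision 0.5) and 2 of 3 gold entities are found (recall 2/3), giving F1 = 4/7, about 0.571.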

