Named Entity Recognition (NER) - Methods and Pre-Trained Models Review
Named Entity Recognition
NER is the extraction of named entities and their classification into predefined categories such as location, organization, person name, etc. A named entity is any real-world object denoted with a proper name. NER helps to recognize the entities in a document that are more informative and explain its context.
Following are the pre-trained models used for NER:
- NLTK
- Algorithm: the text is tokenized > the tokens are passed through a Part-Of-Speech (POS) tagger > a parser chunks the tokens based on their POS tags to find named entities (see the sketch after the table below).
NLTK NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
NLTK (nltk.ne_chunk()) | ACE 2004 | newswire, broadcast news, telephone conversations | MaxEnt classifier | Organization, Person, Location, Date, Time, Money, Percent, Facility, GPE |
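A minimal sketch of this tokenize > POS-tag > chunk pipeline (the example sentence is illustrative; the nltk.download calls are one-time resource setup):

```python
import nltk

# One-time resource downloads: tokenizer, POS tagger, NE chunker, word lists
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

sentence = "Mark works at the United Nations in New York."

tokens = nltk.word_tokenize(sentence)   # 1. tokenize
tagged = nltk.pos_tag(tokens)           # 2. POS-tag the tokens
tree = nltk.ne_chunk(tagged)            # 3. chunk into named entities

# ne_chunk returns a Tree; entity chunks are labeled subtrees
for subtree in tree:
    if hasattr(subtree, "label"):
        entity = " ".join(token for token, pos in subtree.leaves())
        print(entity, "->", subtree.label())
```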
- Stanford NER
- Stanford NER is also known as CRFClassifier.
- Algorithm: a CRF is a conditional sequence model which represents the probability of a hidden-state sequence given some observations.
- This is especially useful for modeling sequence data, where temporal dependencies can manifest themselves in various forms (a usage sketch follows the table below).
Stanford NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
3 class | CoNLL 2003 eng, MUC 6, MUC 7, ACE 2002 | newswire, broadcast news, telephone conversations | CRFClassifier | Location, Person, Organization |
4 class | CoNLL 2003 eng (1,393 English news articles) | news articles | CRFClassifier | Location, Person, Organization, Misc |
7 class | MUC 6 and MUC 7 (~318 news articles in MUC 6) | newswire | CRFClassifier | Location, Person, Organization, Money, Percent, Date, Time |
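A minimal usage sketch via NLTK's StanfordNERTagger wrapper; the model and jar paths below are placeholders for a local Stanford NER download (see https://nlp.stanford.edu/software/CRF-NER.shtml), and Java must be installed:

```python
from nltk.tokenize import word_tokenize
from nltk.tag import StanfordNERTagger

# Placeholder paths: point these at your local Stanford NER download.
# english.all.3class.distsim.crf.ser.gz is the 3-class (Location,
# Person, Organization) model shipped with the distribution.
st = StanfordNERTagger(
    "stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz",
    "stanford-ner/stanford-ner.jar",
)

tokens = word_tokenize("Barack Obama was born in Hawaii.")
print(st.tag(tokens))
# [('Barack', 'PERSON'), ('Obama', 'PERSON'), ('was', 'O'),
#  ('born', 'O'), ('in', 'O'), ('Hawaii', 'LOCATION'), ('.', 'O')]
```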
- spaCy
- spaCy is an open-source software library for advanced Natural Language Processing, written in the programming languages Python and Cython.
- Algorithm: convolutional layers with residual connections, layer normalization and maxout non-linearity, combined with a novel Bloom-embedding strategy with subword features to support huge vocabularies in tiny tables (usage sketch after the table below).
spaCy NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
en_core_web_sm | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN. Assigns context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
en_core_web_md | OntoNotes (~1745k articles) (Vectors: 685k keys, 20k unique vectors, 300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
en_core_web_lg | OntoNotes (~1745k articles) (Vectors: 685k keys, 685k unique vectors, 300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
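A minimal usage sketch with the small English model (assumes it has been installed with `python -m spacy download en_core_web_sm`; the example sentence is illustrative):

```python
import spacy

# Load the small English pipeline; swap in en_core_web_md / _lg as needed
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# doc.ents holds the recognized entity spans with their labels
for ent in doc.ents:
    print(ent.text, ent.label_)
# Apple ORG / U.K. GPE / $1 billion MONEY
```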
- GATE (General Architecture for Text Engineering)
- ANNIE (A Nearly-New Information Extraction System) is a rule-based system that works on different layers of abstraction along the NLP pipeline.
- Algorithm: ANNIE consists of a set of modules: a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named-entity transducer and a coreference tagger.
- ANNIE can be used as-is to provide basic information-extraction functionality, or as a starting point for more specific tasks (a toy sketch of the gazetteer idea follows the table).
GATE NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
ANNIE | - | - | Rule-Based (Finite State Machine) | People, Location, Organization |
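ANNIE itself runs as a pipeline inside GATE (a Java application), so the snippet below is only a toy Python illustration of the gazetteer-lookup idea behind it, not GATE's actual API; the gazetteer entries and sentence are made up:

```python
# Toy gazetteer: in ANNIE, lists like this feed a finite-state matcher.
GAZETTEER = {
    "john smith": "Person",
    "acme corp": "Organization",
    "london": "Location",
}

def tag_entities(text):
    """Return gazetteer matches found in the text, longest phrases first."""
    matches = []
    lowered = text.lower()
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        start = lowered.find(phrase)
        if start != -1:
            matches.append((text[start:start + len(phrase)], GAZETTEER[phrase]))
    return matches

print(tag_entities("John Smith joined Acme Corp in London."))
# [('John Smith', 'Person'), ('Acme Corp', 'Organization'), ('London', 'Location')]
```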
- Flair
- Flair is an openly available framework for a range of NLP tasks across different languages.
- Algorithm: a sentence is input as a character sequence into a pre-trained bidirectional character language model. From this LM, a contextual embedding is retrieved for each word by extracting the first and last character cell states.
- This word embedding is then passed into a vanilla BiLSTM-CRF sequence labeler (usage sketch after the table below).
Flair NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
ner | CoNLL 2003 eng (1,393 English news articles) | news articles | Contextual String Embeddings + BiLSTM-CRF | Location, Person, Organization, Misc |
ner-ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Contextual String Embeddings + BiLSTM-CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
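A minimal usage sketch; SequenceTagger.load("ner") downloads the pre-trained CoNLL-03 English model on first use (the example sentence is illustrative):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the 4-class English model; use "ner-ontonotes" for the 18-class one
tagger = SequenceTagger.load("ner")

sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)

# Print the predicted entity spans with their labels
for entity in sentence.get_spans("ner"):
    print(entity)
# Span "George Washington" labeled PER, Span "Washington" labeled LOC
```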
- DeepPavlov
- There are two main types of models available: standard RNN-based and BERT-based (a usage sketch follows the table below).
DeepPavlov NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
ner_ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
ner_ontonotes_bert | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BERT+Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
ner_conll2003 | CoNLL 2003 eng (1,393 English news articles) | news articles | Bi-LSTM+CRF | Location, Person, Organization, Misc |
ner_conll2003_bert | CoNLL 2003 eng (1,393 English news articles) | news articles | BERT+Bi-LSTM+CRF | Location, Person, Organization, Misc |
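A minimal usage sketch for the BERT-based OntoNotes model (download=True fetches the pre-trained weights on first run; the example sentence is illustrative):

```python
from deeppavlov import build_model, configs

# Build the BERT+Bi-LSTM+CRF OntoNotes model; swap in
# configs.ner.ner_conll2003_bert for the CoNLL-2003 one.
ner = build_model(configs.ner.ner_ontonotes_bert, download=True)

# The model takes a batch of texts and returns token and tag batches
tokens, tags = ner(["Bob Ross lived in Florida."])
print(list(zip(tokens[0], tags[0])))
# [('Bob', 'B-PERSON'), ('Ross', 'I-PERSON'), ('lived', 'O'),
#  ('in', 'O'), ('Florida', 'B-GPE'), ('.', 'O')]
```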
- AllenNLP
- fine-grained NER: BiLSTM-CRF + ELMo (usage sketch after the table below)
AllenNLP NER Model | Data | Data Source | Model Description | Entities |
---|---|---|---|---|
fine grained ner | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BiLSTM-CRF+ELMo | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal |
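A minimal usage sketch; it requires the allennlp-models package, and the archive URL below is an assumption (check the AllenNLP model zoo for the current location of the fine-grained NER model):

```python
from allennlp.predictors.predictor import Predictor

# Assumed archive location of the fine-grained NER model; verify against
# the AllenNLP model zoo before use.
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/fine-grained-ner.2021-02-11.tar.gz"
)

result = predictor.predict(sentence="Michael Jordan played for the Chicago Bulls.")

# The predictor returns BIO-style tags aligned with the input words
print(list(zip(result["words"], result["tags"])))
```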
- Polyglot NER
- It uses huge unlabelled datasets (such as Wikipedia) with automatically inferred entity labels (via features such as hyperlinks).
- The internal links embedded in Wikipedia articles are used to detect named-entity mentions. When a link points to an article identified by Freebase as an entity article, the anchor text is taken as a positive training example (usage sketch after the table below).
Polyglot NER Model | Data(vocabulary) | Data Source | Model Description | Entities |
---|---|---|---|---|
Polyglot NER | most frequent 100K words | Wikipedia Articles and Freebase | Classifier (feedforward neural network) | Person, Location, Organization |
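A minimal usage sketch (assumes the English resources have been installed with `polyglot download embeddings2.en ner2.en`; the example sentence is illustrative):

```python
from polyglot.text import Text

text = Text("Barack Obama gave a speech at the United Nations in New York.")

# text.entities yields entity chunks with Polyglot's I-PER/I-ORG/I-LOC tags
for entity in text.entities:
    print(entity.tag, entity)
# I-PER ['Barack', 'Obama'] / I-ORG ['United', 'Nations'] / I-LOC ['New', 'York']
```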
Overview: NER Model Performance
NER Model | Data | Data Source | Model Description | Entities | Performance: F1 score (dataset) |
---|---|---|---|---|---|
NLTK | ACE 2004 | newswire, broadcast news, telephone conversations | MaxEnt classifier | Organization, Person, Location, Date, Time, Money, Percent, Facility, GPE | 0.89 ± 0.11 (CoNLL-2003) |
Stanford NER | CoNLL 2003 eng (1,393 English news articles) | news articles | CRFClassifier | Location, Person, Organization, Misc | 87.94% (CoNLL-2003) |
Polyglot NER | most frequent 100K words | Wikipedia Articles and Freebase | Classifier (feedforward neural network) | Person, Location, Organization | 71.3% (CoNLL-2003) |
spaCy en_core_web_lg | OntoNotes (~1745k articles) (Vectors: 685k keys, 685k unique vectors, 300 dimensions) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities. | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 85.85% (OntoNotes 5) |
Flair ner-fast | CoNLL 2003 eng (1,393 English news articles) | news articles | Contextual String Embeddings + BiLSTM-CRF | Location, Person, Organization, Misc | 93.09 ± 0.12% (CoNLL-2003) |
Flair ner-ontonotes-fast | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Contextual String Embeddings + BiLSTM-CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 89.7% (OntoNotes 5) |
DeepPavlov ner_ontonotes | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 86.4% (OntoNotes 5) |
DeepPavlov ner_ontonotes_bert | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BERT+Bi-LSTM+CRF | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 88.6% (OntoNotes 5) |
DeepPavlov ner_conll2003 | CoNLL 2003 eng (1,393 English news articles) | news articles | Bi-LSTM+CRF | Location, Person, Organization, Misc | 89.9% (CoNLL-2003) |
DeepPavlov ner_conll2003_bert | CoNLL 2003 eng (1,393 English news articles) | news articles | BERT+Bi-LSTM+CRF | Location, Person, Organization, Misc | 91.7% (CoNLL-2003) |
AllenNLP fine-grained NER | OntoNotes (~1745k articles) | telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs | BiLSTM-CRF+ELMo | Person, Norp, Fac, Org, Gpe, Loc, Product, Event, Work_Of_Art, Law, Language, Date, Time, Percent, Money, Quantity, Ordinal, Cardinal | 88.7% (OntoNotes 5) |
References
- NLTK
- Stanford NER
- https://nlp.stanford.edu/software/jenny-ner-2007.pdf
- https://towardsdatascience.com/conditional-random-fields-explained-e5b8256da776
- https://prateekvjoshi.com/2013/02/23/what-are-conditional-random-fields/
- https://nlp.stanford.edu/software/CRF-NER.shtml
- https://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
- spaCy
- GATE
- Flair
- DeepPavlov
- AllenNLP
- Polyglot-NER
- Overview: NER Model Performance
- https://drops.dagstuhl.de/opus/volltexte/2016/6008/pdf/OASIcs-SLATE-2016-3.pdf
- https://nlp.stanford.edu/projects/project-ner.shtml
- https://spacy.io/usage/facts-figures
- https://arxiv.org/pdf/1410.3791.pdf
- http://docs.deeppavlov.ai/en/master/features/models/ner.html
- https://www.arxiv-vanity.com/papers/1904.10503/
- https://medium.com/@b.terryjack/nlp-pretrained-named-entity-recognition-7caa5cd28d7b
- https://towardsdatascience.com/named-entity-recognition-ner-meeting-industrys-requirement-by-applying-state-of-the-art-deep-698d2b3b4ede
Dataset
- NLTK
- ACE 2004: https://catalog.ldc.upenn.edu/LDC2005T09
- Stanford NER
- spaCy
- OntoNotes Release 5.0: https://catalog.ldc.upenn.edu/LDC2013T19
- OntoNotes Release 5.0: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf
- Flair
- CoNLL 2003: https://www.clips.uantwerpen.be/conll2003/ner/
- OntoNotes Release 5.0: https://catalog.ldc.upenn.edu/LDC2013T19
- DeepPavlov
- CoNLL 2003: https://www.clips.uantwerpen.be/conll2003/ner/
- OntoNotes Release 5.0: https://catalog.ldc.upenn.edu/LDC2013T19