Why Your Company Should Care About Named Entity Recognition
Named entity recognition is the task of finding the spans of text that name entities, such as people, locations, and dates, and labeling each span with its category. For example, the sentence "On April 30, 1789, George Washington was inaugurated as the first president of the United States" may be tagged with the following entities:
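To make the idea concrete, here is a minimal Python sketch of one common way such tags are represented: as character-offset spans, each with a label. The labels DATE, PERSON, and LOCATION are assumptions chosen to match the categories named above (the labels in the figure may differ), and the spans are written by hand rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labeled span of text, identified by character offsets [start, end)."""
    start: int
    end: int
    label: str


def annotate(text: str, mentions: list[tuple[str, str]]) -> list[Entity]:
    """Turn hand-written (surface string, label) pairs into character-offset spans.

    Illustration only: each surface string is located with str.find, so it
    must appear in the text exactly as written (first occurrence is used).
    """
    entities = []
    for surface, label in mentions:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        entities.append(Entity(start, start + len(surface), label))
    return entities


sentence = ("On April 30, 1789, George Washington was inaugurated "
            "as the first president of the United States")

# Hypothetical label set: DATE, PERSON, LOCATION.
entities = annotate(sentence, [
    ("April 30, 1789", "DATE"),
    ("George Washington", "PERSON"),
    ("United States", "LOCATION"),
])

for ent in entities:
    # Print each tagged span next to its label.
    print(f"{sentence[ent.start:ent.end]!r:22} {ent.label}")
```

Storing offsets instead of copies of the words keeps each tag tied to its exact position in the original text, which matters when the same word occurs more than once.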
Image from Zach Monge
You might be thinking: okay, but exactly how is this useful? There are many potential uses of named entity recognition, and one of them is making a database easily searchable. You might also be thinking, why would I need to tag entities to make a database easily searchable? Can't I just use a simple dictionary lookup to match terms exactly? Yes, you can, but this is far from ideal. To show you how ineffective search can be without named entity recognition, let's walk through a real-life example.
Example
Recently I was ordering food online from my local grocery store, Weis Markets, and was trying to add Perdue frozen chicken fingers to my cart. So I typed the following into the search bar:
Image from Weis Markets
To my disappointment, my search did not return any results:
Image from Weis Markets
At first I thought the item might be out of stock, but after searching for several other items, I kept getting no results. After a while, I started to suspect that Weis's search engine could only find items whose product label almost exactly matched the search terms (note: I do not actually know the machinery behind Weis's search engine). So I searched Google for the exact name of the chicken fingers I wanted and realized they are called chicken tenders, not fingers (of course!). I typed perdue chicken tenders into the search box, and it worked! I was then able to add the chicken fingers to my cart.
Images from Weis Markets
I was happy to get the chicken fingers into my cart, but that was a lot of work to find just one item, and I had the same issue with several other items. This made Weis's online shopping almost unusable! I have not bought groceries online from this store since; it's just too much work.
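The behavior I suspected can be reproduced with a toy exact-match search. This is a hypothetical Python sketch, not Weis's actual search engine (whose internals I don't know): it returns a product only when every word of the query appears in the product label. The catalog entries are made up for illustration.

```python
def exact_match_search(query: str, catalog: list[str]) -> list[str]:
    """Return product labels that contain every query word (case-insensitive).

    Deliberately naive: no synonyms, no entity tagging, no fuzzy matching.
    """
    words = query.lower().split()
    results = []
    for label in catalog:
        label_words = set(label.lower().split())
        if all(w in label_words for w in words):
            results.append(label)
    return results


# Hypothetical product labels.
catalog = [
    "Perdue Frozen Breaded Chicken Tenders",
    "Perdue Fresh Boneless Chicken Breasts",
    "Store Brand Frozen Fish Sticks",
]

print(exact_match_search("perdue chicken fingers", catalog))
# → []
print(exact_match_search("perdue chicken tenders", catalog))
# → ['Perdue Frozen Breaded Chicken Tenders']
```

The product is in the catalog the whole time; the first query fails only because the word fingers never appears in its label.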
解決方案 (The Solution)
Fortunately for Weis Markets, there is a relatively easy fix for their search engine: named entity recognition. With named entity recognition, the search engine would automatically tag each entity in the query. For example, when I typed perdue chicken fingers, it should have tagged Perdue as the brand and chicken fingers as the food chicken tender (I am not an expert in food categories, so I do not actually know whether chicken tender would be a useful category).
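The machine learning approaches that actually perform this tagging are left for a later post, so as a stand-in here is a minimal dictionary-based (gazetteer) tagger in Python. It is not a machine learning NER model: the phrase list, the entity types brand, food, and storage, and the canonical values are all hypothetical, and a learned model would be expected to handle brands and phrasings that no list anticipates. What it does show is normalization: chicken fingers and chicken tenders both map to the same canonical food value.

```python
# Hypothetical gazetteer: lowercase surface phrase -> (entity type, canonical value).
GAZETTEER = {
    "perdue": ("brand", "Perdue"),
    "chicken fingers": ("food", "chicken tender"),
    "chicken tenders": ("food", "chicken tender"),
    "chicken strips": ("food", "chicken tender"),
    "frozen": ("storage", "frozen"),
    "fresh": ("storage", "fresh"),
    "canned": ("storage", "canned"),
}
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in GAZETTEER)


def tag_query(query: str) -> dict[str, str]:
    """Tag a search query by greedy longest-match lookup in GAZETTEER.

    Returns a mapping from entity type to canonical value; words that
    match no phrase are ignored.
    """
    tokens = query.lower().split()
    tags: dict[str, str] = {}
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                entity_type, value = GAZETTEER[phrase]
                tags[entity_type] = value
                i += n
                break
        else:
            i += 1  # no known phrase starts here; skip this token
    return tags


print(tag_query("perdue chicken fingers"))
# → {'brand': 'Perdue', 'food': 'chicken tender'}
```

Trying the longest phrase first means chicken fingers is read as one food entity rather than as two unrelated words.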
Image from Zach Monge
These tags would then be used to search a database in which each item has already been tagged. For example, the chicken fingers I wanted might have been tagged with the following categories: brand = Perdue; food = chicken tender; frozen, fresh, or canned = frozen.
Image from Zach Monge
Using these entities and a structured database, my search for perdue chicken fingers would have matched Perdue as the brand and chicken tender as the food, and the results would likely have included the chicken fingers I wanted.
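The lookup side can be sketched the same way. This minimal Python example assumes a hypothetical catalog whose items were tagged in advance with brand, food, and storage fields (storage standing in for the frozen, fresh, or canned category above), and returns a product when it agrees with every tag extracted from the query. The query tags are written out by hand here. Note that the matching product's name never contains the word fingers; it is found purely through its tags.

```python
# Hypothetical catalog: each product was tagged ahead of time with structured fields.
CATALOG = [
    {"name": "Perdue Breaded Chicken Breast Tenders",
     "brand": "Perdue", "food": "chicken tender", "storage": "frozen"},
    {"name": "Perdue Boneless Chicken Breasts",
     "brand": "Perdue", "food": "chicken breast", "storage": "fresh"},
    {"name": "Store Brand Chicken Strips",
     "brand": "Store Brand", "food": "chicken tender", "storage": "frozen"},
]


def structured_search(query_tags: dict[str, str],
                      catalog: list[dict[str, str]]) -> list[str]:
    """Return names of products whose tags agree with every tag in the query."""
    return [item["name"] for item in catalog
            if all(item.get(field) == value for field, value in query_tags.items())]


# Tags a tagger might assign to the query "perdue chicken fingers".
query_tags = {"brand": "Perdue", "food": "chicken tender"}
print(structured_search(query_tags, CATALOG))
# → ['Perdue Breaded Chicken Breast Tenders']
```

Requiring every query tag to match is the strictest possible policy. One could instead rank items by how many tags agree, so that a near miss (say, a different brand of chicken tenders) still shows up lower in the results.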
Conclusions
So as you can see, named entity recognition can be extremely useful and is almost essential for some products. You can imagine many other uses besides building a search engine for a grocery store (e.g., recommending similar online articles based on their tagged entities, or creating an easily searchable database of interview transcripts). One thing I have not covered in this post is the machine learning approaches that can be used to actually perform named entity recognition (in the example, the task of tagging the entities in the search perdue chicken fingers). This is the first installment of a series of blog posts about named entity recognition, and the next post will go further into the technical details. Lastly, if you think your company may benefit from named entity recognition, feel free to reach out to me; my contact information can be found on my website.
Source: https://towardsdatascience.com/why-your-company-should-care-about-named-entity-recognition-e00de2f45700