
Introduction to the OHSUMED Dataset

Published: 2025/3/15
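1. Introduction to the OHSUMED dataset

This experiment uses the OHSUMED test collection, which was also used in the filtering track of the 9th Text REtrieval Conference (TREC-9). The collection was built by William Hersh and his colleagues. Its documents come from MEDLINE, a database of medical literature, and consist of the titles and/or abstracts from 270 medical journals over the five years from 1987 to 1991, for a total of 348,566 documents. An OHSUMED document consists of 8 fields, with the following meanings: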

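    .I    OHSUMED sequence number of the article, from 1 to 348566
    .U    MEDLINE identifier
    .S    Source of the article
    .M    MeSH index terms
    .T    Title of the article
    .P    Publication type
    .W    Abstract of the article
    .A    Authors of the article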
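The field list above maps naturally onto a small record parser. The following is a minimal sketch, not the official reader: it assumes that each field marker starts a line, that a field's value sits either on the marker line itself or on the lines that follow, and that `.I` opens a new record. The function name `parse_records` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional

# Field markers taken from the list above; the layout rules are assumptions.
FIELD_MARKERS = {".I", ".U", ".S", ".M", ".T", ".P", ".W", ".A"}


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by field marker (for example ".T")."""
    record: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split(None, 1)
        marker = parts[0] if parts else ""
        rest = parts[1] if len(parts) > 1 else ""
        if line.startswith(".") and marker in FIELD_MARKERS:
            # Assumption: ".I" marks the start of the next record.
            if marker == ".I" and record:
                yield record
                record = {}
            current = marker
            record[current] = rest.strip()
        elif current is not None and line.strip():
            # Continuation line: append to the most recently opened field.
            record[current] = f"{record[current]} {line.strip()}".strip()
    if record:
        yield record
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, which avoids loading a whole yearly file into memory.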

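The authors of OHSUMED also constructed 106 queries for the document collection. These queries come from the search requests that physicians submitted while treating patients. Each query has two parts: a brief description of the patient's condition and a description of the information needed. An OHSUMED query consists of the following 3 fields:

    .I    OHSUMED sequence number of the query, from 1 to 106
    .B    Patient information
    .W    Information request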

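Based on the document collection and query collection above, OHSUMED annotates a total of 16140 query-document pairs. Each pair is labeled as definitely relevant, partially relevant, or not relevant. The final annotations contain 2557 definitely relevant, 2932 partially relevant, and 12498 not relevant query-document pairs. These counts add up to more than 16140 because a document may be labeled with more than one level; in the experiments in this section, the highest level is taken as its final label.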
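The rule of keeping the highest level for a pair that was labeled more than once can be sketched as follows. This is an illustration under assumptions: the numeric grades and the `(query_id, doc_id, label)` tuple layout are hypothetical choices, and only the ordering of the three levels comes from the text.

```python
from typing import Dict, Iterable, Tuple

# Illustrative grades; only their ordering matters for the rule.
GRADE = {"not relevant": 0, "partially relevant": 1, "definitely relevant": 2}


def highest_labels(
    judgments: Iterable[Tuple[int, int, str]]
) -> Dict[Tuple[int, int], str]:
    """Keep, for each (query, document) pair, the highest label seen."""
    best: Dict[Tuple[int, int], str] = {}
    for query_id, doc_id, label in judgments:
        key = (query_id, doc_id)
        if key not in best or GRADE[label] > GRADE[best[key]]:
            best[key] = label
    return best
```

After this reduction every pair carries exactly one label, so the per-level counts sum to the number of distinct pairs.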

Here are the files, their uncompressed size, and a description of their content:

1) ohsumed.87 (60,303,307): Contains the MEDLINE documents for the year 1987. The format for each of the MEDLINE document files follows the conventions of the SMART system, with each field defined as below (NLM designator in parentheses):
    .I    Sequential identifier
    .U    MEDLINE identifier (UI)
    .M    Human-assigned MeSH terms (MH)
    .T    Title (TI)
    .P    Publication type (PT)
    .W    Abstract (AB)
    .A    Author (AU)
    .S    Source (SO)
(Note: Some references have their abstracts truncated at 250 words, while some have no abstracts at all.)

2) ohsumed.88 (78,585,929): Contains the MEDLINE documents for the year 1988, formatted as above.

3) ohsumed.89 (84,719,077): Contains the MEDLINE documents for the year 1989, formatted as above.

4) ohsumed.90 (86,754,890): Contains the MEDLINE documents for the year 1990, formatted as above.

5) ohsumed.91 (89,761,122): Contains the MEDLINE documents for the year 1991, formatted as above.
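Since the five yearly files share one format, a single loop can scan them all. A simple sanity check is to count records per file and compare the total against the 348,566 documents stated for the collection. The sketch below assumes the files sit uncompressed in a local directory (`data_dir` is a hypothetical parameter), that every record begins with a line starting with `.I`, and that decoding as Latin-1 is acceptable for counting purposes.

```python
from pathlib import Path
from typing import Dict

# File names as listed above.
YEAR_FILES = ["ohsumed.87", "ohsumed.88", "ohsumed.89", "ohsumed.90", "ohsumed.91"]


def count_records(data_dir: str) -> Dict[str, int]:
    """Count lines starting with ".I" in each yearly file that is present."""
    counts: Dict[str, int] = {}
    for name in YEAR_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # Skip files that were not downloaded.
        # Stream line by line; the files are tens of megabytes each.
        with path.open(encoding="latin-1") as handle:
            counts[name] = sum(1 for line in handle if line.startswith(".I"))
    return counts
```

With all five files present, `sum(count_records(data_dir).values())` should equal 348566 if the assumption about `.I` holds.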

6) queries (11,591): Contains the 106 queries in the test set, with patient and topic information, in the format:
    .I    Sequential identifier
    .B    Patient information
    .W    Information request

7) drel.ui (26,919): Contains the query-document pairs rated as definitely relevant, with documents listed by MEDLINE UI.

8) drel.i (21,709): Contains the query-document pairs rated as definitely relevant, with documents listed by sequential number (from the .I field).

9) pdrel.ui (57,831): Contains the query-document pairs rated as definitely or possibly relevant, with documents listed by MEDLINE UI.

10) pdrel.i (46,664): Contains the query-document pairs rated as definitely or possibly relevant, with documents listed by sequential number (from the .I field).

11) judged (368,366): Contains a list of all documents retrieved by any of the five original searchers or SMART, sorted first by query number and then by document number, along with their relevance judgments. The relevance judgments are either d (definitely relevant), p (possibly relevant), or n (not relevant). The relevance1 judgment is the original relevance judgment done on the documents retrieved by the original searchers. The relevance2 judgment is the second relevance judgment done to assess interobserver reliability of the relevance1 judgments. The relevance3 judgment is the relevance judgment done on documents retrieved by SMART but not the original searchers, or another relevance judgment on an originally retrieved document to assess interobserver reliability.
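The second judgments exist to assess interobserver reliability. The readme does not say which statistic was used; one common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an illustration under that assumption. It takes pairs of labels (for example relevance1 and relevance2 for the same document) that have already been extracted from the file, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def cohen_kappa(pairs: Iterable[Tuple[str, str]]) -> float:
    """Cohen's kappa for two raters over the same items (labels d, p, n)."""
    items = list(pairs)
    n = len(items)
    if n == 0:
        raise ValueError("need at least one doubly judged item")
    # Fraction of items on which both raters chose the same label.
    observed = sum(1 for a, b in items if a == b) / n
    first = Counter(a for a, _ in items)
    second = Counter(b for _, b in items)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(first[label] * second[label] for label in first) / (n * n)
    if expected == 1.0:
        return 1.0  # Convention: both raters used one identical label throughout.
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance.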

12) ui (3,137,094): Contains the MEDLINE UIs for all 348,566 documents in the test database, listed one per line.

13) readme: This file.
http://ir.ohsu.edu/ohsumed/ohsumed.html
