UCI Datasets: Summary and Descriptions
| 1. Abalone: Predict the age of abalone from physical measurements 鮑魚DataSet:根據(jù)物理度量,預測鮑魚的年齡。 2. Abscisic Acid Signaling Network: The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations using an asynchronous update scheme. 目標是測定布爾值的度量集合,以描述植物的信號網(wǎng)路節(jié)點。該數(shù)據(jù)集包括了300個獨立的布爾值形式的虛擬動態(tài)模擬值,使用了異步更新的架構。 3. Acute Inflammations: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. 急性炎癥DataSet:數(shù)據(jù)來源于一位醫(yī)學專家的數(shù)據(jù)集,用以檢測專家系統(tǒng),可以推斷出泌尿系統(tǒng)的兩種疾病的診斷結果。 4. Adult: Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset. 成人DataSet:根據(jù)戶口普查資料,預測收入是否能超過50000美元/年。通常也被稱為“收入普查”數(shù)據(jù)集。 5. Annealing: Steel annealing data 退火DataSet:訓練退火數(shù)據(jù)。 6. Anonymous Microsoft Web Data: Log of anonymous users of www.microsoft.com; predict areas of the web site a user visited based on data on other areas the user visited. 匿名微軟網(wǎng)絡數(shù)據(jù):微軟網(wǎng)站的匿名用戶記錄;通過其他的用戶訪問區(qū)域數(shù)據(jù),預測用戶在web站點的訪問區(qū)域。 7. Arcene: ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5 datasets of the NIPS 2003 feature selection challenge. ArceneDataSet:該數(shù)據(jù)集的任務是根據(jù)大量的觀測數(shù)據(jù),從正常的模式中辨別出癌癥。這是一個根據(jù)不斷輸入的變量的二級分類問題。該數(shù)據(jù)集是從NIPS2003特征選擇挑戰(zhàn)比賽中的5個數(shù)據(jù)集之一。 8. Arrhythmia: Distinguish between the presence and absence of cardiac arrhythmia and classify it in one of the 16 groups. 心率失常DataSet:分辨是否出現(xiàn)心率失常,并將結果分類進16個組之一。 9. 
Artificial Characters: Dataset artificially generated by using first order theory which describes structure of ten capital letters of English alphabet 人為性狀DataSet:通過使用第一次序理論(該理論可以描述出英語字母表的十個開頭字母的結構),自動生成的數(shù)據(jù)集。 10. Audiology (Original): Nominal audiology dataset from Baylor 原始AudiologyDataSet:來自Baylor的標稱型的audiology數(shù)據(jù)集。 11. Audiology (Standardized): Standardized version of the original audiology database 標準AudiologyDataSet:原始Audiology數(shù)據(jù)集的標準化版本。 12. Australian Sign Language signs: This data consists of sample of Auslan (Australian Sign Language) signs. Examples of 95 signs were collected from five signers with a total of 6650 sign samples. 澳大利亞標記語言標記DataSet:這些數(shù)據(jù)包括了澳大利亞標記語言標記的樣本。95個實例,均來自五個標識器,其中有6650個標記樣本。 13. Australian Sign Language signs (High Quality): This data consists of sample of Auslan (Australian Sign Language) signs. 27 examples of each of 95 Auslan signs were captured from a native signer using high-quality position trackers 澳大利亞標記語言標記DataSet高品質版:該數(shù)據(jù)集包含了Auslan標記的樣本。有27個實例,它們來自95個標記,這27個實例是使用高質量位置追蹤器的當?shù)貥俗R器捕捉出來的。 14. Auto MPG: Revised from CMU StatLib library, data concerns city-cycle fuel consumption 自動MPGDataSet:來自CMU StatLib實驗室的精品,是與城市循環(huán)能源消耗相關的數(shù)據(jù)集。 15. Automobile: From 1985 Ward's Automotive Yearbook 汽車DataSet:來自1985的沃德自動化年鑒。 16. AutoUniv: AutoUniv is an advanced data generator for classifications tasks. The aim is to reflect the nuances and heterogeneity of real data. Data can be generated in .csv, ARFF or C4.5 formats. AutoUniv是一個高級數(shù)據(jù)生成器,可以用來處理分類任務。目標是反映現(xiàn)實數(shù)據(jù)的微妙與不同之處。數(shù)據(jù)可以在.csv中生成,采用ARFF或者C4.5的格式。 17. Bach Chorales: Time-series data based on chorales; challenge is to learn generative grammar; data in Lisp 基于Chorales的時間序列數(shù)據(jù)集;可以用來挑戰(zhàn)生成性的語法;數(shù)據(jù)放在Lisp中。 18. Badges: Badges labeled with a "+" or "-" as a function of a person's name 徽章DataSet:標記了“+”或“-”的符號的標記,可以作為一個人姓名的函數(shù)表達式。 19. 
Bag of Words: This data set contains five text collections in the form of bags-of-words. 詞語包DataSet:該數(shù)據(jù)集包含了5個文本集合,每個文本集合以詞語包的形式展現(xiàn)。 20. Balance Scale: Balance scale weight & distance database 天平DataSet:天平的重量和距離數(shù)據(jù)庫。 21. Balloons: Data previously used in cognitive psychology experiment; 4 data sets represent different conditions of an experiment 氣球DataSet:曾經(jīng)用在認知心理學實驗中的數(shù)據(jù);4個數(shù)據(jù)集代表了一個實驗中的不同條件。 22. Blood Transfusion Service Center: Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. 輸血服務中心DataSet:來自臺灣的Hsin-CHu市的輸血服務中心的數(shù)據(jù)——用以解決分類問題。 23. Breast Cancer: Breast Cancer Data (Restricted Access) 乳腺癌DataSet:乳腺癌數(shù)據(jù)(訪問限制)。 24. Breast Cancer Wisconsin (Diagnostic): Diagnostic Wisconsin Breast Cancer Database 乳腺癌威斯康星洲(診斷數(shù)據(jù))DataSet:威斯康星的乳腺癌診斷數(shù)據(jù)。 25. Breast Cancer Wisconsin (Original): Original Wisconsin Breast Cancer Database 乳腺癌威斯康星洲(原始數(shù)據(jù)):原始的威斯康星州乳腺癌數(shù)據(jù)庫。 26. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database 乳腺癌威斯康星洲(Prognostic版):威斯康星州乳腺癌數(shù)據(jù)庫。 27. Breast Tissue: Dataset with electrical impedance measurements of freshly excised tissue samples from the breast. 乳腺組織DataSet:乳腺的新鮮切除組織樣本的電阻度量數(shù)據(jù)集。 28. CalIt2 Building People Counts: This data comes from the main door of the CalIt2 building at UCI. Calt2建筑的人數(shù):該數(shù)據(jù)集來自UCI的Calts建筑的主要大門。 29. Car Evaluation: Derived from simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods. 汽車評估DataSet:來源于簡單層次決策模型,該數(shù)據(jù)集可用于測試建設性的回歸,和發(fā)現(xiàn)結構性方法。 30. Cardiotocography: The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians. 胎兒心率DataSet:該數(shù)據(jù)集包括胎兒心率(FHR),和基于產(chǎn)科專家醫(yī)生分類的cardiotocograms 子宮收縮(UC)特征。 31. 
Census Income: Predict whether income exceeds $50K/yr based on census data. Also known as "Adult" dataset. 收入普查DataSet:基于普查數(shù)據(jù),預測收入是否超過50000美元/年。也被稱為“成人”數(shù)據(jù)集。 32. Census-Income (KDD): This data set contains weighted census data extracted from the 1994 and 1995 current population surveys conducted by the U.S. Census Bureau. 收入普查(KDD)DataSet:這個數(shù)據(jù)集包含了從1994-1995年的U.S普查局的《當前人口調查》中提取出來的普查數(shù)據(jù)。 33. Challenger USA Space Shuttle O-Ring: Task: predict the number of O-rings that experience thermal distress on a flight at 31 degrees F given data on the previous 23 shuttle flights 挑戰(zhàn)者號USA航天飛機O形圈DataSet:任務:基于前23次飛行數(shù)據(jù),預測在一次31度熱壓F的狀況中的飛行任務的O形圈的數(shù)目。 34. Character Trajectories: Multiple, labelled samples of pen tip trajectories recorded whilst writing individual characters. All samples are from the same writer, for the purposes of primitive extraction. Only characters with a single pen-down segment were considered. 字符軌跡DataSet:同時寫出單個字幕的筆尖軌道的多個標記樣本記錄。為了保證初始的提取數(shù)據(jù),所有的樣本都來自于同一個書寫人員。僅僅考慮了單一落筆段的字符。 35. Chess (Domain Theories): 6 different domain theories for generating legal moves of chess 國際象棋(域理論)DataSet:產(chǎn)生國際象棋的規(guī)定路數(shù)的6個不同的域理論。 36. Chess (King-Rook vs. King): Chess Endgame Database for White King and Rook against Black King (KRK). 國際象棋(王RookVS王)DataSet:白國王與黑國王的象棋殘局數(shù)據(jù)庫。 37. Chess (King-Rook vs. King-Knight): Knight Pin Chess End-Game Database Creator 國際象棋(王Rook對戰(zhàn)騎士):騎士 38. Chess (King-Rook vs. King-Pawn): King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). 國王Rook與國王Pawn的a7(通常簡寫為KAEPA7)。 39. Cloud: Little Documentation 小文檔。 40. CMU Face Images: This data consists of 640 black and white face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size CMU人臉圖像DataSet:該數(shù)據(jù)集包含了640張黑白人臉圖像,并且有直、左、右、上四個角度,中性、高興、悲傷、生氣四個表情,有的戴著太陽鏡,有的沒有,并且大小也不一。 41. 
Coil 1999 Competition Data: This data set is from the 1999 Computational Intelligence and Learning (COIL) competition. The data contains measurements of river chemical concentrations and algae densities. Coil1999競賽數(shù)據(jù):該數(shù)據(jù)集來自1999年的計算機智能學習競賽(簡寫為Coil)。該數(shù)據(jù)集包含了河流的化學濃度度量和藻類的密度度量。 42. Communities and Crime: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. 社區(qū)與犯罪DataSet:美國的社區(qū)。該數(shù)據(jù)集包含了來自1990美國普查的社會經(jīng)濟數(shù)據(jù)、來自1990美國LEMAS調查的法律實施數(shù)據(jù),還有來自1995年FBI UCR的犯罪數(shù)據(jù)。 43. Communities and Crime Unnormalized: Communities in the US. Data combines socio-economic data from the '90 Census, law enforcement data from the 1990 Law Enforcement Management and Admin Stats survey, and crime data from the 1995 FBI UCR 社區(qū)和非標準化犯罪DataSet:美國的社區(qū)。數(shù)據(jù)包含了來自90年代普查的社會經(jīng)濟數(shù)據(jù)、來自1990年法律實施管理調查的法律實施數(shù)據(jù),還有來自1995年FBI UCR的犯罪數(shù)據(jù)。 44. Computer Hardware: Relative CPU Performance Data, described in terms of its cycle time, memory size, etc. 計算機硬件:相關CPU運行數(shù)據(jù),采用它的時間周期、內(nèi)存大小來描述。 45. Concrete Compressive Strength: Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. 混凝土抗壓強度DataSet:混凝土是土木工程中最重要的材料。抗壓強度是混凝土年齡與組成非線性特征。 46. Concrete Slump Test: Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but that is also influenced by other concrete ingredients. 混凝土塌方度試驗:混凝土是一種非常復雜的材料。它的塌落度流量不僅取決于含水量,也受其他具體成分的影響。 47. Congressional Voting Records: 1984 United Stated Congressional Voting Records; Classify as Republican or Democrat 國會投票記錄DataSet:1984年美國國會投票記錄;按照共和黨與民主黨分類。 48. Connect-4: Contains connect-4 positions 連接4:包含了連接4的位置。 49. 
Connectionist Bench (Nettalk Corpus): The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes 連接工作臺(Nettalk資料庫):文件“nettalk.data”包含了一個有20008個英語單詞的列表,還有一個每個單詞的phonetic副本。任務是訓練一個網(wǎng)絡,用來產(chǎn)生適當?shù)膒honemes。 50. Connectionist Bench (Sonar, Mines vs. Rocks): The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock. 連接工作臺(聲納、礦產(chǎn)和巖石):目標是訓練一個網(wǎng)絡,用來區(qū)別在金屬圓柱體的反彈聲納信號,和在基本為圓柱體的巖石上的反彈信號。 51. Connectionist Bench (Vowel Recognition - Deterding Data): Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. 連接工作臺(元音識別—Detering數(shù)據(jù)):使用一個來源于一個比率的指定訓練集的11個英式英語的穩(wěn)定元音字母的獨立識別揚聲器。 52. Contraceptive Method Choice: Dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. 避孕方法的選擇:該數(shù)據(jù)集是1997年印度尼西亞全國的避孕患病率調查的的一個子集。 53. Corel Image Features: This dataset contains image features extracted from a Corel image collection. Four sets of features are available based on the color histogram, color histogram layout, color moments, and co-occurrence Corel圖像特征:該數(shù)據(jù)集包含了提取自一個Corel圖像集合的圖片特征。基于顏色直方圖、顏色直方圖布局、顏色的時機和調和,可得到四個特征集合。 54. Covertype: Forest CoverType dataset 覆蓋類型:森林覆蓋類型數(shù)據(jù)集。 55. Credit Approval: This data concerns credit card applications; good mix of attributes 信貸審批:該數(shù)據(jù)集與信用卡的使用相關;是各種屬性的集合。 56. Cylinder Bands: Used in decision tree induction for mitigating process delays known as "cylinder bands" in rotogravure printing 氣缸帶:使用判定樹來歸納,減緩氣缸帶的凸版打印。 57. Demospongiae: Marine sponges of the Demospongiae class classification domain. Demospongiae類別下的海綿分類域。 58. Dermatology: Aim for this dataset is to determine the type of Eryhemato-Squamous Disease. 皮膚科:該數(shù)據(jù)集用于判定Eryhemato鱗狀疾病的類型。 59. 
Dexter: DEXTER is a text classification problem in a bag-of-word representation. This is a two-class classification problem with sparse continuous input variables. This dataset is one of five datasets of the NIPS 2003 feature selection challenge. DETEX是一個用一個文字包來表現(xiàn)的文本分類問題。這是一個通過不斷的輸入?yún)?shù)的兩層的分類問題。該數(shù)據(jù)集是NIPS2003年特征提取邀請賽的五個數(shù)據(jù)集中的一個?!?/span> 60. DGP2 - The Second Data Generation Program: Generates application domains based on specific parameters, number of features, and proportion of positive to negative examples DGP2—第二個數(shù)據(jù)生成程序:基于具體的參數(shù)、特征的數(shù)量、和正面到負面例子的比率,產(chǎn)生應用域。 61. Diabetes: This diabetes dataset is from AIM '94 糖尿病:該糖尿病數(shù)據(jù)集來自AIM94。 62. Document Understanding: Five concepts, expressed as predicates, to be learned 文件理解:要學習的五個概念,作為謂詞來表現(xiàn)。 63. Dodgers Loop Sensor: Loop sensor data was collected for the Glendale on ramp for the 101 North freeway in Los Angeles Dodgers回路傳感器:回路傳感器數(shù)據(jù)集來自Gledale的斜坡(在洛杉磯的101個北高速公路)。 64. Dorothea: DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the NIPS 2003 feature selection challenge. Dorothea是一個藥物發(fā)現(xiàn)數(shù)據(jù)集。以結構分析特征來表現(xiàn)的化合物必須分類為活性的(綁定到凝血酶)或者非活性的。這是五個NIPS2003特征選擇挑戰(zhàn)賽數(shù)據(jù)集中的一個。 65. E. Coli Genes: Data giving characteristics of each ORF (potential gene) in the E. coli genome. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided. 大腸桿菌基因:每個在E.coli基因組里面ORD(潛在基因)的特征數(shù)據(jù)集。提供序列、同源性(與其他基因的相似形)和結構信息。還有功能(如果知道的話)。 66. EBL Domain Theories: Assorted small-scale domain theories EBL域理論:各種小規(guī)模的域理論。 67. Echocardiogram: Data for classifying if patients will survive for at least one year after a heart attack 超聲心動圖:該數(shù)據(jù)集用來分類是否病人在一次心臟病后,至少可以存活一年。 68. Ecoli: This data contains protein localization sites 該數(shù)據(jù)集包含了蛋白質本地化地址。 69. 
Economic Sanctions: Domain Theory on Economic Sanctions; Undocumented 經(jīng)濟制裁:經(jīng)濟制裁方面的域理論,無記錄文檔。 70. EEG Database: This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. It contains measurements from 64 electrodes placed on the scalp sampled at 256 Hz EEG數(shù)據(jù)庫:該數(shù)據(jù)集來源于一個檢查EEG的、與易患酒精中毒的基因體質相關的大型研究、包含了放在頭皮上的、為256HZ的、來自64個電極的度量。 71. El Nino: The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. 厄爾尼諾:該數(shù)據(jù)集包含了從整個赤道太平洋的一系列浮標的海洋與地面氣象讀數(shù)。 72. Entree Chicago Recommendation Data: This data contains a record of user interactions with the Entree Chicago restaurant recommendation system. 芝加哥主菜推薦數(shù)據(jù):該數(shù)據(jù)集包含了一個與芝加哥主菜館的推薦系統(tǒng)的用戶交互的記錄。 73. Flags: From Collins Gem Guide to Flags, 1986 標志:從柯林斯寶石指南的標志,1986 74. Forest Fires: This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/~pcortez/forestfires). 森林火災:這是一個艱難的回歸的任務,其目的是在葡萄牙東北部地區(qū),利用氣象數(shù)據(jù)和其他數(shù)據(jù),預測森林火災的過火面積,(詳見:http://www.dsi.uminho PT / pcortez / forestfires)。 75. Function Finding: Cases collected mostly from investigations in physical science; intention is to evaluate function-finding algorithms 尋找功能:收集的情況下,大多是從在物理科學的調查;意圖是評價函數(shù)發(fā)現(xiàn)算法 76. Gisette: GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusible digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection challenge. Gisette:GISETTE是一個手寫數(shù)字識別問題。問題是獨立的高度confusible數(shù)字'4'和'9'。這個數(shù)據(jù)集是5 NIPS的2003年特征選擇挑戰(zhàn)的數(shù)據(jù)集之一。 77. Glass Identification: From USA Forensic Science Service; 6 types of glass; defined in terms of their oxide content (i.e. 
Na, Fe, K, etc) 玻璃鑒定:從美國法醫(yī)科學服務; 6種玻璃;在他們的氧化物含量定義(即鈉,鐵,鉀等) 78. Haberman's Survival: Dataset contains cases from study conducted on the survival of patients who had undergone surgery for breast cancer 哈伯曼的生存:DataSet包含誰經(jīng)歷了乳腺癌手術患者的生存所進行的研究情況 79. Hayes-Roth: Topic: human subjects study 海斯 - 羅斯:主題:人類受試者的研究 80. Heart Disease: 4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach 心臟病:4個數(shù)據(jù)庫:克利夫蘭,匈牙利,瑞士,和弗吉尼亞州的長灘 81. Hepatitis: From G.Gong: CMU; Mostly Boolean or numeric-valued attribute types; Includes cost data (donated by Peter Turney) 肝炎:從G.龔:債務工具中央結算系統(tǒng);大多是布爾值或數(shù)字值的屬性類型,包括成本數(shù)據(jù)(彼得特尼捐贈) 82. Hill-Valley: Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create either a Hill (a �bump� in the terrain) or a Valley (a �dip� in the terrain). 希爾谷:每個記錄代表一個二維圖形上100點。當策劃,以統(tǒng)籌的Y(從1到100),積分將創(chuàng)建一個山(在凹凸的地形)或谷(浸在地形)。 83. Horse Colic: Well documented attributes; 368 instances with 28 attributes (continuous, discrete, and nominal); 30% missing values 馬絞痛:有據(jù)可查的屬性; 368 28屬性(連續(xù),離散的,標稱值)的實例; 30%的缺失值 84. Housing: Taken from StatLib library 房屋:兩者StatLib庫 85. ICU: Data set prepared for the use of participants for the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine. ICU的數(shù)據(jù)集,為1994年AAAI春季研討會的與會者在醫(yī)學上使用人工智能準備。 86. Image Segmentation: Image data described by high-level numeric-valued attributes, 7 classes 圖像分割:由高層次的數(shù)字值屬性描述的圖像數(shù)據(jù),7類 87. Insurance Company Benchmark (COIL 2000): This data set used in the CoIL 2000 Challenge contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data 保險公司的基準(線圈2000年):使用該數(shù)據(jù)集在線圈2000挑戰(zhàn)包含保險公司對客戶的信息。該數(shù)據(jù)由86變數(shù),包括產(chǎn)品使用的數(shù)據(jù)和社會人口數(shù)據(jù) 88. Internet Advertisements: This dataset represents a set of possible advertisements on Internet pages. 
互聯(lián)網(wǎng)廣告:這個DataSet表示一組可能在互聯(lián)網(wǎng)上的網(wǎng)頁廣告。 89. Internet Usage Data: This data contains general demographic information on internet users in 1997. 互聯(lián)網(wǎng)應用的數(shù)據(jù):該數(shù)據(jù)包含一般的互聯(lián)網(wǎng)用戶在1997年的人口統(tǒng)計信息。 90. Ionosphere: Classification of radar returns from the ionosphere 電離層:從電離層雷達回波分類 91. IPUMS Census Database: This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. IPUMS普查數(shù)據(jù)庫:該數(shù)據(jù)集包含未加權PUMS普查從洛杉磯和長灘地區(qū)1970年,1980年和1990年的數(shù)據(jù)。 92. Iris: Famous database; from Fisher, 1936 光圈:著名的數(shù)據(jù)庫;從1936年費舍爾, 93. ISOLET: Goal: Predict which letter-name was spoken--a simple classification task. ISOLET:目標:預測字母名稱是口語 - 一個簡單的分類任務。 94. Japanese Credit Screening: Includes domain theory (generated by talking to Japanese domain experts); data in Lisp 日本信用篩選:包括域理論(日本領域的專家交談生成);在Lisp中的數(shù)據(jù) 95. Japanese Vowels: This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. 日本元音:該數(shù)據(jù)集的記錄640 12的LPC倒譜系系數(shù)從九男揚聲器的時間序列。 96. KDD Cup 1998 Data: This is the data set used for The Second International Knowledge Discovery and Data Mining Tools Competition, which was held i?n conjunction with KDD-98 KDD杯1998年的數(shù)據(jù):這是數(shù)據(jù)集的第二屆國際知識發(fā)現(xiàn)和數(shù)據(jù)挖掘工具的競爭,這是在同時舉行的KDD - 98 97. KDD Cup 1999 Data: This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 KDD杯1999年的數(shù)據(jù):這是數(shù)據(jù)集使用的第三次國際知識發(fā)現(xiàn)和數(shù)據(jù)挖掘工具的競爭,這是在同時舉行的KDD - 99 98. Kinship: Relational dataset 親屬關系:關系數(shù)據(jù)集 99. Labor Relations: From Collective Bargaining Review 勞動關系:從集體談判檢討 100. LED Display Domain: From Classification and Regression Trees book; We provide here 2 C programs for generating sample databases LED顯示域:從分類和回歸樹書,我們在這里提供2 C程序生成示例數(shù)據(jù)庫 101. 
Lenses: Database for fitting contact lenses 鏡頭:裝修隱形眼鏡數(shù)據(jù)庫 102. Letter Recognition: Database of character image features; try to identify the letter 信承認:人物形象特征的數(shù)據(jù)庫;試圖找出信 103. Libras Movement: The data set contains 15 classes of 24 instances each. Each class references to a hand movement type in LIBRAS (Portuguese name 'L�ngua BRAsileira de Sinais', oficial brazilian signal language). 天秤座的運動:該數(shù)據(jù)集包含了15類24個實例。每個類的引用,在天秤座的人的手部動作類型(葡萄牙名“Lngua BRAsileira Sinais”,公報巴西信號語言)。 104. Liver Disorders: BUPA Medical Research Ltd. database donated by Richard S. Forsyth 肝臟疾病:保柏醫(yī)療研究公司數(shù)據(jù)庫由理查德福塞斯捐贈 105. Localization Data for Person Activity: Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times. 人活動的本地化數(shù)據(jù):數(shù)據(jù)包含五個執(zhí)行不同的活動的人的錄音。每個人穿的4個傳感器(標簽),同時執(zhí)行相同的情況下的五倍。 106. Logic Theorist: All code for Logic Theorist 邏輯理論家:邏輯理論家的所有代碼 107. Low Resolution Spectrometer: From IRAS data -- NASA Ames Research Center 低分辨率光譜儀:從紅外天文衛(wèi)星數(shù)據(jù) - 美國國家航空航天局艾姆斯研究中心 108. Lung Cancer: Lung cancer data; no attribute definitions 肺癌:肺癌數(shù)據(jù);沒有屬性定義 109. Lymphography: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. (Restricted access) 淋巴造影:從大學醫(yī)學中心,腫瘤研究所,南斯拉夫盧布爾雅那的這淋巴域。 (限制訪問) 110. M. Tuberculosis Genes: Data giving characteristics of each ORF (potential gene) in the M. tuberculosis bacterium. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided 結核分枝桿菌基因:給每個ORF在結核分枝桿菌的細菌特性(潛在的基因)的數(shù)據(jù)。序列,同源性(其他基因的相似性)和結構信息,和功能(如果已知) 111. Madelon: MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear. 
Madelon:MADELON是一個人造的數(shù)據(jù)集,這是對2003年的NIPS的特征選擇挑戰(zhàn)的一部分。這是一個連續(xù)的輸入變量的兩個類的分類問題。困難的是,問題是多元的和高度非線性。 112. MAGIC Gamma Telescope: Data are MC generated to simulate registration of high energy gamma particles in an atmospheric Cherenkov telescope 魔伽馬望遠鏡:數(shù)據(jù)生成高能量的伽瑪粒子來模擬大氣切倫科夫望遠鏡登記MC 113. Mammographic Mass: Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age. 乳腺質量:良性和惡性乳腺群眾基于BI - RADS的屬性和病人的年齡歧視。 114. Mechanical Analysis: Fault diagnosis problem of electromechanical devices; also PUMPS DATA SET is newer version with domain theory and results 力學分析:機電設備的故障診斷問題;水泵數(shù)據(jù)集與域的理論和成果是較新的版本 115. Meta-data: Meta-Data was used in order to give advice about which classification method is appropriate for a particular dataset (taken from results of Statlog project). 元數(shù)據(jù):元數(shù)據(jù)使用的分類方法是適合于一個特定的數(shù)據(jù)集(Statlog項目的結果),以提供意見。 116. MiniBooNE particle identification: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). MiniBooNE的粒子鑒別:該數(shù)據(jù)集是從MiniBooNE的實驗是使用電子中微子(信號),以區(qū)別于μ子中微子(背景)。 117. Mobile Robots: Learning concepts from sensor data of a mobile robot; set of data sets 移動機器人:從移動機器人的傳感器數(shù)據(jù)學習觀念;組數(shù)據(jù)集 118. Molecular Biology (Promoter Gene Sequences): E. Coli promoter gene sequences (DNA) with partial domain theory 分子生物學(啟動子序列):大腸桿菌啟動子的基因序列(DNA)的部分域理論 119. Molecular Biology (Protein Secondary Structure): From CMU connectionist bench repository; Classifies secondary structure of certain globular proteins 分子生物學(蛋白質二級結構):從債務工具中央結算系統(tǒng)聯(lián)結板凳資源庫;某些球狀蛋白質的二級結構進行分類 120. Molecular Biology (Splice-junction Gene Sequences): Primate splice-junction gene sequences (DNA) with associated imperfect domain theory 分子生物學(拼接交界的基因序列):靈長類動物的基因序列拼接結與相關的不完善域理論(脫氧核糖核酸) 121. 
MONK's Problems: A set of three artificial domains over the same attribute space; Used to test a wide range of induction algorithms 和尚的問題:三個以上相同的屬性空間的人工域;用于測試一個廣泛的歸納算法 122. Moral Reasoner: Horn-clause model that qualitatively simulates moral reasoning; Theory includes negated literals 道德推理:霍恩子句模型定性模擬道德推理理論包括否定的文字 123. Movie: This data set contains a list of over 10000 films including many older, odd, and cult films. There is information on actors, casts, directors, producers, studios, etc. 電影:該數(shù)據(jù)集包含一個10000多部電影,包括許多年紀大了,奇怪,和邪教的電影列表。有上的演員,演員,董事,制片人,制片公司等信息 124. MSNBC.com Anonymous Web Data: This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see description) and are recorded in time order. MSNBC.com匿名Web數(shù)據(jù):這個數(shù)據(jù)描述了用戶的頁面訪問參觀,1999年9月28日msnbc.com。記錄訪問的URL類別的水平(見說明),在時間順序記錄。 125. Multiple Features: This dataset consists of features of handwritten numer?als (`0'--`9') extracted from a collection of Dutch utility maps 多種功能:這個數(shù)據(jù)集,包括從荷蘭實用地圖的集合中提取的手寫體數(shù)字(`0'結束 - `9“)功能 126. Mushroom: From Audobon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible 蘑菇:從Audobon社會領域指南“;蘑菇描述的物理特性;分類:有毒或食用 127. Musk (Version 1): The goal is to learn to predict whether new molecules will be musks or non-musks 麝香(版本1):我們的目標是要學會預測是否有新的分子,將麝香或非麝香 128. Musk (Version 2): The goal is to learn to predict whether new molecules will be musks or non-musks 麝香(第2版):我們的目標是要學會預測是否有新的分子,將麝香或非麝香 129. NSF Research Award Abstracts 1990-2003: This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, (c) a list of words used for indexing the bag-of-word NSF研究獎論文摘要1990年至2003年:(一)129000摘要描述NSF的獎項,用于基礎研究(二)字袋從抽象的數(shù)據(jù)中提取的文件,(三)為索引使用的單詞列表,該數(shù)據(jù)集組成字袋 130. 
Nursery: Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools. 苗圃:苗圃數(shù)據(jù)庫是從最初開發(fā)托兒所排名應用分層決策模型派生。 131. Online Handwritten Assamese Characters Dataset: This is a dataset of 8235 online handwritten assamese characters. The “online” process involves capturing of data as text is written on a digitizing tablet with an electronic pen. 在線手寫阿薩姆字符數(shù)據(jù)集:這是一個8235聯(lián)機手寫阿薩姆字符的數(shù)據(jù)集。 “在線”的過程包括數(shù)據(jù)采集,數(shù)字化儀上用電子筆的書面文本。 132. Opinosis Opinion ? Review: This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of ipod nano”. Opinosis意見/評論:此數(shù)據(jù)集包含一個給定的主題從用戶評論中提取的句子。示例主題是“表現(xiàn)的豐田佳美”和“音質”的iPod nano。 133. OpinRank Review Dataset: This data set contains user reviews of cars and and hotels collected from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews). OpinRank審查數(shù)據(jù)集:該數(shù)據(jù)集包含車和酒店收集到到網(wǎng)(259000評語)和埃德蒙茲(?42230條評論)的用戶評論。 134. Optical Recognition of Handwritten Digits: Two versions of this database available; see folder 光學識別手寫體數(shù)字:這個數(shù)據(jù)庫提供的兩個版本,請參閱文件夾 135. Othello Domain Theory: Used in research to generate features for an inductive learning system 奧賽羅域理論:在研究中使用生成歸納學習系統(tǒng)的功能 136. Ozone Level Detection: Two ground ozone level data sets are included in this collection. One is the eight hour peak set (eighthr.data), the other is the one hour peak set (onehr.data). Those data were collected from 1998 to 2004 at the Houston, Galveston and Brazoria area. 臭氧濃度檢測:兩個地面臭氧濃度的數(shù)據(jù)集都包含在此集合。之一,是8個小時的高峰集(eighthr.data),另一種是一個小時的高峰集(onehr.data)。這些數(shù)據(jù)收集從1998年至2004年在休斯敦,加爾維斯頓和Brazoria區(qū)域。 137. p53 Mutants: The goal is to model mutant p53 transcriptional activity (active vs inactive) based on data extracted from biophysical simulations. p53基因突變體:我們的目標是到模型的基礎上從生物物理模擬提取數(shù)據(jù)的突變型p53的轉錄活性(有源VS無效)。 138. 
Page Blocks Classification: The problem consists of classifying all the blocks of the page layout of a document that has been detected by a segmentation process. 頁塊分類:問題進行分類的一個已被分割過程中檢測到的文件的頁面布局的所有塊組成。 139. Parkinsons: Oxford Parkinson's Disease Detection Dataset 帕金森:牛津帕金森氏病的檢測數(shù)據(jù)集 140. Parkinsons Telemonitoring: Oxford Parkinson's Disease Telemonitoring Dataset 帕金森遠程監(jiān)護:牛津帕金森病的遠程監(jiān)護數(shù)據(jù)集 141. PEMS-SF: 15 months worth of daily data (440 daily records) that describes the occupancy rate, between 0 and 1, of different car lanes of the San Francisco bay area freeways across time. PEMS - SF:15個月,每天的數(shù)據(jù)(440每日記錄)描述的入住率,0和1之間,不同的汽車車道,舊金山灣地區(qū)的高速公路,跨越時間的價值。 142. Pen-Based Recognition of Handwritten Digits: Digit database of 250 samples from 44 writers 基于筆的手寫數(shù)字識別:來自44個作家的250個樣本的數(shù)字數(shù)據(jù)庫 143. Pima Indians Diabetes: From National Institute of Diabetes and Digestive and Kidney Diseases; Includes cost data (donated by Peter Turney) 皮馬印第安人糖尿病:國立糖尿病,消化道和腎臟疾病研究所;包括成本數(shù)據(jù)(彼得特尼捐贈) 144. Pioneer-1 Mobile Robot Data: This dataset contains time series sensor readings of the Pioneer-1 mobile robot. The data is broken into "experiences" in which the robot takes action for some period of time and experiences a control 先鋒- 1移動機器人數(shù)據(jù):該數(shù)據(jù)集包含了時間序列的先鋒- 1移動機器人的傳感器讀數(shù)。數(shù)據(jù)分解成“經(jīng)驗”中,機器人需要一段時間的行動和經(jīng)驗的控制 145. Pittsburgh Bridges: Bridges database that has original and numeric-discretized datasets 匹茲堡橋梁:橋梁數(shù)據(jù)庫,具有原始和數(shù)值離散數(shù)據(jù)集 146. Plants: Data has been extracted from the USDA plants database. It contains all plants (species and genera) in the database and the states of USA and Canada where they occur. 植物:數(shù)據(jù)已經(jīng)從美國農(nóng)業(yè)部植物數(shù)據(jù)庫中提取。它包含在數(shù)據(jù)庫中,美國和加拿大發(fā)生的所有植物(種屬)。 147. Poker Hand: Purpose is to predict poker hands 牌手:目的是預測撲克牌 148. Post-Operative Patient: Dataset of patient features 手術后的病人:病人的特征數(shù)據(jù)集 149. 
Primary Tumor: From Ljubljana Oncology Institute 原發(fā)腫瘤:腫瘤研究所從盧布爾雅那 150. Prodigy: Assorted domains like blocksworld, eightpuzzle, and schedworld. 奇才:blocksworld,eightpuzzle,schedworld什錦域。 151. Protein Data: Undocumented 蛋白質數(shù)據(jù):無證 152. Pseudo Periodic Synthetic Time Series: This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. 偽定期的合成時間系列:該數(shù)據(jù)集是測試時間序列數(shù)據(jù)庫中的索引計劃的設計。的數(shù)據(jù)顯示高度周期性的,但永遠不會完全重演。 153. PubChem Bioassay Data: These highly imbalanced bioassay datasets are from the differing types of screening that can be performed using HTS technology. 21 datasets were created from 12 bioassays. PubChem數(shù)據(jù)庫生物測定數(shù)據(jù):這些高度不平衡的生物測定數(shù)據(jù)集的篩選不同類型可以使用高溫超導技術。 21數(shù)據(jù)集創(chuàng)建了來自12個生物測定。 154. Quadruped Mammals: The file animals.c is a data generator of structured instances representing quadruped animals 四足哺乳動物:該文件animals.c是一個代表四足動物的結構實例的數(shù)據(jù)發(fā)生器 155. Qualitative Structure Activity Relationships: Two sets of datasets are given: pyrimidines and triazines 定性結構活性關系:給出兩套數(shù)據(jù)集:嘧啶和三嗪 156. Record Linkage Comparison Patterns: Element-wise comparison of records with personal data from a record linkage setting. The task is to decide from a comparison pattern whether the underlying records belong to one person. 記錄鏈接比較模式:元素比較明智的,從創(chuàng)紀錄的聯(lián)動設置的個人資料記錄。任務是從一個比較模式,決定是否屬于一個人的基本紀錄。 157. Relative location of CT slices on axial axis: The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body.? CT片的軸向軸的相對位置:數(shù)據(jù)集包括從CT圖像中提取的384功能。類變量是數(shù)值表示的CT片對人體的軸向軸的相對位置。 158. Reuters Transcribed Subset: This dataset is created by reading out 200 files from the 10 largest Reuters classes and using an Automatic Speech Recognition system to create corresponding transcriptions. 
路透社轉錄子集:創(chuàng)建該數(shù)據(jù)集是通過讀出最大路透社從10類200個文件,并使用自動語音識別系統(tǒng),建立相應的改編。 159. Reuters-21578 Text Categorization Collection: This is a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories. 路透- 21578文本分類收集:這是出現(xiàn)于1987年,路透通訊社的文件的集合。組裝和類別索引文件。 160. Robot Execution Failures: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals 機器人執(zhí)行失敗:此數(shù)據(jù)集包含后故障檢測機器人的力和力矩測量。每次失敗的特點是在固定的時間間隔采集的樣品15力/力矩 161. SECOM: Data from a semi-conductor manufacturing process 世強:從半導體制造過程中的數(shù)據(jù) 162. Semeion Handwritten Digit: 1593 handwritten digits from around 80 persons were scanned, stretched in a rectangular box 16x16 in a gray scale of 256 values. Semeion手寫體數(shù)字:1593從80人左右的手寫數(shù)字進行掃描,伸一個矩形框,在256個值的灰度的16x16。 163. Servo: Data was from a simulation of a servo system 伺服:數(shù)據(jù)從一個伺服系統(tǒng)的仿真 164. Shuttle Landing Control: Tiny database; all nominal values 航天飛機著陸控制:微型數(shù)據(jù)庫;所有標稱值 165. Solar Flare: Each class attribute counts the number of solar flares of a certain class that occur in a 24 hour period 太陽耀斑:每個類的屬性一定的階級,在24小時內(nèi)發(fā)生的太陽耀斑的數(shù)量進行計數(shù) 166. Soybean (Large): Michalski's famous soybean disease database 大豆(大):MICHALSKI著名的大豆疾病數(shù)據(jù)庫 167. Soybean (Small): Michalski's famous soybean disease database 大豆(小):MICHALSKI著名的大豆疾病數(shù)據(jù)庫 168. Spambase: Classifying Email as Spam or Non-Spam Spambase:歸類為“垃圾郵件”或“非垃圾郵件的電子郵件 169. SPECT Heart: Data on cardiac Single Proton Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal. SPECT的心臟:心臟單個質子發(fā)射計算機斷層顯像(SPECT)的圖像數(shù)據(jù)。每個病人分為兩類:正常和不正常的。 170. SPECTF Heart: Data on cardiac Single Proton Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal. 
171. Spoken Arabic Digit: This dataset contains time series of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers. 172. Sponge: Data on sponges; attributes in Spanish. 173. Statlog (Australian Credit Approval): This file concerns credit card applications. This database exists elsewhere in the repository (Credit Screening Database) in a slightly different form. 174. Statlog (German Credit Data): This dataset classifies people described by a set of attributes as good or bad credit risks. Comes in two formats (one all numeric). Also comes with a cost matrix. 175. Statlog (Heart): This dataset is a heart disease database similar to a database already present in the repository (Heart Disease databases) but in a slightly different form. 176. Statlog (Image Segmentation): This dataset is an image segmentation database similar to a database already present in the repository (Image segmentation database) but in a slightly different form. 177. Statlog (Landsat Satellite): Multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. 178. Statlog (Shuttle): The shuttle dataset contains 9 attributes all of which are numerical.
Approximately 80% of the data belongs to class 1. 179. Statlog (Vehicle Silhouettes): Classify 3D objects within a 2D image by applying an ensemble of shape feature extractors to the 2D silhouettes of the objects. 180. Statlog Project: Various Databases: Vehicle silhouettes, Landsat Satellite, Shuttle, Australian Credit Approval, Heart Disease, Image Segmentation, German Credit. 181. Steel Plates Faults: A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. 182. Student Loan Relational: Student Loan Relational Domain. 183. Synthetic Control Chart Time Series: This data consists of synthetically generated control charts. 184. Syskill and Webert Web Page Ratings: This database contains the HTML source of web pages plus the ratings of a single user on these web pages. Web pages are on four separate subjects (Bands, i.e. recording artists; Goats; Sheep; and BioMedical). 185. Teaching Assistant Evaluation: The data consist of evaluations of teaching performance; scores are "low", "medium", or "high". 186. Thyroid Disease: 10 separate databases from Garavan Institute. 187. Tic-Tac-Toe Endgame: Binary classification task on possible configurations of the tic-tac-toe game. 188. Trains: 2 data formats (structured, one-instance-per-line). 189. Twenty Newsgroups: This data set consists of 20000 messages taken from 20 newsgroups.
190. UJI Pen Characters: Data consists of written characters in a UNIPEN-like format. 191. UJI Pen Characters (Version 2): A pen-based database with more than 11k isolated handwritten characters. 192. Undocumented: Various datasets without documentation (feel free to explore!). 193. University: Data in original (LISP-readable) form. 194. UNIX User Data: This file contains 9 sets of sanitized user data drawn from the command histories of 8 UNIX computer users at Purdue over the course of up to 2 years. 195. URL Reputation: Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features. 196. US Census Data (1990): The USCensus1990raw data set contains a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample. 197. Volcanoes on Venus - JARtool experiment: The JARtool project was a pioneering effort to develop an automatic system for cataloging small volcanoes in the large set of Venus images returned by the Magellan spacecraft. 198. Wall-Following Robot Navigation Data: The data were collected as the SCITOS G5 robot navigated through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'. 199. Water Treatment Plant: Multiple classes predict plant state. 200.
Waveform Database Generator (Version 1): CART book's waveform domains. 201. Waveform Database Generator (Version 2): CART book's waveform domains. 202. Wine: Use chemical analysis to determine the origin of wines. 203. Wine Quality: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/). 204. YearPredictionMSD: Prediction of the release year of a song from audio features. Songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s. 205. Yeast: Predicting the Cellular Localization Sites of Proteins. 206. Zoo: Artificial, 7 classes of animals. Writing this took effort; if you repost it, please credit the source: https://blog.csdn.net/mago2015 |
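To make one of the entries above concrete, here is a minimal Python sketch for the Tic-Tac-Toe Endgame task (binary classification of board configurations). Everything specific in it is an assumption made for illustration: the row layout (nine squares written as `x`, `o`, or `b` for blank, read row by row, followed by a `positive`/`negative` label), the made-up sample rows, and the helper names `x_wins`, `load_rows`, and `rule_accuracy`. Check the dataset's own documentation before relying on any of it.

```python
import csv
import io

# Hypothetical sample rows (NOT taken from the real file): nine squares
# (x, o, or b for blank) read row by row, then an assumed class label.
SAMPLE = """x,x,x,o,o,b,b,b,b,positive
o,o,o,x,x,b,x,b,b,negative
x,o,x,x,o,o,o,x,x,negative
"""

# Index triples that form a line on a 3x3 board stored row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def x_wins(board):
    """Return True if 'x' occupies all three squares of any line."""
    return any(all(board[i] == "x" for i in line) for line in LINES)


def load_rows(text):
    """Parse comma-separated rows into (board, label) pairs."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if record:  # skip blank lines
            rows.append((record[:9], record[9]))
    return rows


def rule_accuracy(rows):
    """Fraction of rows where the 'x has three in a row' rule matches the label."""
    hits = sum(
        ("positive" if x_wins(board) else "negative") == label
        for board, label in rows
    )
    return hits / len(rows)


rows = load_rows(SAMPLE)
accuracy = rule_accuracy(rows)
```

A hand-written rule like this is only a baseline. Its value is as a sanity check: if a learned classifier cannot match it on this task, something is probably wrong with how the features were encoded.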