
UCI Datasets: Summary and Translations

Published: 2023/12/20



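I'm not sure why, but many people have been messaging me to ask where to download the UCI datasets. I don't recall ever saying they could be downloaded from me, but since so many people want them, I have mirrored them here. Reply "uci數據集" to my WeChat official account to get the packaged UCI datasets, or browse the link below and download whichever datasets interest you: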

http://archive.ics.uci.edu/ml/index.php

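Feel free to follow my WeChat official account (search for coderwangson). It will carry posts on Python, machine learning algorithms, deep learning, paper reading notes, and the occasional bit of inspirational writing. Welcome!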

1.Abalone : Predict the age of abalone from physical measurements

鮑魚 DataSet ：根據物理度量，預測鮑魚的年齡。
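Many UCI datasets ship as headerless comma-separated `.data` files. The sketch below shows one way such a file might be parsed with the standard library. The column names and the two sample rows are made up for illustration (they are not taken from the real file), and the "rings + 1.5" age rule follows the convention given in the Abalone dataset's documentation.

```python
import csv
import io

# Hypothetical column names; the file itself is assumed to have no header row.
COLUMNS = ["sex", "length", "diameter", "height", "whole_weight",
           "shucked_weight", "viscera_weight", "shell_weight", "rings"]

# Two made-up rows in the same comma-separated layout (not real measurements).
SAMPLE = (
    "M,0.45,0.36,0.10,0.51,0.22,0.10,0.15,15\n"
    "F,0.53,0.42,0.14,0.68,0.26,0.14,0.21,9\n"
)

def load_rows(text):
    """Parse headerless CSV text into a list of dicts with typed values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = dict(zip(COLUMNS, record))
        for name in COLUMNS[1:-1]:          # the physical measurements
            row[name] = float(row[name])
        row["rings"] = int(row["rings"])
        # Age in years is approximated as the ring count plus 1.5.
        row["age_years"] = row["rings"] + 1.5
        rows.append(row)
    return rows

rows = load_rows(SAMPLE)
print([r["age_years"] for r in rows])
```

To use it on a downloaded file, pass the file's contents to `load_rows` instead of `SAMPLE`.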

2.Abscisic Acid Signaling Network : The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations using an asynchronous update scheme.

脫落酸信號網絡 DataSet ：目標是確定一組布爾規則，用以描述該植物信號網絡中各節點之間的相互作用。該數據集包括 300 個獨立的布爾偽動態模擬，采用異步更新方案。

3.Acute Inflammations : The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system.

急性炎癥 DataSet ：數據來源于一位醫學專家的數據集，用以檢測專家系統，可以推斷出泌尿系統的兩種疾病的診斷結果。

4.Adult : Predict whether income exceeds $50K/yr based on census data. Also known as “Census Income” dataset.

成人 DataSet ：根據戶口普查資料，預測收入是否能超過 50000 美元/年。通常也被稱為“收入普查”數據集。

5.Annealing : Steel annealing data

退火 DataSet ：鋼材退火數據。

6.Anonymous Microsoft Web Data : Log of anonymous users of www.microsoft.com; predict areas of the web site a user visited based on data on other areas the user visited.

匿名微軟網絡數據：微軟網站的匿名用戶記錄；通過其他的用戶訪問區域數據，預測用戶在 web 站點的訪問區域。

7.Arcene : ARCENE’s task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5 datasets of the NIPS 2003 feature selection challenge.

Arcene DataSet ：該數據集的任務是根據質譜數據區分癌癥模式與正常模式。這是一個具有連續輸入變量的二分類問題。該數據集是 NIPS 2003 特征選擇挑戰賽的 5 個數據集之一。

8.Arrhythmia : Distinguish between the presence and absence of cardiac arrhythmia and classify it in one of the 16 groups.

心律失常 DataSet ：判斷是否存在心律失常，并將其歸入 16 個組之一。

9.Artificial Characters : Dataset artificially generated by using first order theory which describes structure of ten capital letters of English alphabet

人工字符 DataSet ：利用一階理論人工生成的數據集，該理論描述了英文字母表中十個大寫字母的結構。

10.Audiology (Original) : Nominal audiology dataset from Baylor

原始 AudiologyDataSet ：來自 Baylor 的標稱型的 audiology 數據集。

11.Audiology (Standardized) : Standardized version of the original audiology database

標準 AudiologyDataSet ：原始 Audiology 數據集的標準化版本。

12.Australian Sign Language signs : This data consists of sample of Auslan (Australian Sign Language) signs. Examples of 95 signs were collected from five signers with a total of 6650 sign samples.

澳大利亞手語手勢 DataSet ：這些數據包含澳大利亞手語（Auslan）手勢的樣本。95 種手勢的樣例采集自五位手語者，共 6650 個手勢樣本。

13.Australian Sign Language signs (High Quality) : This data consists of sample of Auslan (Australian Sign Language) signs. 27 examples of each of 95 Auslan signs were captured from a native signer using high-quality position trackers

澳大利亞手語手勢 DataSet 高品質版：該數據集包含 Auslan 手勢的樣本。共 95 種 Auslan 手勢，每種 27 個樣例，由一位母語手語者使用高質量位置追蹤器采集。

14.Auto MPG : Revised from CMU StatLib library, data concerns city-cycle fuel consumption

Auto MPG DataSet ：修訂自 CMU StatLib 庫，數據涉及城市工況下的燃油消耗。

15.Automobile : From 1985 Ward’s Automotive Yearbook

汽車 DataSet ：來自 1985 年的《沃德汽車年鑒》（Ward’s Automotive Yearbook）。

16.AutoUniv : AutoUniv is an advanced data generator for classifications tasks. The aim is to reflect the nuances and heterogeneity of real data. Data can be generated in .csv, ARFF or C4.5 formats.

AutoUniv 是一個用于分類任務的高級數據生成器。目標是反映真實數據的細微差別與異質性。數據可以生成為 .csv、ARFF 或 C4.5 格式。
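ARFF, one of the output formats mentioned above, is the plain-text format used by Weka: a `@relation` line, one `@attribute` line per column, then a `@data` section of comma-separated rows. The following is a minimal sketch of writing such a file by hand. The helper name `to_arff` and the toy relation are hypothetical, and the sketch skips quoting and missing-value handling.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, kind) pairs, where kind is either the string
    "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute " + name + " " + kind)
        else:
            # Nominal attributes list their values inside braces.
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"

# Toy data, invented for illustration only.
arff_text = to_arff(
    "toy",
    [("x", "numeric"), ("label", ["pos", "neg"])],
    [(1.5, "pos"), (2.0, "neg")],
)
print(arff_text)
```

Real ARFF files may also need quoted names and `?` for missing values, which this sketch leaves out.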

17.Bach Chorales : Time-series data based on chorales; challenge is to learn generative grammar; data in Lisp

巴赫眾贊歌 DataSet ：基于眾贊歌（chorales）的時間序列數據；挑戰在于學習生成語法；數據為 Lisp 格式。

18.Badges : Badges labeled with a “+” or “-” as a function of a person’s name

徽章 DataSet ：根據人名的某個函數標記為“+”或“-”的徽章。

19.Bag of Words : This data set contains five text collections in the form of bags-of-words.

詞語包 DataSet ：該數據集包含了 5 個文本集合，每個文本集合以詞語包的形式展現。
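A bag-of-words representation discards word order and keeps only how often each word occurs in each document. The sketch below builds sparse (docID, wordID, count) triples from two toy sentences. The triple layout and the 1-based IDs are assumptions chosen for illustration; check the dataset's own README for its exact file format.

```python
from collections import Counter

# Toy documents, invented for illustration.
docs = ["the cat sat on the mat", "the dog sat"]

# Vocabulary sorted alphabetically, with 1-based word IDs (an assumption).
vocab = sorted({word for doc in docs for word in doc.split()})
word_id = {word: i + 1 for i, word in enumerate(vocab)}

def to_triples(documents):
    """Return (doc_id, word_id, count) triples, ordered by doc then word ID."""
    triples = []
    for doc_id, text in enumerate(documents, start=1):
        counts = Counter(text.split())
        for word in sorted(counts, key=word_id.get):
            triples.append((doc_id, word_id[word], counts[word]))
    return triples

triples = to_triples(docs)
print(vocab)
print(triples)
```

Only non-zero counts are stored, which is what keeps the representation compact for large vocabularies.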

20.Balance Scale : Balance scale weight & distance database

天平 DataSet ：天平的重量和距離數據庫。

21.Balloons : Data previously used in cognitive psychology experiment; 4 data sets represent different conditions of an experiment

氣球 DataSet ：曾經用在認知心理學實驗中的數據； 4 個數據集代表了一個實驗中的不同條件。

22.Blood Transfusion Service Center : Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan – this is a classification problem.

輸血服務中心 DataSet ：來自臺灣新竹市輸血服務中心的數據；這是一個分類問題。

23.Breast Cancer : Breast Cancer Data (Restricted Access)

乳腺癌 DataSet ：乳腺癌數據（訪問限制）。

24.Breast Cancer Wisconsin (Diagnostic) : Diagnostic Wisconsin Breast Cancer Database

威斯康星州乳腺癌（診斷數據） DataSet ：威斯康星州乳腺癌診斷數據庫。

25.Breast Cancer Wisconsin (Original) : Original Wisconsin Breast Cancer Database

威斯康星州乳腺癌（原始數據）：原始的威斯康星州乳腺癌數據庫。

26.Breast Cancer Wisconsin (Prognostic) : Prognostic Wisconsin Breast Cancer Database

威斯康星州乳腺癌（預后數據）：威斯康星州乳腺癌預后數據庫。

27.Breast Tissue : Dataset with electrical impedance measurements of freshly excised tissue samples from the breast.

乳腺組織 DataSet ：乳腺的新鮮切除組織樣本的電阻度量數據集。

28.CalIt2 Building People Counts : This data comes from the main door of the CalIt2 building at UCI.

CalIt2 大樓人數統計：該數據集來自 UCI 的 CalIt2 大樓的正門。

29.Car Evaluation : Derived from simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.

汽車評估 DataSet ：來源于簡單的層次決策模型，該數據集可用于測試構造性歸納和結構發現方法。

30.Cardiotocography : The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.

胎心宮縮監護 DataSet ：該數據集包括胎心宮縮圖（cardiotocograms）上胎兒心率（FHR）和子宮收縮（UC）特征的測量值，并由產科專家進行了分類。

31.Census Income : Predict whether income exceeds $50K/yr based on census data. Also known as “Adult” dataset.

收入普查 DataSet ：基于普查數據，預測收入是否超過 50000 美元/年。也被稱為“成人”數據集。

32.Census-Income (KDD) : This data set contains weighted census data extracted from the 1994 and 1995 current population surveys conducted by the U.S. Census Bureau.

收入普查（ KDD ）DataSet ：這個數據集包含了從美國人口普查局 1994 年和 1995 年的《當前人口調查》中提取出來的加權普查數據。

33.Challenger USA Space Shuttle O-Ring : Task: predict the number of O-rings that experience thermal distress on a flight at 31 degrees F given data on the previous 23 shuttle flights

挑戰者號航天飛機 O 形圈 DataSet ：任務：基于此前 23 次航天飛機飛行的數據，預測在 31 華氏度下飛行時出現熱損傷的 O 形圈數量。

34.Character Trajectories : Multiple, labelled samples of pen tip trajectories recorded whilst writing individual characters. All samples are from the same writer, for the purposes of primitive extraction. Only characters with a single pen-down segment were considered.

字符軌跡 DataSet ：書寫單個字符時記錄的筆尖軌跡的多個帶標簽樣本。為了便于基元提取，所有樣本均來自同一位書寫者。僅考慮一次落筆即可寫成的字符。

35.Chess (Domain Theories) : 6 different domain theories for generating legal moves of chess

國際象棋（域理論） DataSet ：產生國際象棋的規定路數的 6 個不同的域理論。

36.Chess (King-Rook vs. King) : Chess Endgame Database for White King and Rook against Black King (KRK).

國際象棋（王車對王） DataSet ：白方王和車對黑方王（KRK）的國際象棋殘局數據庫。

37.Chess (King-Rook vs. King-Knight) : Knight Pin Chess End-Game Database Creator

國際象棋（王車對王馬）：馬被牽制（Knight Pin）的國際象棋殘局數據庫生成器。

38.Chess (King-Rook vs. King-Pawn) : King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7).

國際象棋（王車對王兵）：王加車對王加位于 a7 的兵（通常簡寫為 KRKPA7 ）。

39.Cloud : Little Documentation

云（Cloud）：文檔很少。

40.CMU Face Images : This data consists of 640 black and white face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size

CMU 人臉圖像 DataSet ：該數據集包含了 640 張黑白人臉圖像，并且有直、左、右、上四個角度，中性、高興、悲傷、生氣四個表情，有的戴著太陽鏡，有的沒有，并且大小也不一。

41.Coil 1999 Competition Data : This data set is from the 1999 Computational Intelligence and Learning (COIL) competition. The data contains measurements of river chemical concentrations and algae densities.

Coil1999 競賽數據：該數據集來自 1999 年的計算機智能學習競賽（簡寫為 Coil ）。該數據集包含了河流的化學濃度度量和藻類的密度度量。

42.Communities and Crime : Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

社區與犯罪 DataSet ：美國的社區。該數據集包含了來自 1990 美國普查的社會經濟數據、來自 1990 美國 LEMAS 調查的法律實施數據，還有來自 1995 年 FBI UCR 的犯罪數據。

43.Communities and Crime Unnormalized : Communities in the US. Data combines socio-economic data from the '90 Census, law enforcement data from the 1990 Law Enforcement Management and Admin Stats survey, and crime data from the 1995 FBI UCR

社區與犯罪（未歸一化） DataSet ：美國的社區。數據包含了來自 1990 年普查的社會經濟數據、來自 1990 年執法管理與行政統計調查的執法數據，還有來自 1995 年 FBI UCR 的犯罪數據。

44.Computer Hardware : Relative CPU Performance Data, described in terms of its cycle time, memory size, etc.

計算機硬件：相對 CPU 性能數據，以周期時間、內存大小等來描述。

45.Concrete Compressive Strength : Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients.

混凝土抗壓強度 DataSet ：混凝土是土木工程中最重要的材料。混凝土抗壓強度是齡期與成分的高度非線性函數。

46.Concrete Slump Test : Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but that is also influenced by other concrete ingredients.

混凝土坍落度試驗：混凝土是一種非常復雜的材料。它的坍落流動度不僅取決于含水量，也受其他混凝土成分的影響。

47.Congressional Voting Records : 1984 United States Congressional Voting Records; Classify as Republican or Democrat

國會投票記錄 DataSet ：1984 年美國國會投票記錄；按照共和黨與民主黨分類。

48.Connect-4 : Contains connect-4 positions

Connect-4（四子棋）：包含 Connect-4 的局面。

49.Connectionist Bench (Nettalk Corpus) : The file “nettalk.data” contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes

聯結主義基準（ Nettalk 語料庫）：文件“ nettalk.data ”包含了一個有 20008 個英語單詞的列表，以及每個單詞的音標轉寫。任務是訓練一個網絡，使其產生正確的音素。

50.Connectionist Bench (Sonar, Mines vs. Rocks) : The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

聯結主義基準（聲納，水雷對巖石）：任務是訓練一個網絡，用來區分從金屬圓柱體反彈的聲納信號和從大致呈圓柱形的巖石反彈的聲納信號。

51.Connectionist Bench (Vowel Recognition - Deterding Data) : Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios.

聯結主義基準（元音識別，Deterding 數據）：使用由 LPC 導出的對數面積比構成的指定訓練集，對英式英語的 11 個穩態元音進行與說話人無關的識別。

52.Contraceptive Method Choice : Dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey.

避孕方法的選擇：該數據集是 1987 年印度尼西亞全國避孕普及率調查的一個子集。

53.Corel Image Features : This dataset contains image features extracted from a Corel image collection. Four sets of features are available based on the color histogram, color histogram layout, color moments, and co-occurrence

Corel 圖像特征：該數據集包含了提取自一個 Corel 圖像集合的圖像特征。基于顏色直方圖、顏色直方圖布局、顏色矩和共生（co-occurrence），可得到四組特征。

54.Covertype : Forest CoverType dataset

覆蓋類型：森林覆蓋類型數據集。

55.Credit Approval : This data concerns credit card applications; good mix of attributes

信貸審批：該數據集與信用卡申請相關；屬性類型混合良好。

56.Cylinder Bands : Used in decision tree induction for mitigating process delays known as “cylinder bands” in rotogravure printing

滾筒條紋：用于決策樹歸納，以減少輪轉凹版印刷中被稱為“滾筒條紋”（cylinder bands）的工藝延誤。

57.Demospongiae : Marine sponges of the Demospongiae class classification domain.

Demospongiae 類別下的海綿分類域。

58.Dermatology : Aim for this dataset is to determine the type of Erythemato-Squamous Disease.

皮膚科：該數據集用于判定紅斑鱗屑性疾病（Erythemato-Squamous Disease）的類型。

59.Dexter : DEXTER is a text classification problem in a bag-of-word representation. This is a two-class classification problem with sparse continuous input variables. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.

Dexter：DEXTER 是一個采用詞袋表示的文本分類問題。這是一個具有稀疏連續輸入變量的二分類問題。該數據集是 NIPS 2003 特征選擇挑戰賽的五個數據集之一。

60.DGP2 - The Second Data Generation Program : Generates application domains based on specific parameters, number of features, and proportion of positive to negative examples

DGP2 —第二個數據生成程序：基于具體的參數、特征的數量、和正面到負面例子的比率，產生應用域。

61.Diabetes : This diabetes dataset is from AIM '94

糖尿病：該糖尿病數據集來自 AIM94 。

62.Document Understanding : Five concepts, expressed as predicates, to be learned

文件理解：要學習的五個概念，作為謂詞來表現。

63.Dodgers Loop Sensor : Loop sensor data was collected for the Glendale on ramp for the 101 North freeway in Los Angeles

Dodgers 環形線圈傳感器：環形線圈傳感器數據采集自洛杉磯 101 號北向高速公路的 Glendale 入口匝道。

64.Dorothea : DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the NIPS 2003 feature selection challenge.

Dorothea 是一個藥物發現數據集。以分子結構特征表示的化合物必須分類為活性的（與凝血酶結合）或者非活性的。這是 NIPS 2003 特征選擇挑戰賽的 5 個數據集之一。

65.E. Coli Genes : Data giving characteristics of each ORF (potential gene) in the E. coli genome. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided.

大腸桿菌基因：給出大腸桿菌（E. coli）基因組中每個 ORF（潛在基因）特征的數據。提供序列、同源性（與其他基因的相似性）、結構信息以及功能（如果已知）。

66.EBL Domain Theories : Assorted small-scale domain theories

EBL 域理論：各種小規模的域理論。

67.Echocardiogram : Data for classifying if patients will survive for at least one year after a heart attack

超聲心動圖：該數據集用來分類是否病人在一次心臟病后，至少可以存活一年。

68.Ecoli : This data contains protein localization sites

Ecoli：該數據集包含蛋白質的定位位點。

69.Economic Sanctions : Domain Theory on Economic Sanctions; Undocumented

經濟制裁：經濟制裁方面的域理論，無記錄文檔。

70.EEG Database : This data arises from a large study to examine EEG correlates of genetic predisposition to alcoholism. It contains measurements from 64 electrodes placed on the scalp sampled at 256 Hz

EEG 數據庫：該數據集來自一項大型研究，旨在考察與酗酒遺傳易感性相關的 EEG 特征。它包含放置在頭皮上的 64 個電極以 256 Hz 采樣得到的測量值。

71.El Nino : The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific.

厄爾尼諾：該數據集包含了從整個赤道太平洋的一系列浮標的海洋與地面氣象讀數。

72.Entree Chicago Recommendation Data : This data contains a record of user interactions with the Entree Chicago restaurant recommendation system.

芝加哥主菜推薦數據：該數據集包含了一個與芝加哥主菜館的推薦系統的用戶交互的記錄。

73.Flags : From Collins Gem Guide to Flags, 1986

旗幟：來自《Collins Gem Guide to Flags》， 1986 年。

74.Forest Fires : This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/~pcortez/forestfires ).

森林火災：這是一個困難的回歸任務，目的是利用氣象數據和其他數據，預測葡萄牙東北部地區森林火災的過火面積（詳見： http://www.dsi.uminho.pt/~pcortez/forestfires ）。

75.Function Finding : Cases collected mostly from investigations in physical science; intention is to evaluate function-finding algorithms

函數發現：案例大多收集自物理科學領域的研究；目的是評估函數發現算法。

76.Gisette : GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusible digits ‘4’ and ‘9’. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.

Gisette： GISETTE 是一個手寫數字識別問題，任務是區分極易混淆的數字“4”和“9”。該數據集是 NIPS 2003 特征選擇挑戰賽的五個數據集之一。

77.Glass Identification : From USA Forensic Science Service; 6 types of glass; defined in terms of their oxide content (i.e. Na, Fe, K, etc)

玻璃鑒定：來自美國法醫科學服務機構； 6 種玻璃；按其氧化物含量（即鈉、鐵、鉀等）定義。

78.Haberman’s Survival : Dataset contains cases from study conducted on the survival of patients who had undergone surgery for breast cancer

Haberman 生存數據：該數據集包含一項針對接受過乳腺癌手術的患者生存情況的研究中的病例。

79.Hayes-Roth : Topic: human subjects study

海斯 - 羅斯：主題：人類受試者的研究

80.Heart Disease : 4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach

心臟病： 4 個數據庫：克利夫蘭、匈牙利、瑞士，以及長灘退伍軍人事務部（VA）醫療中心。

81.Hepatitis : From G.Gong: CMU; Mostly Boolean or numeric-valued attribute types; Includes cost data (donated by Peter Turney)

肝炎：來自卡內基梅隆大學（CMU）的 G. Gong；屬性類型大多為布爾值或數值；包括成本數據（由 Peter Turney 捐贈）。

82.Hill-Valley : Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create either a Hill (a “bump” in the terrain) or a Valley (a “dip” in the terrain).

Hill-Valley（山與谷）：每條記錄代表二維圖形上的 100 個點。按順序（從 1 到 100）作為 Y 坐標繪制時，這些點會形成一座山（地形中的“凸起”）或一個谷（地形中的“凹陷”）。
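To make the hill/valley task concrete, here is a naive sketch that labels a 100-point record by checking whether its largest deviation from the median lies above or below the baseline. The heuristic and the synthetic records are assumptions for illustration only; they are not the dataset's generator or a recommended classifier, and noisy records would defeat this rule.

```python
import statistics

def hill_or_valley(ys):
    """Label a record by the direction of its largest deviation from the median."""
    baseline = statistics.median(ys)
    up = max(ys) - baseline      # height of the tallest bump
    down = baseline - min(ys)    # depth of the deepest dip
    return "hill" if up >= down else "valley"

# Synthetic, noise-free records: a flat baseline with one triangular bump or dip.
flat = [10.0] * 100
hill = flat[:40] + [15.0 - 0.5 * abs(i - 10) for i in range(21)] + flat[61:]
valley = [20.0 - y for y in hill]   # mirror the bump into a dip

print(hill_or_valley(hill), hill_or_valley(valley))
```

A learned classifier would normally replace the hand-written rule; the sketch only shows what the 100 Y-values encode.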

83.Horse Colic : Well documented attributes; 368 instances with 28 attributes (continuous, discrete, and nominal); 30% missing values

馬絞痛：屬性文檔完善； 368 個實例， 28 個屬性（連續、離散和標稱）； 30％ 的缺失值。

84.Housing : Taken from StatLib library

房屋：取自 StatLib 庫。

85.ICU : Data set prepared for the use of participants for the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine.

ICU：為 1994 年 AAAI 醫學人工智能春季研討會的與會者準備的數據集。

86.Image Segmentation : Image data described by high-level numeric-valued attributes, 7 classes

圖像分割：由高層次的數字值屬性描述的圖像數據， 7 類

87.Insurance Company Benchmark (COIL 2000) : This data set used in the CoIL 2000 Challenge contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data

保險公司基準（ CoIL 2000 ）： CoIL 2000 挑戰賽所使用的該數據集包含一家保險公司的客戶信息。數據由 86 個變量組成，包括產品使用數據和社會人口統計數據。

88.Internet Advertisements : This dataset represents a set of possible advertisements on Internet pages.

互聯網廣告：這個 DataSet 表示一組可能在互聯網上的網頁廣告。

89.Internet Usage Data : This data contains general demographic information on internet users in 1997.

互聯網應用的數據：該數據包含一般的互聯網用戶在 1997 年的人口統計信息。

90.Ionosphere : Classification of radar returns from the ionosphere

電離層：從電離層雷達回波分類

91.IPUMS Census Database : This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990.

IPUMS 普查數據庫：該數據集包含洛杉磯和長灘地區 1970 年、 1980 年和 1990 年的未加權 PUMS 普查數據。

92.Iris : Famous database; from Fisher, 1936

鳶尾花（Iris）：著名的數據庫；來自 Fisher， 1936 年。

93.ISOLET : Goal: Predict which letter-name was spoken–a simple classification task.

ISOLET ：目標：預測說出的是哪個字母的名稱，這是一個簡單的分類任務。

94.Japanese Credit Screening : Includes domain theory (generated by talking to Japanese domain experts); data in Lisp

日本信用篩選：包括域理論（通過與日本領域專家交談生成）；數據為 Lisp 格式。

95.Japanese Vowels : This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers.

日語元音：該數據集記錄了九位男性說話人的 640 條時間序列，每條由 12 個 LPC 倒譜系數構成。

96.KDD Cup 1998 Data : This is the data set used for The Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98

KDD Cup 1998 數據：這是第二屆國際知識發現與數據挖掘工具競賽所使用的數據集，該競賽與 KDD-98 同期舉辦。

97.KDD Cup 1999 Data : This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99

KDD Cup 1999 數據：這是第三屆國際知識發現與數據挖掘工具競賽所使用的數據集，該競賽與 KDD-99 同期舉辦。

98.Kinship : Relational dataset

親屬關系：關系數據集

99.Labor Relations : From Collective Bargaining Review

勞動關系：來自《Collective Bargaining Review》。

100.LED Display Domain : From Classification and Regression Trees book; We provide here 2 C programs for generating sample databases

LED 顯示域：來自《Classification and Regression Trees》一書；這里提供 2 個用于生成示例數據庫的 C 程序。

101.Lenses : Database for fitting contact lenses

隱形眼鏡：用于驗配隱形眼鏡的數據庫。

102.Letter Recognition : Database of character image features; try to identify the letter

字母識別：字符圖像特征數據庫；任務是識別出字母。

103.Libras Movement : The data set contains 15 classes of 24 instances each. Each class references to a hand movement type in LIBRAS (Portuguese name ‘Língua BRAsileira de Sinais’, official Brazilian sign language).

Libras 動作：該數據集包含 15 個類別，每類 24 個實例。每個類別對應 LIBRAS（葡萄牙語名稱為“Língua BRAsileira de Sinais”，即巴西官方手語）中的一種手部動作類型。

104.Liver Disorders : BUPA Medical Research Ltd. database donated by Richard S. Forsyth

肝臟疾病：保柏醫療研究公司數據庫由理查德福塞斯捐贈

105.Localization Data for Person Activity : Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times.

人體活動定位數據：數據包含五個人執行不同活動的記錄。每個人佩戴四個傳感器（標簽），并將同一場景重復執行五次。

106.Logic Theorist : All code for Logic Theorist

邏輯理論家：邏輯理論家的所有代碼

107.Low Resolution Spectrometer : From IRAS data – NASA Ames Research Center

低分辨率光譜儀：從紅外天文衛星數據 - 美國國家航空航天局艾姆斯研究中心

108.Lung Cancer : Lung cancer data; no attribute definitions

肺癌：肺癌數據 ;沒有屬性定義

109.Lymphography : This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. (Restricted access)

淋巴造影：該淋巴造影數據來自南斯拉夫盧布爾雅那的大學醫學中心腫瘤研究所。（限制訪問）

110.M. Tuberculosis Genes : Data giving characteristics of each ORF (potential gene) in the M. tuberculosis bacterium. Sequence, homology (similarity to other genes) and structural information, and function (if known) are provided

結核分枝桿菌基因：給出結核分枝桿菌中每個 ORF（潛在基因）特征的數據。提供序列、同源性（與其他基因的相似性）、結構信息以及功能（如果已知）。

111.Madelon : MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.

Madelon ：MADELON 是一個人造的數據集，這是對 2003 年的 NIPS 的特征選擇挑戰的一部分。這是一個連續的輸入變量的兩個類的分類問題。困難的是，問題是多元的和高度非線性。

112.MAGIC Gamma Telescope : Data are MC generated to simulate registration of high energy gamma particles in an atmospheric Cherenkov telescope

MAGIC 伽馬望遠鏡：數據由蒙特卡洛（MC）方法生成，用于模擬大氣切倫科夫望遠鏡對高能伽馬粒子的記錄。

113.Mammographic Mass : Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient’s age.

乳腺 X 線腫塊：基于 BI-RADS 屬性和病人年齡，區分良性與惡性的乳腺 X 線腫塊。

114.Mechanical Analysis : Fault diagnosis problem of electromechanical devices; also PUMPS DATA SET is newer version with domain theory and results

機械分析：機電設備的故障診斷問題；另有 PUMPS DATA SET，是帶有域理論和結果的較新版本。

115.Meta-data : Meta-Data was used in order to give advice about which classification method is appropriate for a particular dataset (taken from results of Statlog project).

元數據：元數據用于就哪種分類方法適合某個特定數據集給出建議（取自 Statlog 項目的結果）。

116.MiniBooNE particle identification : This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background).

MiniBooNE 粒子鑒別：該數據集取自 MiniBooNE 實驗，用于將電子中微子（信號）與 μ 子中微子（背景）區分開來。

117.Mobile Robots : Learning concepts from sensor data of a mobile robot; set of data sets

移動機器人：從移動機器人的傳感器數據學習觀念 ;組數據集

118.Molecular Biology (Promoter Gene Sequences) : E. Coli promoter gene sequences (DNA) with partial domain theory

分子生物學（啟動子序列）：大腸桿菌啟動子的基因序列（ DNA ）的部分域理論

119.Molecular Biology (Protein Secondary Structure) : From CMU connectionist bench repository; Classifies secondary structure of certain globular proteins

分子生物學（蛋白質二級結構）：來自 CMU 聯結主義基準資源庫；對某些球狀蛋白質的二級結構進行分類。

120.Molecular Biology (Splice-junction Gene Sequences) : Primate splice-junction gene sequences (DNA) with associated imperfect domain theory

分子生物學（剪接位點基因序列）：靈長類動物的剪接位點基因序列（ DNA ），附帶相關的不完善域理論。

121.MONK’s Problems : A set of three artificial domains over the same attribute space; Used to test a wide range of induction algorithms

MONK 問題：定義在同一屬性空間上的一組三個人工域；用于測試各類歸納算法。

122.Moral Reasoner : Horn-clause model that qualitatively simulates moral reasoning; Theory includes negated literals

道德推理：定性模擬道德推理的霍恩子句模型；該理論包含否定文字。

123.Movie : This data set contains a list of over 10000 films including many older, odd, and cult films. There is information on actors, casts, directors, producers, studios, etc.

電影：該數據集包含一個 10000 多部電影的列表，其中包括許多老電影、冷門電影和邪典（cult）電影。提供演員、演員陣容、導演、制片人、制片公司等信息。

124.MSNBC.com Anonymous Web Data : This data describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category (see description) and are recorded in time order.

MSNBC.com 匿名 Web 數據：該數據描述了 1999 年 9 月 28 日訪問 msnbc.com 的用戶的頁面訪問情況。訪問以 URL 類別為粒度記錄（見說明），并按時間順序記錄。

125.Multiple Features : This dataset consists of features of handwritten numerals (‘0’-‘9’) extracted from a collection of Dutch utility maps

多重特征：該數據集由從一組荷蘭公用設施地圖中提取的手寫數字（“0”至“9”）的特征組成。

126.Mushroom : From Audubon Society Field Guide; mushrooms described in terms of physical characteristics; classification: poisonous or edible

蘑菇：來自奧杜邦協會（Audubon Society）野外指南；以物理特征描述蘑菇；分類：有毒或可食用。

127.Musk (Version 1) : The goal is to learn to predict whether new molecules will be musks or non-musks

麝香（版本 1）：目標是學會預測新分子是麝香還是非麝香。

128.Musk (Version 2) : The goal is to learn to predict whether new molecules will be musks or non-musks

麝香（版本 2）：目標是學會預測新分子是麝香還是非麝香。

129.NSF Research Award Abstracts 1990-2003 : This data set consists of (a) 129,000 abstracts describing NSF awards for basic research, (b) bag-of-word data files extracted from the abstracts, (c) a list of words used for indexing the bag-of-word

NSF 研究資助摘要 1990-2003：該數據集包括（a） 129,000 篇描述 NSF 基礎研究資助項目的摘要，（b）從摘要中提取的詞袋數據文件，（c）用于索引詞袋的單詞列表。

130.Nursery : Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools.

幼兒園：幼兒園數據庫源自一個分層決策模型，該模型最初是為給幼兒園入園申請排序而開發的。

131.Online Handwritten Assamese Characters Dataset : This is a dataset of 8235 online handwritten assamese characters. The “online” process involves capturing of data as text is written on a digitizing tablet with an electronic pen.

在線手寫阿薩姆語字符數據集：這是一個包含 8235 個在線手寫阿薩姆語字符的數據集。“在線”過程是指在用電子筆于數位板上書寫文本的同時采集數據。

132.Opinosis Opinion / Review : This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of ipod nano”.

Opinosis 意見/評論：此數據集包含從某一給定主題的用戶評論中提取的句子。示例主題有“豐田凱美瑞的性能”和“ iPod nano 的音質”。

133.OpinRank Review Dataset : This data set contains user reviews of cars and hotels collected from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

OpinRank 評論數據集：該數據集包含采集自 Tripadvisor（約 259,000 條評論）和 Edmunds（約 42,230 條評論）的汽車與酒店用戶評論。

134.Optical Recognition of Handwritten Digits : Two versions of this database available; see folder

光學識別手寫體數字：這個數據庫提供的兩個版本，請參閱文件夾

135.Othello Domain Theory : Used in research to generate features for an inductive learning system

奧賽羅域理論：在研究中用于為歸納學習系統生成特征。

136.Ozone Level Detection : Two ground ozone level data sets are included in this collection. One is the eight hour peak set (eighthr.data), the other is the one hour peak set (onehr.data). Those data were collected from 1998 to 2004 at the Houston, Galveston and Brazoria area.

臭氧濃度檢測：此集合包含兩個地面臭氧濃度數據集。一個是八小時峰值集（eighthr.data ），另一個是一小時峰值集（ onehr.data）。這些數據于 1998 年至 2004 年在休斯敦、加爾維斯頓和 Brazoria 地區采集。

137.p53 Mutants : The goal is to model mutant p53 transcriptional activity (active vs inactive) based on data extracted from biophysical simulations.

p53 突變體：目標是基于從生物物理模擬中提取的數據，對突變型 p53 的轉錄活性（有活性與無活性）進行建模。

138.Page Blocks Classification : The problem consists of classifying all the blocks of the page layout of a document that has been detected by a segmentation process.

頁塊分類：該問題是對文檔頁面布局中由分割過程檢測出的所有塊進行分類。

139.Parkinsons : Oxford Parkinson’s Disease Detection Dataset

帕金森：牛津帕金森氏病的檢測數據集

140.Parkinsons Telemonitoring : Oxford Parkinson’s Disease Telemonitoring Dataset

帕金森遠程監護：牛津帕金森病的遠程監護數據集

141.PEMS-SF : 15 months worth of daily data (440 daily records) that describes the occupancy rate, between 0 and 1, of different car lanes of the San Francisco bay area freeways across time.

PEMS-SF： 15 個月的每日數據（ 440 條每日記錄），描述舊金山灣區高速公路不同車道隨時間變化的占用率，取值在 0 和 1 之間。

142.Pen-Based Recognition of Handwritten Digits : Digit database of 250 samples from 44 writers

基于筆的手寫數字識別：來自 44 位書寫者的 250 個樣本的數字數據庫

143.Pima Indians Diabetes : From National Institute of Diabetes and Digestive and Kidney Diseases; Includes cost data (donated by Peter Turney)

皮馬印第安人糖尿病：國立糖尿病，消化道和腎臟疾病研究所 ;包括成本數據（彼得特尼捐贈）

144.Pioneer-1 Mobile Robot Data : This dataset contains time series sensor readings of the Pioneer-1 mobile robot. The data is broken into “experiences” in which the robot takes action for some period of time and experiences a control

Pioneer-1 移動機器人數據：該數據集包含 Pioneer-1 移動機器人的時間序列傳感器讀數。數據被劃分為若干“經歷”，在每段經歷中機器人執行一段時間的動作并經歷一次控制

145.Pittsburgh Bridges : Bridges database that has original and numeric-discretized datasets

匹茲堡橋梁：橋梁數據庫，具有原始和數值離散數據集

146.Plants : Data has been extracted from the USDA plants database. It contains all plants (species and genera) in the database and the states of USA and Canada where they occur.

植物：數據提取自美國農業部（USDA）植物數據庫。它包含數據庫中的所有植物（種和屬），以及它們在美國和加拿大出現的州或省。

147.Poker Hand : Purpose is to predict poker hands

撲克牌型：目的是預測撲克牌的牌型。

148.Post-Operative Patient : Dataset of patient features

手術后的病人：病人的特征數據集

149.Primary Tumor : From Ljubljana Oncology Institute

原發腫瘤：來自盧布爾雅那腫瘤研究所。

150.Prodigy : Assorted domains like blocksworld, eightpuzzle, and schedworld.

Prodigy：各類域，例如 blocksworld 、 eightpuzzle 和 schedworld 。

151.Protein Data : Undocumented

蛋白質數據：無文檔說明。

152.Pseudo Periodic Synthetic Time Series : This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself.

偽周期合成時間序列：該數據集用于測試時間序列數據庫中的索引方案。數據看起來具有高度周期性，但從不完全重復。

153.PubChem Bioassay Data : These highly imbalanced bioassay datasets are from the differing types of screening that can be performed using HTS technology. 21 datasets were created from 12 bioassays.

PubChem 生物測定數據：這些高度不平衡的生物測定數據集來自可用高通量篩選（HTS）技術進行的不同類型的篩選。由 12 個生物測定創建了 21 個數據集。

154.Quadruped Mammals : The file animals.c is a data generator of structured instances representing quadruped animals

四足哺乳動物：該文件 animals.c 是一個代表四足動物的結構實例的數據發生器

155.Qualitative Structure Activity Relationships : Two sets of datasets are given: pyrimidines and triazines

定性結構活性關系：給出兩套數據集：嘧啶和三嗪

156.Record Linkage Comparison Patterns : Element-wise comparison of records with personal data from a record linkage setting. The task is to decide from a comparison pattern whether the underlying records belong to one person.

記錄鏈接比較模式：對記錄鏈接場景中含個人數據的記錄進行逐元素比較。任務是根據比較模式判斷底層記錄是否屬于同一個人。

157.Relative location of CT slices on axial axis : The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body.

CT 切片在軸向上的相對位置：數據集包括從 CT 圖像中提取的 384 個特征。類變量為數值型，表示 CT 切片在人體軸向上的相對位置。

158.Reuters Transcribed Subset : This dataset is created by reading out 200 files from the 10 largest Reuters classes and using an Automatic Speech Recognition system to create corresponding transcriptions.

路透社轉錄子集：該數據集的創建方式是朗讀路透社 10 個最大類別中的 200 個文件，并使用自動語音識別系統生成相應的轉錄文本。

159.Reuters-21578 Text Categorization Collection : This is a collection of documents that appeared on Reuters newswire in 1987. The documents were assembled and indexed with categories.

Reuters-21578 文本分類集合：這是 1987 年出現在路透社新聞專線上的文檔集合。這些文檔已被匯編并按類別建立索引。

160.Robot Execution Failures : This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals

機器人執行失敗：此數據集包含機器人在檢測到故障后的力和力矩測量值。每次故障由按固定時間間隔采集的 15 個力/力矩樣本表征。

161.SECOM : Data from a semi-conductor manufacturing process

SECOM：來自半導體制造過程的數據

162.Semeion Handwritten Digit : 1593 handwritten digits from around 80 persons were scanned, stretched in a rectangular box 16x16 in a gray scale of 256 values.

Semeion 手寫體數字：掃描了來自約 80 人的 1593 個手寫數字，并將其拉伸到 16x16 的矩形框中，灰度為 256 級。

163.Servo : Data was from a simulation of a servo system

伺服：數據從一個伺服系統的仿真

164.Shuttle Landing Control : Tiny database; all nominal values

航天飛機著陸控制：微型數據庫 ; 所有標稱值

165.Solar Flare : Each class attribute counts the number of solar flares of a certain class that occur in a 24 hour period

太陽耀斑：每個類別屬性統計 24 小時內發生的某一級別太陽耀斑的數量。

166.Soybean (Large) : Michalski’s famous soybean disease database

大豆（大）： MICHALSKI 著名的大豆疾病數據庫

167.Soybean (Small) : Michalski’s famous soybean disease database

大豆（小）： MICHALSKI 著名的大豆疾病數據庫

168.Spambase : Classifying Email as Spam or Non-Spam

Spambase：將電子郵件分類為“垃圾郵件”或“非垃圾郵件”

169.SPECT Heart : Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal.

SPECT 心臟：心臟單光子發射計算機斷層顯像（ SPECT）圖像的數據。每位病人被分為兩類：正常和異常。

170.SPECTF Heart : Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient classified into two categories: normal and abnormal.

SPECTF 心臟：心臟單光子發射計算機斷層顯像（ SPECT）圖像的數據。每位病人被分為兩類：正常和異常。

171.Spoken Arabic Digit : This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.

阿拉伯語口語數字：該數據集包含與阿拉伯語口語數字相對應的梅爾頻率倒譜系數（ MFCCs ）時間序列。包括 44 位男性和 44 位女性阿拉伯語母語者的數據。

172.Sponge : Data on sponges; attributes are in Spanish.

173.Statlog (Australian Credit Approval) : This file concerns credit card applications. This database exists elsewhere in the repository (Credit Screening Database) in a slightly different form.

174.Statlog (German Credit Data) : This dataset classifies people described by a set of attributes as good or bad credit risks. Comes in two formats (one all numeric). Also comes with a cost matrix.
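A cost matrix means predictions can be scored by total misclassification cost instead of plain error rate. The sketch below shows the idea. The cost values and the sample labels are illustrative assumptions, so check the dataset's own documentation for the actual matrix.

```python
# COST[actual][predicted]; the numbers are illustrative assumptions.
COST = {
    "good": {"good": 0, "bad": 1},  # rejecting a good customer: small cost
    "bad": {"good": 5, "bad": 0},   # accepting a bad customer: larger cost
}


def total_cost(actual, predicted, cost=COST):
    """Sum the cost of every (actual, predicted) pair."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(cost[a][p] for a, p in zip(actual, predicted))


# Made-up labels: one error of each kind.
actual = ["good", "good", "bad", "bad"]
predicted = ["good", "bad", "good", "bad"]
cost_value = total_cost(actual, predicted)
```

With an asymmetric matrix like this one, two classifiers with the same error rate can have very different total costs.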

175.Statlog (Heart) : This dataset is a heart disease database similar to a database already present in the repository (Heart Disease databases) but in a slightly different form.

176.Statlog (Image Segmentation) : This dataset is an image segmentation database similar to a database already present in the repository (Image segmentation database) but in a slightly different form.

177.Statlog (Landsat Satellite) : Multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood.
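The 3x3 neighbourhood idea can be sketched as follows: for a chosen central pixel, collect the values of all nine surrounding pixels in every spectral band into one feature vector. The `image[band][row][col]` layout, the feature ordering, and the two-band toy image are assumptions for illustration only.

```python
def neighbourhood_features(image, row, col):
    """Flatten the 3x3 neighbourhood around (row, col) across all bands.

    `image[band][r][c]` holds one pixel value (assumed layout). Values are
    returned pixel by pixel, with all bands of a pixel kept together.
    """
    height, width = len(image[0]), len(image[0][0])
    if not (1 <= row < height - 1 and 1 <= col < width - 1):
        raise ValueError("centre pixel must not lie on the image border")
    return [
        band[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        for band in image
    ]


# Toy 3x3 image with two bands; band 1 is band 0 shifted by 100.
band0 = [[r * 3 + c for c in range(3)] for r in range(3)]
band1 = [[100 + value for value in line] for line in band0]
features = neighbourhood_features([band0, band1], 1, 1)
```

The class label would then be the land-cover type of the central pixel, paired with this feature vector.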

178.Statlog (Shuttle) : The shuttle dataset contains 9 attributes, all of which are numerical. Approximately 80% of the data belongs to class 1.
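Because roughly 80% of the records fall in one class, a classifier that always predicts that class already scores about 80% accuracy. The sketch below computes this majority-class baseline. The label counts are made up to mirror the stated proportion and are not the real class distribution.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Made-up labels: 80% in class 1, the rest spread over two other classes.
labels = [1] * 80 + [4] * 15 + [5] * 5
baseline = majority_baseline_accuracy(labels)
```

Any model evaluated on data this skewed should be compared against the baseline rather than against 50%.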

179.Statlog (Vehicle Silhouettes) : Recognizing 3D objects within a 2D image by applying an ensemble of shape feature extractors to the 2D silhouettes of the objects.

180.Statlog Project : Various databases: Vehicle Silhouettes, Landsat Satellite, Shuttle, Australian Credit Approval, Heart Disease, Image Segmentation, German Credit.

181.Steel Plates Faults : A dataset of steel plates’ faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition.

182.Student Loan Relational : Student Loan Relational Domain.

183.Synthetic Control Chart Time Series : This data consists of synthetically generated control charts.

184.Syskill and Webert Web Page Ratings : This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages cover four separate subjects (Bands - recording artists; Goats; Sheep; and BioMedical).

185.Teaching Assistant Evaluation : The data consist of evaluations of teaching performance; scores are “low”, “medium”, or “high”.

186.Thyroid Disease : 10 separate databases from the Garavan Institute.

187.Tic-Tac-Toe Endgame : Binary classification task on possible configurations of the tic-tac-toe game.
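One way to picture the binary label is as "does x have three in a row on this board?". The sketch below checks that for a 9-cell board. The cell encoding ('x', 'o', 'b' for blank), the row-major ordering, and the choice of "x wins" as the positive class are assumptions for illustration.

```python
# All eight winning lines on a 3x3 board, as row-major cell indices.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def x_wins(board):
    """Return True if 'x' occupies any complete line.

    `board` is a sequence of 9 cells, each 'x', 'o', or 'b' (blank).
    """
    if len(board) != 9:
        raise ValueError(f"expected 9 cells, got {len(board)}")
    return any(all(board[i] == "x" for i in line) for line in LINES)


win = x_wins("xxxoobbbb")   # top row is all x
draw = x_wins("xoxxoooxx")  # full board, no line for x
```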

188.Trains : 2 data formats (structured, one-instance-per-line).

189.Twenty Newsgroups : This data set consists of 20000 messages taken from 20 newsgroups.

190.UJI Pen Characters : Data consists of written characters in a UNIPEN-like format.

191.UJI Pen Characters (Version 2) : A pen-based database with more than 11k isolated handwritten characters.

192.Undocumented : Various datasets without documentation (feel free to explore!).

193.University : Data in original (LISP-readable) form.

194.UNIX User Data : This file contains 9 sets of sanitized user data drawn from the command histories of 8 UNIX computer users at Purdue over the course of up to 2 years.

195.URL Reputation : Anonymized 120-day subset of the ICML-09 URL data containing 2.4 million examples and 3.2 million features.

196.US Census Data (1990) : The USCensus1990raw data set contains a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample.

197.Volcanoes on Venus - JARtool experiment : The JARtool project was a pioneering effort to develop an automatic system for cataloging small volcanoes in the large set of Venus images returned by the Magellan spacecraft.

198.Wall-Following Robot Navigation Data : The data were collected as the SCITOS G5 robot navigated through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its ‘waist’.

199.Water Treatment Plant : Multiple classes predict plant state.

200.Waveform Database Generator (Version 1) : CART book’s waveform domains.

201.Waveform Database Generator (Version 2) : CART book’s waveform domains.

202.Wine : Using chemical analysis to determine the origin of wines.

203.Wine Quality : Two datasets are included, related to red and white vinho verde wine samples from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/ ).

204.YearPredictionMSD : Prediction of the release year of a song from audio features. Songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s.

205.Yeast : Predicting the cellular localization sites of proteins.

206.Zoo : Artificial, 7 classes of animals.
