Mahout Quick Start Tutorial

Published: 2024/1/23 (编程问答 / Programming Q&A) · 豆豆


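Mahout is a powerful data-mining tool. It is a collection of distributed machine-learning algorithms, including a distributed collaborative-filtering implementation known as Taste, plus classification and clustering. Mahout's biggest advantage is that it is built on Hadoop: many algorithms that previously ran on a single machine have been rewritten in the MapReduce model, which greatly increases both the amount of data they can handle and their processing performance.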

I. Installing and Configuring Mahout

1、下載并解壓Mahout
http://archive.apache.org/dist/mahout/
tar -zxvf mahout-distribution-0.9.tar.gz
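If you prefer to script step 1, the sketch below downloads and unpacks a given release. It assumes the archive stores each release as `<version>/mahout-distribution-<version>.tar.gz` under the directory above. The article only gives the top-level URL, so confirm the path in the directory listing first. `mahout_tarball_url` and `fetch_mahout` are hypothetical helper names, and `curl` is assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch only: the per-version archive layout below is an assumption,
# not something stated in the article.
MAHOUT_ARCHIVE="http://archive.apache.org/dist/mahout"

# Hypothetical helper: build the tarball URL for a given Mahout version.
mahout_tarball_url() {
  local version="$1"
  printf '%s/%s/mahout-distribution-%s.tar.gz\n' "$MAHOUT_ARCHIVE" "$version" "$version"
}

# Hypothetical helper: download the tarball into DEST_DIR and extract it there.
fetch_mahout() {
  local version="$1" dest_dir="$2"
  local tarball="mahout-distribution-${version}.tar.gz"
  mkdir -p "$dest_dir" || return 1
  # -f: fail on HTTP errors, -S: show errors, -L: follow redirects.
  curl -fSL -o "$dest_dir/$tarball" "$(mahout_tarball_url "$version")" || return 1
  # Same extraction flags as above, unpacking into DEST_DIR.
  tar -zxvf "$dest_dir/$tarball" -C "$dest_dir"
}
```

For example, `fetch_mahout 0.9 /mnt/jediael/mahout` should leave the tree at `/mnt/jediael/mahout/mahout-distribution-0.9`, which is the `MAHOUT_HOME` used in step 2.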

2. Configure the environment variables
# set mahout environment
export MAHOUT_HOME=/mnt/jediael/mahout/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
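The article does not say which file these exports go into. The sketch below assumes a per-user `~/.bashrc`, which is our choice, and adds the exports in a way that is safe to re-run. The `# set mahout environment` comment doubles as a marker, so a second run does not append a duplicate block. `add_mahout_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Mahout exports to a profile file once.
# Arguments: PROFILE (e.g. ~/.bashrc, our assumption) and the Mahout directory.
add_mahout_env() {
  local profile="$1" home="$2"
  # The marker comment is already there: assume the block was added before.
  if grep -qF '# set mahout environment' "$profile" 2>/dev/null; then
    return 0
  fi
  # Unquoted EOF: $home expands now; \$MAHOUT_HOME and \$PATH stay literal.
  cat >> "$profile" <<EOF
# set mahout environment
export MAHOUT_HOME=$home
export MAHOUT_CONF_DIR=\$MAHOUT_HOME/conf
export PATH=\$MAHOUT_HOME/conf:\$MAHOUT_HOME/bin:\$PATH
EOF
}
```

Usage: `add_mahout_env ~/.bashrc /mnt/jediael/mahout/mahout-distribution-0.9 && source ~/.bashrc`.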

3. Install Mahout
[jediael@master mahout-distribution-0.9]$ pwd
/mnt/jediael/mahout/mahout-distribution-0.9
[jediael@master mahout-distribution-0.9]$ mvn install
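The `mahout` output in step 4 reports its job jar as `examples/target/mahout-examples-0.9-job.jar` under `MAHOUT_HOME`. Assuming that is where `mvn install` leaves it, a quick post-build check could look like the sketch below. `find_examples_job_jar` is a hypothetical helper, and other layouts are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the examples job jar under a Mahout
# directory. The examples/target location comes from the MAHOUT-JOB line that
# the `mahout` command prints; other locations are not checked.
find_examples_job_jar() {
  local home="$1" jar
  for jar in "$home"/examples/target/mahout-examples-*-job.jar; do
    # With no match, bash leaves the glob unexpanded, so test the file exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no examples job jar under $home/examples/target" >&2
  return 1
}
```

Usage: `find_examples_job_jar "$MAHOUT_HOME"`.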

4. Verify that Mahout is installed
Run the mahout command. If it prints a list of algorithms, the installation succeeded:

[jediael@master mahout-distribution-0.9]$ mahout
Running on hadoop, using /mnt/jediael/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /mnt/jediael/mahout/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
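To check this in a script rather than by eye, one option is to look for two strings that appear in the listing above: the `Valid program names are:` header and the `kmeans` entry used later in this tutorial. This is a heuristic sketch, and `looks_like_mahout_listing` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the text on stdin looks like Mahout's
# program listing. Both strings checked here appear in the output above.
looks_like_mahout_listing() {
  local out
  out=$(cat)
  grep -qF 'Valid program names are:' <<<"$out" &&
    grep -qF 'kmeans: : K-means clustering' <<<"$out"
}
```

Usage: `mahout 2>&1 | looks_like_mahout_listing && echo ok`.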


II. Verifying Mahout with a Simple Example
1. Start Hadoop
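The article does not show how Hadoop was started. For a Hadoop 1.x install like the `/mnt/jediael/hadoop-1.2.1` seen in the `mahout` output, one common way is `bin/start-all.sh`. Afterwards, `jps` should list the running daemons. The sketch checks only for the master-side daemons NameNode and JobTracker, because which daemons run on which host depends on your cluster layout. `missing_daemons` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch for a Hadoop 1.x install, e.g.:
#   /mnt/jediael/hadoop-1.2.1/bin/start-all.sh
#   jps | missing_daemons NameNode JobTracker

# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon (given as arguments) that is not running; return 1 if any is missing.
missing_daemons() {
  local listing name status=0
  listing=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <MainClassName>", so compare the second field.
    if ! awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' <<<"$listing"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```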
2. Download the test data
Download synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/ (a quick web search will also turn up this sample dataset).
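Assuming `curl` and network access, the sketch below fetches the file and prints a quick shape summary. The UCI description of this dataset is 600 control-chart series of 60 whitespace-separated values each. On that basis, `shape_of synthetic_control.data` would be expected to print `600 60 60`; treat this as our assumption and compare it against the dataset page. `fetch_synthetic_control` and `shape_of` are hypothetical helpers.

```shell
#!/usr/bin/env bash
DATA_URL="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"

# Hypothetical helper: download the sample data into the current directory.
fetch_synthetic_control() {
  curl -fSL -o synthetic_control.data "$DATA_URL"
}

# Hypothetical helper: print "<rows> <min fields> <max fields>" for a
# whitespace-separated data file, ignoring blank lines.
shape_of() {
  awk 'NF { rows++; if (min == "" || NF < min) min = NF; if (NF > max) max = NF }
       END { printf "%d %d %d\n", rows, min, max }' "$1"
}
```

Usage: `fetch_synthetic_control && shape_of synthetic_control.data`.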
3. Upload the test data to HDFS
hadoop fs -put synthetic_control.data testdata
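The `testdata` path is relative, so it lands in the current user's HDFS home directory (`/user/<username>/testdata`). `hadoop fs -put` normally refuses to overwrite an existing target, so a re-runnable upload might test for the path first, as sketched below. `upload_testdata` and the `HADOOP` override variable are hypothetical. `hadoop fs -test -e` is the FS shell's existence check.

```shell
#!/usr/bin/env bash
# Sketch: upload the sample data only if "testdata" is not already in HDFS.
# HADOOP (hypothetical override) may name a specific binary, for example
# /mnt/jediael/hadoop-1.2.1/bin/hadoop.
HADOOP="${HADOOP:-hadoop}"

# Hypothetical helper; the local file defaults to the name used above.
upload_testdata() {
  local local_file="${1:-synthetic_control.data}"
  # `fs -test -e` exits 0 when the path exists.
  if "$HADOOP" fs -test -e testdata; then
    echo "testdata already exists in HDFS; skipping upload" >&2
    return 0
  fi
  "$HADOOP" fs -put "$local_file" testdata
}
```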
4. Run Mahout's k-means clustering algorithm with the following command:
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Clustering took about 9 minutes to complete.
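Because the job runs for several minutes, it can help to keep its output in a log and time it. The sketch below wraps the command above unchanged. `run_kmeans_example`, the `MAHOUT` override variable and the default log name are hypothetical choices. Timing uses bash's built-in `SECONDS` counter.

```shell
#!/usr/bin/env bash
# Sketch: run the k-means example, keep a log, and report the wall-clock time.
# MAHOUT (hypothetical override) may point at $MAHOUT_HOME/bin/mahout.
MAHOUT="${MAHOUT:-mahout}"

# Hypothetical helper; the default log file name is our own choice.
run_kmeans_example() {
  local log="${1:-kmeans-example.log}" start status
  start=$SECONDS
  # Same command as in the article; stdout and stderr both go to the log.
  "$MAHOUT" -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job >"$log" 2>&1
  status=$?
  echo "k-means example exited with $status after $((SECONDS - start))s (log: $log)" >&2
  return "$status"
}
```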
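5. View the clustering results
List the output directory with hadoop fs -ls output. The relative path resolves to the current user's HDFS home directory, so here it is /user/jediael/output (it would be /user/root/output only when running as root):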
[jediael@master mahout-distribution-0.9]$ hadoop fs -ls output
Found 15 items
-rw-r--r-- 2 jediael supergroup 194 2015-03-07 15:07 /user/jediael/output/_policy
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:07 /user/jediael/output/clusteredPoints
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:02 /user/jediael/output/clusters-0
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:02 /user/jediael/output/clusters-1
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:07 /user/jediael/output/clusters-10-final
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:03 /user/jediael/output/clusters-2
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:03 /user/jediael/output/clusters-3
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:04 /user/jediael/output/clusters-4
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:04 /user/jediael/output/clusters-5
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:05 /user/jediael/output/clusters-6
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:05 /user/jediael/output/clusters-7
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:06 /user/jediael/output/clusters-8
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:07 /user/jediael/output/clusters-9
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:02 /user/jediael/output/data
drwxr-xr-x - jediael supergroup 0 2015-03-07 15:02 /user/jediael/output/random-seeds
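The listing shows one `clusters-N` directory per iteration, with the last one suffixed `-final`. The helpers below pull the final directory and the count of iteration directories out of such a listing; `final_clusters_dir` and `count_cluster_dirs` are hypothetical names. The final directory can then be handed to `clusterdump`, one of the programs in the installation-check listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop fs -ls output` on stdin and print the
# directory whose name ends in "-final". Directories have a permission
# string starting with "d", and the path is the last field.
final_clusters_dir() {
  awk '$1 ~ /^d/ && $NF ~ /-final$/ { print $NF }'
}

# Hypothetical helper: count the per-iteration clusters-N directories,
# including the one suffixed -final.
count_cluster_dirs() {
  awk '$1 ~ /^d/ && $NF ~ /\/clusters-[0-9]+(-final)?$/ { n++ } END { print n + 0 }'
}
```

For example: `mahout clusterdump -i "$(hadoop fs -ls output | final_clusters_dir)" -p output/clusteredPoints -o clusters.txt`. The `-i`, `-p` and `-o` options are as we recall them for Mahout 0.9's `clusterdump`, so confirm them with `mahout clusterdump --help` before use.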


