How to build a recommendation engine using Apache's Prediction IO Machine Learning Server
by Vaghawan Ojha
This post will guide you through installing the Apache Prediction IO machine learning server. We'll use one of its templates, called Recommendation, to build a working recommendation engine. The finished product will be able to recommend customized products based on a given user's purchasing behavior.
The Problem
You've got a bunch of data and you need to predict something accurately so you can help your business grow its sales, customers, profits, and conversions, or meet whatever other need it has.
Recommendation systems are probably the first step everyone takes toward applying data science and machine learning. Recommendation engines use data as an input and run their algorithms over it. Then they output models from which we can make predictions about what a user is really going to buy, or what a user may like or dislike.
Enter Prediction IO
“Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of state-of-the-art open source stack for developers and data scientists create predictive engines for any machine learning task.” — Apache Prediction IO documentation
The very first look at the documentation makes me feel good, because it gives me access to a powerful tech stack for solving machine learning problems. What's more interesting is that Prediction IO gives access to many templates, which are helpful for solving real problems.
The template gallery consists of many templates for recommendation, classification, regression, natural language processing, and more. It makes use of technologies like Apache Hadoop, Apache Spark, ElasticSearch, and Apache HBase to make the machine learning server scalable and efficient. I'm not going to talk much about Prediction IO itself, because you can read about it on your own here.
So back to the problem: I have a bunch of data from user purchase histories, consisting of user_id, product_id, and purchased_date. Using these, I need to make customized predictions/recommendations for each user. Considering this problem, we'll use the Recommendation template with the Prediction IO machine learning server. We'll make use of the Prediction IO event server as well as bulk data import.
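For concreteness, the raw purchase history looks roughly like the hypothetical rows below; your column names and formats will likely differ:

```
user_id,product_id,purchased_date
user1,product5,2017-01-03
user1,product30,2017-01-15
user2,product98,2017-02-01
```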
So let's get started. (Note: this guide assumes that you're using an Ubuntu system for the installation.)
Step 1: Download Apache Prediction IO
Go to the home directory of your current user and download the latest 0.10.0 Apache Prediction IO incubator release. I assume you're in the following dir (/home/you/):
git clone git@github.com:apache/incubator-predictionio.git

Now go to the directory `incubator-predictionio` where we have cloned the Prediction IO repo. If you have cloned it into a different directory, make sure your terminal is inside that dir.
Now let's check out the current stable version of Prediction IO, which is 0.10.0:
cd incubator-predictionio # or any dir where you have cloned pio
git checkout release/0.10.0

Step 2: Let's Make A Distribution Of Prediction IO
./make-distribution.sh

If everything went OK, you will get a message like this in your console:
However, if you encounter something like this:
then you will have to remove the .ivy2 dir in your home directory; by default this folder is hidden. You need to remove it completely and then run ./make-distribution.sh again for the build to successfully generate a distribution file.
Personally, I've faced this issue many times, and I'm not sure this is the proper way to get past it, but removing the .ivy2 folder and running the make-distribution command again works.
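In practice, the workaround is just the two commands below, assuming the cache lives in the default ~/.ivy2 location and you're inside the cloned repo:

```sh
# Remove the hidden Ivy cache, then rebuild the distribution.
rm -rf ~/.ivy2
./make-distribution.sh
```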
Step 3: Extract The Distribution File
After the successful build, we will have a file called PredictionIO-0.10.0-incubating.tar.gz inside the directory where we built our Prediction IO. Now let's extract it into a directory called pio:
mkdir ~/pio
tar zxvf PredictionIO-0.10.0-incubating.tar.gz -C ~/pio

Make sure the tar.gz filename matches the distribution file that you have inside the original incubator-predictionio directory. If you forgot to check out the 0.10.0 version of Prediction IO, you're sure to get a different file name, because by default the version would be the latest one.
Step 4: Prepare For Downloading Dependencies
cd ~/pio
# Let's make a vendors folder inside ~/pio/PredictionIO-0.10.0-incubating where we will save hadoop, elasticsearch and hbase.
mkdir ~/pio/PredictionIO-0.10.0-incubating/vendors

Step 5: Download and Setup Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz

If your current directory is ~/pio, the command will download Spark inside the pio dir. Now let's extract it. Depending upon where you downloaded it, you might want to change the command below.
tar zxvfC spark-1.5.1-bin-hadoop2.6.tgz PredictionIO-0.10.0-incubating/vendors
# This will extract the spark setup that we downloaded and put it inside the vendors folder of our fresh pio installation.

Make sure you ran mkdir PredictionIO-0.10.0-incubating/vendors earlier.
Step 6: Download & Setup ElasticSearch
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
# Let's extract elastic search inside the vendors folder.
tar zxvfC elasticsearch-1.4.4.tar.gz PredictionIO-0.10.0-incubating/vendors

Step 7: Download and Setup Hbase
wget http://archive.apache.org/dist/hbase/hbase-1.0.0/hbase-1.0.0-bin.tar.gz
# Let's extract it.
tar zxvfC hbase-1.0.0-bin.tar.gz PredictionIO-0.10.0-incubating/vendors

Now let's edit hbase-site.xml to point the HBase configuration to the right dir. Assuming you're inside the ~/pio dir, you can run this command and edit the hbase conf:
nano PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-site.xml

Replace the configuration block with the following configuration:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/you/pio/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/you/pio/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/zookeeper</value>
  </property>
</configuration>

Here "you" signifies your user dir; for example, if you're doing all this as the user "tom", then the value would be something like file:///home/tom/…
Make sure the right files are there.
Now let's set up JAVA_HOME in hbase-env.sh:
nano PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-env.sh

If you're unsure about which version of the JDK you're currently using, follow these steps and make the necessary changes if required.
We need Java SE Development Kit 7 or greater for Prediction IO to work. Now let’s make sure we’re using the right version by running:
sudo update-alternatives --config java

By default I'm using:
java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

If you're using a version below 1.7, then you should change the java config to use a version of Java that is 1.7 or greater. You can change that with the update-alternatives command given above. In my case the command sudo update-alternatives --config java outputs something like this:
If you have any trouble setting this up, you can follow this link.
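If no suitable JDK is installed at all, something like the following should work on Ubuntu 16.04; the package name is an assumption for that release, so use the equivalent OpenJDK package for yours:

```sh
sudo apt-get update
sudo apt-get install openjdk-8-jdk
```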
Now let’s export the JAVA_HOME path in the .bashrc file inside /home/you/pio.
Considering you're in the ~/pio dir, you could do this: nano .bashrc
Don't forget to run source .bashrc after you set up the Java home in the .bashrc.
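The line you add to .bashrc would look roughly like this; the path shown is the usual OpenJDK 8 location on Ubuntu and is only an assumption, so point it at whatever your update-alternatives output reported:

```sh
# Assumed OpenJDK 8 location on Ubuntu; adjust to your actual JDK path.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```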
Step 8: Configure the Prediction IO Environment
Now let's configure pio-env.sh to give the final touch to our Prediction IO machine learning server installation.
nano PredictionIO-0.10.0-incubating/conf/pio-env.sh

We're not using PostgreSQL or MySQL for our event server, so let's comment out that section and have a pio-env.sh something like this:
#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-9.4-1204.jdbc41.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.37.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
# your Elasticsearch setup.
ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.4.4/conf

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# with Hadoop 2.
HADOOP_CONF_DIR=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6/conf

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# with HBase on a remote cluster.
HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
# PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
# PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD=root

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=root
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=root

# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=firstcluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.4.4

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.0.0

Step 9: Configure cluster name in ElasticSearch config
Since the line PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=firstcluster points to our cluster name in the ElasticSearch configuration, let's replace the default cluster name in the ElasticSearch configuration:
nano PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4/config/elasticsearch.yml
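Inside elasticsearch.yml, the setting to change is cluster.name. A minimal sketch, assuming you keep the firstcluster name used in pio-env.sh:

```yaml
# Must match PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME in pio-env.sh
cluster.name: firstcluster
```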
Step 10: Export The Prediction IO Path

Let's now export the Prediction IO path so we can freely use the pio command without pointing to its bin directory every time. Run the following command in your terminal:
PATH=$PATH:/home/you/pio/PredictionIO-0.10.0-incubating/bin; export PATH
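To make this persist across new shells, you could append the same export to your .bashrc; this is just a sketch, so adjust the path to wherever you extracted Prediction IO:

```sh
echo 'export PATH=$PATH:/home/you/pio/PredictionIO-0.10.0-incubating/bin' >> ~/.bashrc
source ~/.bashrc
```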
Step #11: Give Permission To Prediction IO Installation
sudo chmod -R 775 ~/pio

This is vital, because if we don't give permission to the pio folder, the Prediction IO process won't be able to write log files.
Step #12: Start Prediction IO Server
Now we're ready to go: let's start our Prediction IO server. Before running this command, make sure you exported the pio path described above.
pio-start-all
# if you forgot to export the pio path, it won't work and you will manually have to point to the pio bin path.

If everything is OK up to this point, you will see output something like this:
Note: If you forget to give permission, there will be issues writing logs, and if your JAVA_HOME path is incorrect, HBase won't start properly and will give you an error.

Step #13: Verify The Process
Now let's verify our installation with pio status. If everything is OK, you will get an output like this:
If you encounter an error in HBase or any other backend storage, make sure everything was started properly.
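A quick sanity check is to list the running JVM processes. The sketch below assumes the JDK's jps tool is on your PATH; the process names in the comment are what you would typically expect from this stack:

```sh
jps -l
# Typically you should see HBase's HMaster, Elasticsearch, and the
# Prediction IO event server among the running JVMs.
```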
Our Prediction IO server is now ready for us to implement the template.
Implementing the Recommendation Engine
A recommendation engine template is a Prediction IO engine template that uses collaborative filtering to make personalized recommendations to the user. It can be used in an e-commerce site, a news site, or any application that collects user event histories to give a personalized experience to the user.
We'll implement this template in Prediction IO with a little e-commerce user data, just to run a sample experiment with the Prediction IO machine learning server.
Now let's go back to our home dir: cd ~
Step 14: Download the Recommendation Template
pio template get apache/incubator-predictionio-template-recommender MyRecommendation

It will ask for a company name and author name; input them when prompted. Now we have a MyRecommendation template inside our home dir. Just a reminder: you can put the template anywhere you want.
Step 15: Create Our First Prediction IO App
Now let's go inside the MyRecommendation dir: cd MyRecommendation
After you’re inside the template dir, let’s create our first Prediction IO app called ourrecommendation.
You will get output like this. Please remember that you can give any name to your app, but for this example I’ll be using the app name ourrecommendation.
pio app new ourrecommendation

This command will output something like this:
Let’s verify that our new app is there with this command:
pio app list

Now our app should appear in the list.
Step 16: Import Some Sample Data
Let's download the sample data from the gist, and put it inside an importdata folder inside the MyRecommendation folder.
mkdir importdata

Copy the sample-data.json file that you just downloaded into the importdata folder.
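For reference, each line of that file is one Prediction IO event in JSON form. A minimal sketch of what such events look like for this template is below; the field values are hypothetical and the actual contents of the gist may differ:

```json
{"event":"rate","entityType":"user","entityId":"user1","targetEntityType":"item","targetEntityId":"product5","properties":{"rating":4},"eventTime":"2017-01-03T09:39:45.618-08:00"}
{"event":"buy","entityType":"user","entityId":"user1","targetEntityType":"item","targetEntityId":"product30","eventTime":"2017-01-15T09:39:45.618-08:00"}
```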
Finally, let's import the data into our ourrecommendation app. Considering you're inside the MyRecommendation dir, you can do this to batch import the events:
pio import --appid 1 --input importdata/sample-data.json

(Note: make sure the appid you pass here is the same as the appid that was assigned to ourrecommendation when you created it.)
Step 17: Build The App
Before building the app, let's edit the engine.json file inside the MyRecommendation directory to reflect our app name in it. It should look something like this:
Note: Don't copy this verbatim; just change the "appName" in your engine.json.
{ "id": "default", "description": "Default settings", "engineFactory": "orgname.RecommendationEngine", "datasource": { "params" : { "appName": "ourrecommendation" } }, "algorithms": [ { "name": "als", "params": { "rank": 10, "numIterations": 5, "lambda": 0.01, "seed": 3 } } ]}Note: the “engineFactory” will be automatically generated when you pull the template in our step 14, so you don’t have to change that. In my case, it’s my orgname, which I put in the terminal prompt during installation of the template. In you engine.json you just need to modify the appName, please don’t change anything else in there.
注意:在我們的第14步中提取模板時,“ engineFactory”將自動??生成,因此您無需更改它。 就我而言,這是我的組織名稱,在模板安裝過程中將其放在終端提示中。 在engine.json中,您只需要修改appName,請不要在其中進行任何更改。
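As a rough guide to the algorithm parameters above (this is the standard explicit-feedback ALS formulation, not something specific to this template): rank is the number of latent factors per user and item, numIterations is the number of alternating optimization passes, and lambda weights the regularization term in the objective that ALS minimizes, roughly:

```latex
\min_{U,V} \sum_{(u,i)\ \mathrm{observed}} \left( r_{ui} - \mathbf{u}_u^{\top} \mathbf{v}_i \right)^2
  + \lambda \left( \lVert \mathbf{u}_u \rVert^2 + \lVert \mathbf{v}_i \rVert^2 \right)
```

Here r_ui is user u's observed rating for item i, and u_u and v_i are the rank-dimensional user and item factor vectors; Spark's implementation differs in details such as how the regularization term is weighted.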
In the same dir where our MyRecommendation engine template lies, let’s run this pio command to build our app.
pio build

(Note: if you want to see all the messages during the building process, you can run pio build --verbose)
It can take some time to build our app, since this is the first time; from the next time on it takes less time. You should get an output like this:
Our engine is now ready to train our data.
Step 18: Train The Dataset
pio train

If you get an error like the one below in the middle of the training, then you may have to change the number of iterations inside your engine.json and rebuild the app.
Let's change numIterations in engine.json, which is 20 by default, to 5:
"numIterations": 5,

Now let's build the app with pio build, and again do pio train. The training should complete successfully. After finishing the training you will get a message like this:
Please note that this training works just for small data. If you want to try it with a large data set, however, then we would have to set up a standalone Spark worker to accomplish the training. (I will write about this in a future post.)
Step 19: Deploy and Serve the Prediction
pio deploy
# by default it will take port 8000.

We will now have our Prediction IO server running.
Note: to keep it simple, I'm not discussing the event server in this post, since the post might get even longer; we're focusing on a simple use case of Prediction IO.
Now let’s get the prediction using curl.
Open up a new terminal and hit:
curl -H "Content-Type: application/json" \
-d '{ "user": "user1", "num": 4 }' http://localhost:8000/queries.json

In the above query, user refers to the user_id in our event data, and num means how many recommendations we want to get.
Now you will get a result like this:
{"itemScores":[{"item":"product5","score":3.9993937903501093},{"item":"product101","score":3.9989989282500904},{"item":"product30","score":3.994934059438341},{"item":"product98","score":3.1035806376677866}]}That’s it! Great Job. We’re done. But wait, what’s next?
而已! 很好。 大功告成 但是,等等, 下一步是什么?
Next, we will use a standalone Spark cluster to train a large dataset (believe me, it's easy; if you want to do it right now, you can follow the documentation in Prediction IO).
We will use Universal Recommender from Action ML to build a recommendation engine.
Important Notes:
The template we used uses the ALS algorithm with explicit feedback. However, you can easily switch to implicit feedback depending upon your need.
If you're curious about Prediction IO and want to learn more, you can do that on the Prediction IO official site.
- If your Java version is not suitable for the Prediction IO specification, then you are sure to run into problems. So make sure you configure this first.
- Don't run any of the commands described above with sudo, except when giving permission. Otherwise you will run into problems.
- Make sure your Java path is correct, and make sure to export the Prediction IO path. You might want to add the Prediction IO path to your .bashrc or profile as well, depending upon your need.
Update 2017/07/14: Using Spark To Train Real Data Sets
We have Spark installed inside our vendors folder; with our current installation, our Spark bin is in the following dir:
~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin

From there we have to set up a Spark primary and a replica to execute our model training and accomplish it faster. If your training seems to get stuck, we can use the Spark options to accomplish the training tasks.
Start the Spark primary
~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin/start-master.sh

This will start the Spark primary. Now let's browse the Spark primary's web UI by going to http://localhost:8080/ in the browser.
Now let's copy the primary URL to start the replica worker. In our case the primary Spark URL is something like this:
spark://your-machine:7077 (your-machine signifies your machine name)
~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin/start-slave.sh spark://your-machine:7077

The worker will start. Refresh the web UI and you will see the registered worker this time. Now let's run the training again:
pio train -- --master spark://localhost:7077 --driver-memory 4G --executor-memory 6G

Great!
Special Thanks: Pat Ferrel From Action ML & Marius Rabenarivo
翻譯自: https://www.freecodecamp.org/news/building-an-recommendation-engine-with-apache-prediction-io-ml-server-aed0319e0d8/