

Machine Learning On Spark: Basic Data Structures (Part 2)

Published: 2024/1/23


Contents of this section

IndexedRowMatrix

BlockMatrix

1. Using IndexedRowMatrix

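IndexedRowMatrix is, as the name suggests, a RowMatrix whose rows carry an index. A plain RowMatrix has no meaningful row indices; an IndexedRowMatrix represents each row with the case class IndexedRow(index: Long, vector: Vector), where index is the row's index and vector holds the row's contents. It is used as follows: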

```scala
package cn.ml.datastruct

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.IndexedRow
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object IndexRowMatrixDemo {
  // Define main() instead of extending scala.App: Spark applications that extend App may not work correctly
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("IndexRowMatrixDemo").setMaster("spark://sparkmaster:7077")
    val sc = new SparkContext(sparkConf)

    // The first element of each array becomes the IndexedRow index, the rest becomes the vector.
    // f(0).toLong converts the Double index to the Long that IndexedRow expects.
    val rdd1 = sc.parallelize(Array(
      Array(1.0, 2.0, 3.0, 4.0),
      Array(2.0, 3.0, 4.0, 5.0),
      Array(3.0, 4.0, 5.0, 6.0)))
      .map(f => IndexedRow(f(0).toLong, Vectors.dense(f.drop(1))))
    val indexRowMatrix = new IndexedRowMatrix(rdd1)

    // Compute the Gramian matrix A^T A
    val gramianMatrix: Matrix = indexRowMatrix.computeGramianMatrix()

    // Convert to a RowMatrix (the row indices are dropped)
    val rowMatrix: RowMatrix = indexRowMatrix.toRowMatrix()

    // Other methods, such as computeSVD (singular value decomposition) and multiply
    // (matrix product), are used the same way as on RowMatrix.

    sc.stop()
  }
}
```
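To see what the demo computes without a cluster, here is a minimal plain-Scala sketch (no Spark dependency; IndexedRowSketch, inferredNumRows and gramian are hypothetical names, not Spark APIs) on the same data. It assumes, as IndexedRowMatrix does when no size is passed, that the row count is the largest index plus one, and it computes the Gramian A^T A as the sum of the outer products of the rows. Because computeGramianMatrix() on an IndexedRowMatrix goes through toRowMatrix(), the indices play no part in the Gramian.

```scala
// Plain-Scala sketch of two IndexedRowMatrix behaviours; names are illustrative only.
object IndexedRowSketch {
  // Same data as the demo: element 0 is the row index, the rest is the row.
  val raw: Array[Array[Double]] = Array(
    Array(1.0, 2.0, 3.0, 4.0),
    Array(2.0, 3.0, 4.0, 5.0),
    Array(3.0, 4.0, 5.0, 6.0))

  // (index, values) pairs, mirroring IndexedRow(index, vector)
  val rows: Array[(Long, Array[Double])] = raw.map(r => (r(0).toLong, r.drop(1)))

  // Row count inferred as max index + 1, so a missing index (here 0) counts as a zero row.
  def inferredNumRows(rs: Array[(Long, Array[Double])]): Long = rs.map(_._1).max + 1

  // Gramian G = A^T A, accumulated as the sum of outer products r * r^T over all rows.
  def gramian(rs: Array[Array[Double]]): Array[Array[Double]] = {
    val n = rs.head.length
    val g = Array.ofDim[Double](n, n)
    for (r <- rs; i <- 0 until n; j <- 0 until n) g(i)(j) += r(i) * r(j)
    g
  }

  def main(args: Array[String]): Unit = {
    println(s"numRows = ${inferredNumRows(rows)}")
    gramian(rows.map(_._2)).foreach(row => println(row.mkString(" ")))
  }
}
```

Running main prints numRows = 4 followed by the symmetric 3 × 3 Gramian with rows 29.0 38.0 47.0, 38.0 50.0 62.0 and 47.0 62.0 77.0.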

2. Using BlockMatrix

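A block (partitioned) matrix divides a matrix into several submatrices, called blocks. For example, a matrix P can be split into four blocks by cutting its rows into two groups and its columns into two groups, so that P is written as a 2 × 2 array of blocks P11, P12, P21 and P22.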
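Written out generically, a partition into four blocks has the form below. This is a generic sketch: the block sizes are arbitrary, as long as blocks in the same block row have the same number of rows and blocks in the same block column have the same number of columns.

```latex
P =
\begin{bmatrix}
P_{11} & P_{12} \\
P_{21} & P_{22}
\end{bmatrix}
```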

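For more on block matrices, including the transpose and the product of block matrices, see https://en.wikipedia.org/wiki/Block_matrix. In MLlib, a BlockMatrix is a distributed matrix backed by an RDD of ((blockRowIndex, blockColIndex), Matrix) pairs, where each block has at most rowsPerBlock rows and colsPerBlock columns. The following example builds one from an IndexedRowMatrix: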
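The block form lets transposition and multiplication operate on whole blocks. Assuming P is split into blocks P_{ij} and a second matrix Q (a symbol introduced here only for illustration) is split into blocks Q_{ij} so that the column split of P matches the row split of Q, the standard identities are as follows. Each block is transposed as well as moved. The matching-split condition is why BlockMatrix.multiply requires the left matrix's colsPerBlock to equal the right matrix's rowsPerBlock.

```latex
P^{T} =
\begin{bmatrix}
P_{11}^{T} & P_{21}^{T} \\
P_{12}^{T} & P_{22}^{T}
\end{bmatrix},
\qquad
PQ =
\begin{bmatrix}
P_{11}Q_{11} + P_{12}Q_{21} & P_{11}Q_{12} + P_{12}Q_{22} \\
P_{21}Q_{11} + P_{22}Q_{21} & P_{21}Q_{12} + P_{22}Q_{22}
\end{bmatrix}
```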

```scala
package cn.ml.datastruct

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.BlockMatrix
import org.apache.spark.mllib.linalg.distributed.IndexedRow
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

object BlockMatrixDemo {
  def main(args: Array[String]): Unit = {
    // Standalone cluster master; use setMaster("local[2]") instead to run locally with 2 threads
    val sparkConf = new SparkConf().setAppName("BlockMatrixDemo").setMaster("spark://sparkmaster:7077")
    val sc = new SparkContext(sparkConf)

    val rdd1 = sc.parallelize(Array(
      Array(1.0, 20.0, 30.0, 40.0),
      Array(2.0, 50.0, 60.0, 70.0),
      Array(3.0, 80.0, 90.0, 100.0)))
      .map(f => IndexedRow(f(0).toLong, Vectors.dense(f.drop(1))))
    val indexRowMatrix = new IndexedRowMatrix(rdd1)

    // Convert the IndexedRowMatrix to a BlockMatrix with 2 rows and 2 columns per block
    val blockMatrix: BlockMatrix = indexRowMatrix.toBlockMatrix(2, 2)

    // Print every block; collect() first so the output appears on the driver, not in executor logs.
    // Printed output:
    // Index:(0,0)MatrixContent:2 x 2 CSCMatrix
    // (1,0) 20.0
    // (1,1) 30.0
    // Index:(1,1)MatrixContent:2 x 1 CSCMatrix
    // (0,0) 70.0
    // (1,0) 100.0
    // Index:(1,0)MatrixContent:2 x 2 CSCMatrix
    // (0,0) 50.0
    // (1,0) 80.0
    // (0,1) 60.0
    // (1,1) 90.0
    // Index:(0,1)MatrixContent:2 x 1 CSCMatrix
    // (1,0) 40.0
    // Each block is stored as a sparse matrix in CSC format, and the edge blocks
    // (0,1) and (1,1) are 2 x 1: blocks at the border are smaller rather than padded.
    blockMatrix.blocks.collect().foreach(f => println("Index:" + f._1 + "MatrixContent:" + f._2))

    // Convert to a local matrix:
    // 0.0 0.0 0.0
    // 20.0 30.0 40.0
    // 50.0 60.0 70.0
    // 80.0 90.0 100.0
    // Row 0 is all zeros because no IndexedRow has index 0: the row count is inferred
    // as max index + 1 = 4, and missing rows are treated as zeros.
    println(blockMatrix.toLocalMatrix())

    // Block matrix addition
    val sum: BlockMatrix = blockMatrix.add(blockMatrix)
    // Block matrix product blockMatrix * blockMatrix^T (T denotes transpose)
    val product: BlockMatrix = blockMatrix.multiply(blockMatrix.transpose)
    // Convert to a CoordinateMatrix
    val coordMatrix = blockMatrix.toCoordinateMatrix()
    // Convert to an IndexedRowMatrix
    val indexedRows = blockMatrix.toIndexedRowMatrix()
    // Check that the block matrix is valid
    blockMatrix.validate()

    sc.stop()
  }
}
```
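The block layout and the CSC storage seen in the printed output can be reproduced in plain Scala. The sketch below has no Spark dependency; BlockSketch, block, toCsc and Csc are hypothetical names, and it assumes a dense local copy of the 4 × 3 matrix. It cuts the matrix into 2 × 2 blocks the way toBlockMatrix(2, 2) lays them out, shrinking edge blocks instead of padding them, and encodes each block as the three arrays behind a CSC sparse matrix: column pointers, row indices and non-zero values.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain-Scala sketch of block partitioning plus CSC encoding; names are illustrative only.
object BlockSketch {
  // Local copy of the demo matrix: row 0 is zero because no IndexedRow has index 0.
  val m: Array[Array[Double]] = Array(
    Array(0.0, 0.0, 0.0),
    Array(20.0, 30.0, 40.0),
    Array(50.0, 60.0, 70.0),
    Array(80.0, 90.0, 100.0))

  final case class Csc(numRows: Int, numCols: Int,
                       colPtrs: Seq[Int], rowIndices: Seq[Int], values: Seq[Double])

  // Extract block (bi, bj); blocks on the border are smaller rather than zero-padded.
  def block(a: Array[Array[Double]], rpb: Int, cpb: Int, bi: Int, bj: Int): Array[Array[Double]] = {
    val r0 = bi * rpb
    val c0 = bj * cpb
    val nr = math.min(rpb, a.length - r0)
    val nc = math.min(cpb, a.head.length - c0)
    Array.tabulate(nr, nc)((i, j) => a(r0 + i)(c0 + j))
  }

  // Encode a dense block in CSC: walk column by column and keep only non-zero entries.
  // colPtrs(j) is where column j starts in rowIndices/values.
  def toCsc(b: Array[Array[Double]]): Csc = {
    val nr = b.length
    val nc = b.head.length
    val ptrs = ArrayBuffer(0)
    val rowIdx = ArrayBuffer.empty[Int]
    val vals = ArrayBuffer.empty[Double]
    for (j <- 0 until nc) {
      for (i <- 0 until nr if b(i)(j) != 0.0) { rowIdx += i; vals += b(i)(j) }
      ptrs += vals.length
    }
    Csc(nr, nc, ptrs.toList, rowIdx.toList, vals.toList)
  }

  def main(args: Array[String]): Unit = {
    val rpb = 2
    val cpb = 2
    val nRowBlocks = (m.length + rpb - 1) / rpb      // ceil(4 / 2) = 2
    val nColBlocks = (m.head.length + cpb - 1) / cpb // ceil(3 / 2) = 2
    for (bi <- 0 until nRowBlocks; bj <- 0 until nColBlocks)
      println(s"Index:($bi,$bj) " + toCsc(block(m, rpb, cpb, bi, bj)))
  }
}
```

For block (0,0), the sketch yields column pointers 0, 1, 2, row indices 1, 1 and values 20.0, 30.0, which is the same content Spark printed as (1,0) 20.0 and (1,1) 30.0. Whether Spark itself uses sparse or dense blocks after toBlockMatrix can depend on the Spark version, so treat the CSC part as matching the run shown above rather than as guaranteed behaviour.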
