scipy csr_matrix csc_matrix
202113224
data_cat = sparse.hstack((data_cat, data_))  # horizontal concatenation (append columns side by side)
df_tfidf = pd.DataFrame(data_cat.toarray())  # convert the sparse matrix to a DataFrame via a dense array
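A minimal, self-contained sketch of the two lines above. The original data_cat and data_ come from elsewhere in the author's code, so two small hypothetical matrices stand in for them here. Passing format="csr" to sparse.hstack is an addition not in the original snippet. It requests a CSR result explicitly instead of letting SciPy pick the output format.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-ins for the two feature blocks; both must have the same number of rows.
data_cat = sparse.csr_matrix(np.array([[1, 0], [0, 2], [3, 0]]))
data_ = sparse.csr_matrix(np.array([[0, 4], [5, 0], [0, 6]]))

# Horizontal concatenation: the columns of data_ are appended after those of data_cat.
# format="csr" asks for a CSR result explicitly.
data_cat = sparse.hstack((data_cat, data_), format="csr")

# Densify and wrap in a DataFrame (only reasonable when the dense matrix fits in memory).
df_tfidf = pd.DataFrame(data_cat.toarray())
print(df_tfidf)
```

Because toarray() materializes every zero, this conversion is best kept for small matrices or for inspection.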
20210222
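Data format of csr_matrix
Convert the data format shown above into the form shown below.
In the printed output, the leading 0 is the row number (which row the entry belongs to);
0, 1, 2, 3 are the (column) indices;
the third column holds the actual values.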
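The conversion described above, from the printed "(row, col) value" listing to a three-column table, can be sketched with tocoo(), which exposes the row, col and data arrays of the stored entries. The matrix m and the column names "row", "col", "value" are illustrative choices, not taken from the original. The header that print() shows above the entries differs between SciPy versions.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

m = csr_matrix(np.array([[0.5, 0.0, 0.2],
                         [0.0, 0.0, 0.3]]))

# print() lists each stored entry as "(row, col)  value".
print(m)

# Long form: one DataFrame row per stored (non-zero) entry.
coo = m.tocoo()
df_long = pd.DataFrame({"row": coo.row, "col": coo.col, "value": coo.data})
print(df_long)
```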
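Overview
When doing scientific computing in Python, you often need to compress a sparse np.array (one whose entries are mostly zero). This is what scipy's sparse.csr_matrix (CSR: Compressed Sparse Row matrix) and sparse.csc_matrix (CSC: Compressed Sparse Column matrix) are for.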
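To see what "compressing" buys, here is a small sketch that stores a mostly-zero 1000 x 1000 array in both formats and compares byte counts. The array size and the three non-zero positions are arbitrary illustration values. The helper stored_bytes is hypothetical. It simply adds up the three arrays (data, indices, indptr) that CSR and CSC keep.

```python
import numpy as np
from scipy import sparse

# A mostly-zero dense array (illustrative size and values).
dense = np.zeros((1000, 1000))
dense[0, 1] = 1.0
dense[10, 20] = 2.0
dense[999, 999] = 3.0

csr = sparse.csr_matrix(dense)  # compressed by row
csc = sparse.csc_matrix(dense)  # compressed by column

def stored_bytes(m):
    # Hypothetical helper: CSR/CSC hold three arrays, data, indices and indptr.
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

print(dense.nbytes, stored_bytes(csr), stored_bytes(csc))
print(csr.nnz, csc.nnz)  # number of stored non-zeros in each format
```

The dense array always costs rows x cols x 8 bytes. The compressed forms grow with the number of non-zeros plus one indptr entry per row (CSR) or per column (CSC).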
scipy.sparse.csr_matrix
Official API description (the first few, easier-to-understand constructor forms are omitted)
csr_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.
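The last sentence of the API description, about inferring the shape, can be checked directly. In this sketch (the arrays are made-up values) no shape is passed. The number of rows comes from len(indptr) - 1 and the number of columns from the largest column index plus one.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 1, 2])  # 2 rows
indices = np.array([0, 3])    # largest column index is 3
data = np.array([7, 8])

m = csr_matrix((data, indices, indptr))  # shape omitted
print(m.shape)      # (2, 4): rows = len(indptr) - 1, cols = max(indices) + 1
print(m.toarray())
```

Since trailing all-zero columns leave no trace in indices, pass shape explicitly whenever the matrix may end with empty columns.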
# Example walkthrough
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
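# Compressed by row
# For row i, the columns of the non-zero entries are indices[indptr[i]:indptr[i+1]] and their values are data[indptr[i]:indptr[i+1]]
# In this example:
# Row 0: the non-zero columns are indices[indptr[0]:indptr[1]] = indices[0:2] = [0, 2]
# and the values are data[indptr[0]:indptr[1]] = data[0:2] = [1, 2], so row 0 has 1 in column 0 and 2 in column 2
# Row 1: the non-zero columns are indices[indptr[1]:indptr[2]] = indices[2:3] = [2]
# and the values are data[indptr[1]:indptr[2]] = data[2:3] = [3], so row 1 has 3 in column 2
# Row 2: the non-zero columns are indices[indptr[2]:indptr[3]] = indices[3:6] = [0, 1, 2]
# and the values are data[indptr[2]:indptr[3]] = data[3:6] = [4, 5, 6], so row 2 has 4 in column 0, 5 in column 1 and 6 in column 2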
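The row-by-row reading above translates into a short loop. csr_to_dense is a hypothetical helper written for illustration, not part of SciPy. It assumes there are no duplicate (row, column) pairs, whereas SciPy would sum duplicates. Its output is compared against csr_matrix(...).toarray() using the same arrays as the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: rebuild a dense array from raw CSR arrays (no duplicates assumed)."""
    out = np.zeros(shape, dtype=data.dtype)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # Row i: columns indices[start:end] hold values data[start:end].
        out[i, indices[start:end]] = data[start:end]
    return out

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

manual = csr_to_dense(data, indices, indptr, (3, 3))
reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(manual)
print(np.array_equal(manual, reference))  # True
```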
scipy.sparse.csc_matrix
Official API description (the first few, easier-to-understand constructor forms are omitted)
csc_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSC representation where the row indices for column i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.
# Example walkthrough
>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])
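# Compressed by column
# For column i, the rows of the non-zero entries are indices[indptr[i]:indptr[i+1]] and their values are data[indptr[i]:indptr[i+1]]
# In this example:
# Column 0: the non-zero rows are indices[indptr[0]:indptr[1]] = indices[0:2] = [0, 2]
# and the values are data[indptr[0]:indptr[1]] = data[0:2] = [1, 2], so column 0 has 1 in row 0 and 2 in row 2
# Column 1: the non-zero rows are indices[indptr[1]:indptr[2]] = indices[2:3] = [2]
# and the values are data[indptr[1]:indptr[2]] = data[2:3] = [3], so column 1 has 3 in row 2
# Column 2: the non-zero rows are indices[indptr[2]:indptr[3]] = indices[3:6] = [0, 1, 2]
# and the values are data[indptr[2]:indptr[3]] = data[3:6] = [4, 5, 6], so column 2 has 4 in row 0, 5 in row 1 and 6 in row 2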
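The column-by-column reading can be sketched the same way. csc_to_dense is a hypothetical helper that assumes no duplicate entries. The sketch also confirms what the two examples suggest: the same (data, indices, indptr) triple read as CSC gives the transpose of the matrix it encodes when read as CSR.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

def csc_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: rebuild a dense array from raw CSC arrays (no duplicates assumed)."""
    out = np.zeros(shape, dtype=data.dtype)
    for j in range(shape[1]):
        start, end = indptr[j], indptr[j + 1]
        # Column j: rows indices[start:end] hold values data[start:end].
        out[indices[start:end], j] = data[start:end]
    return out

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

manual = csc_to_dense(data, indices, indptr, (3, 3))
as_csc = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
as_csr = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(manual, as_csc))    # True
# Reading the same arrays column-wise gives the transpose of the row-wise reading.
print(np.array_equal(as_csc, as_csr.T))  # True
```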