Sorting, Transforming, and Summarizing Data in R

Published: 2024/3/13

R Study Notes 4: Beginner Level

  • Sorting data
    • The sort function
    • The rank function
    • The order function
  • Transforming data
    • Converting between long and wide formats
      • The stack function
      • The tapply function
      • The reshape function
      • The reshape2 package
    • Converting variables to factors (discretizing continuous variables)
  • Summarizing data
    • The apply family
      • The apply function
      • The lapply function
      • The sapply function
      • The tapply function
      • The mapply function
    • The ave function
    • The by function
    • The aggregate function
    • The sweep function

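This series is a set of notes from studying R systematically, collected in the "R Language Notes" column and updated as time allows. Earlier posts:
0. Downloading and installing R vs. RStudio errors
1. Vectors, matrices, arrays, and data frames in R
2. Conditionals, loops, and functions in R
3. Reading and exporting data in R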

Sorting data

The sort function

x <- sample(1:100, 10)
sort(x, decreasing = TRUE)  # sort() is ascending by default
y <- c('python', 'ruby', 'java', 'r')
sort(y, decreasing = TRUE)
# [1] "ruby"   "r"      "python" "java"

The rank function

rank(x)  # returns the rank of each element
z <- c(1, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8)
rank(z)  # fractional ranks appear because tied elements each receive the mean of the ranks they span
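The averaging of tied ranks described above is only the default behaviour. As a minimal sketch, the `ties.method` argument of `rank()` selects other conventions; the variable names below are illustrative and not from the original notes.

```r
z <- c(1, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8)

# Default: tied values share the mean of the ranks they occupy
avg_ranks <- rank(z)

# "min": tied values all receive the smallest rank in their group
min_ranks <- rank(z, ties.method = "min")

# "first": ties are broken by position, giving a permutation of 1..n
first_ranks <- rank(z, ties.method = "first")

print(rbind(avg_ranks, min_ranks, first_ranks))
```

The "min" convention matches the way competition rankings are usually reported, while "first" is convenient when unique ranks are needed.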

The order function

x
# [1] 39 97 31 83 56  1 19 60 50 41
order(x)  # returns the indices of the elements, not their values
# [1]  6  7  3  1 10  9  5  8  4  2
x[order(x)]  # indexing x with those positions yields the sorted vector
# [1]  1 19 31 39 41 50 56 60 83 97
head(iris)
head(iris[order(iris$Sepal.Length, decreasing = TRUE), ])  # sort rows by Sepal.Length, largest first
head(iris[order(-iris$Sepal.Length), ])  # negating the column also sorts largest first
# Sorting by several variables
head(iris[order(iris$Sepal.Length, iris$Sepal.Width), ])  # sort by Sepal.Length (ascending), then break ties by Sepal.Width
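The negation trick only works for numeric columns. As a sketch of one alternative, `order()` accepts a vector for `decreasing` when `method = "radix"` is used, so each key can have its own direction; the choice of `Species` and `Sepal.Length` as keys is just for illustration.

```r
# Sort by Species in descending order, then by Sepal.Length ascending.
# A per-key `decreasing` vector requires method = "radix".
idx <- order(iris$Species, iris$Sepal.Length,
             decreasing = c(TRUE, FALSE), method = "radix")
sorted <- iris[idx, ]
head(sorted)
```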

Transforming data

Converting between long and wide formats

The stack function

freshmen <- c(178, 180, 182, 180)
sophomores <- c(188, 172, 175, 172)
juniors <- c(167, 172, 177, 174)
data.frame(fr = freshmen, so = sophomores, ju = juniors)
# Result: at this point the data are in wide format
#    fr  so  ju
# 1 178 188 167
# 2 180 172 172
# 3 182 175 177
# 4 180 172 174
height <- stack(list(fresh = freshmen, sopho = sophomores, jun = juniors))
height  # stack() stacks the original vectors into long format
# Result:
#    values   ind
# 1     178 fresh
# 2     180 fresh
# 3     182 fresh
# 4     180 fresh
# 5     188 sopho
# 6     172 sopho
# 7     175 sopho
# 8     172 sopho
# 9     167   jun
# 10    172   jun
# 11    177   jun
# 12    174   jun

The tapply function

tapply(height$values, height$ind, mean)  # once the data are in long format, tapply computes per-group statistics directly
#  fresh  sopho    jun
# 180.00 176.75 172.50
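The long table produced by `stack()` can be taken back to wide format with `unstack()`. The sketch below repeats the height vectors from the notes so that it runs on its own; `wide_again` is an illustrative name.

```r
freshmen   <- c(178, 180, 182, 180)
sophomores <- c(188, 172, 175, 172)
juniors    <- c(167, 172, 177, 174)

# Long format: one row per measurement, with a group indicator column
height <- stack(list(fresh = freshmen, sopho = sophomores, jun = juniors))

# Back to wide format: one column per group (all groups have equal length here)
wide_again <- unstack(height)
print(wide_again)
```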

The reshape function

View(Indometh)     # Indometh is a long-format dataset
summary(Indometh)  # descriptive statistics for Indometh
wide <- reshape(Indometh, v.names = 'conc', idvar = 'Subject', timevar = 'time', direction = 'wide')
# v.names: the variable that holds the values; idvar: the identifier variable; timevar: the variable whose values become columns
View(wide)
long <- reshape(wide, v.names = 'conc', idvar = 'Subject', varying = list(2:12), direction = 'long')
# varying = list(2:12) stacks columns 2 through 12 back into a single column
View(long)
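Since `View()` needs an interactive session, a quick non-interactive check of the reshaping is to compare dimensions. This sketch assumes the built-in `Indometh` data and first converts it to a plain data frame; `back` is an illustrative name, and the round trip relies on the reshaping attributes that `reshape()` stores on its wide result.

```r
indo <- as.data.frame(Indometh)

# Long -> wide: one row per Subject, one conc column per time point
wide <- reshape(indo, v.names = "conc", idvar = "Subject",
                timevar = "time", direction = "wide")
dim(wide)

# Wide -> long again, reusing the attributes stored by the first call
back <- reshape(wide, direction = "long")
dim(back)
```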

The reshape2 package

  • Load the reshape2 package
install.packages("reshape2")
library(reshape2)  # the most commonly used functions are melt() and dcast()
  • Use the melt function
new_iris <- melt(data = iris, id.vars = 'Species')  # keep Species as the identifier column and stack the others (long format)
View(new_iris)
levels(new_iris$variable)
  • Compute the mean of each measurement
dcast(new_iris, formula = Species ~ variable, fun.aggregate = mean, value.var = 'value')  # mean of each measurement per species
# Result:
#      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1     setosa        5.006       3.428        1.462       0.246
# 2 versicolor        5.936       2.770        4.260       1.326
# 3  virginica        6.588       2.974        5.552       2.026
  • Example: the tips dataset
dcast(data = tips, formula = sex ~ ., fun.aggregate = mean, value.var = 'tip')
# Result:
#      sex        .
# 1 Female 2.833448
# 2   Male 3.089618
dcast(data = tips, formula = sex ~ smoker, fun.aggregate = mean, value.var = 'tip')
# Columns: sex, non-smokers (No), smokers (Yes)
#      sex       No      Yes
# 1 Female 2.773519 2.931515
# 2   Male 3.113402 3.051167

Converting variables to factors (discretizing continuous variables)

age <- sample(20:80, 20)
age
  • Method 1 (logical arithmetic: TRUE counts as 1, FALSE as 0)
age1 <- 1 + (age >= 30) + (age >= 40) + (age >= 50)
age_fac <- factor(age1, levels = 1:4, labels = c('young', 'middle', 'm-old', 'old'))  # levels = 1:4 keeps the labels aligned even if a group is empty
age_fac
age2 <- 1*(age < 30) + 2*(age >= 30 & age < 40) + 3*(age >= 40 & age < 50) + 4*(age >= 50)
  • Method 2 (the cut function)
age3 <- cut(age, breaks = 4, labels = c('young', 'middle', 'm-old', 'old'), include.lowest = TRUE, right = TRUE)
# breaks = 4 splits the observed range into 4 equal-width intervals rather than cutting at fixed ages
# right = TRUE closes each interval on the right; include.lowest = TRUE also includes the lowest break value in the first interval
age4 <- cut(age, breaks = seq(20, 80, length.out = 4), labels = c('young', 'middle', 'old'), include.lowest = TRUE)
age4  # seq() places the break points at 20, 40, 60, 80; include.lowest = TRUE keeps age 20 from becoming NA
  • Method 3 (ifelse)
ifelse(age > 50, 'old', 'young')
ifelse(age > 60, 'old', ifelse(age < 30, 'young', 'middle'))  # works like the IF function in Excel
  • Method 4 (the car package)
install.packages('car')
library(car)
recode(var = age, recodes = 'lo:29 = 1; 30:39 = 2; 40:49 = 3; 50:hi = 4')
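To make the boundary handling of these methods easier to compare, here is a small sketch on a fixed, hypothetical age vector (chosen to sit exactly on the cut points) rather than a random sample. It uses `cut()` with explicit breaks and `right = FALSE`, so the intervals are [20,30), [30,40), [40,50), and [50,Inf).

```r
# Hypothetical ages placed on and next to each boundary
age <- c(20, 29, 30, 39, 40, 49, 50, 80)

# Left-closed intervals: a value equal to a break falls in the upper group
age_grp <- cut(age,
               breaks = c(20, 30, 40, 50, Inf),
               labels = c("young", "middle", "m-old", "old"),
               right = FALSE)

# The logical-arithmetic coding should agree with the cut() result
age_code <- 1 + (age >= 30) + (age >= 40) + (age >= 50)

print(data.frame(age, age_grp, age_code))
```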

Summarizing data

The apply family

The apply function

mat <- matrix(1:24, nrow = 4, ncol = 6)
apply(mat, 1, sum)  # row sums
# The second argument is MARGIN: 1 means rows, 2 means columns
apply(mat, 1, mean)  # mean of each row
apply(mat, 2, mean)  # mean of each column
apply(iris[, 1:4], 2, mean)  # mean of columns 1 to 4 of iris
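For sums and means specifically, base R also provides `rowSums()`, `rowMeans()`, `colSums()`, and `colMeans()`. The sketch below checks that they agree with the `apply()` calls above; the variable names are illustrative.

```r
mat <- matrix(1:24, nrow = 4, ncol = 6)

# apply() with MARGIN = 1 (rows) and MARGIN = 2 (columns)
row_sums_apply  <- apply(mat, 1, sum)
col_means_apply <- apply(mat, 2, mean)

# Dedicated helpers that compute the same quantities
row_sums_fast  <- rowSums(mat)
col_means_fast <- colMeans(mat)

print(all.equal(row_sums_apply, row_sums_fast))
print(all.equal(col_means_apply, col_means_fast))
```

`apply()` remains the general tool when the summary is something other than a sum or mean, for example `apply(mat, 1, median)`.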

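The lapply function

lapply(X = c(1:5), FUN = log)  # apply log to each element of X
lapply(iris[, 1:3], function(x) lm(x ~ iris$Sepal.Width, data = iris[, 1:3]))
# Regress each of the first three iris columns on Sepal.Width (column 2 is Sepal.Width itself, so that fit is trivial)
  • lapply always returns a list, which suits results such as fitted linear models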

The sapply function

sapply(1:5, log)  # simplifies the result to a vector or matrix when possible, otherwise returns a list
sapply(1:5, function(x) x + 3)
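The difference between `lapply()` and `sapply()` is easiest to see by inspecting what each returns. This sketch also shows `vapply()`, a base R variant where the expected type of each result is declared up front; the variable names are illustrative.

```r
as_list    <- lapply(1:5, log)                          # always a list
as_vec     <- sapply(1:5, log)                          # simplified to a numeric vector
as_checked <- vapply(1:5, log, FUN.VALUE = numeric(1))  # like sapply, but the result type is enforced

# When each call returns a length-2 result, sapply simplifies to a matrix
col_ranges <- sapply(iris[, 1:4], range)

str(as_list)
str(as_vec)
print(col_ranges)
```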

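The tapply function

tapply(X = iris$Sepal.Length, INDEX = iris$Species, FUN = mean)  # split Sepal.Length by Species and compute the mean of each group
  • Here tapply takes a single vector as X, not a data frame
tapply(iris[, 1:4], INDEX = iris$Species, FUN = mean)
# This call fails:
# Error in tapply(iris[, 1:4], INDEX = iris$Species, FUN = mean) : arguments must have same length
# The dcast() function shown earlier can be used instead
# to split several numeric variables by a categorical variable and summarize each group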
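Besides `dcast()`, one base R way around the failing call is to run `tapply()` once per column. The following is a sketch of that idea, not the approach used in the original notes; `group_means` is an illustrative name.

```r
# For each numeric column, split by Species and take the mean.
# sapply() collects the per-column results into a species-by-variable matrix.
group_means <- sapply(iris[, 1:4], function(col) tapply(col, iris$Species, mean))
print(group_means)
```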

The mapply function

myfun <- function(x, y) {
  if (x > 4) return(y) else return(x + y)
}
myfun(1:5, 2:6)
# This call fails:
# Error in if (x > 4) return(y) else return(x + y) : the condition has length > 1
# because if is not vectorized
mapply(myfun, 1:5, 2:6)  # mapply calls myfun on each pair of elements, which makes the operation vectorized
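Two other base R routes lead to the same element-wise result, sketched here for comparison: wrapping the function with `Vectorize()`, or rewriting the branch with the vectorized `ifelse()`. The names `vec_fun`, `via_mapply`, `via_vectorize`, and `via_ifelse` are illustrative.

```r
myfun <- function(x, y) {
  if (x > 4) y else x + y
}

# mapply applies myfun to (1, 2), (2, 3), ..., (5, 6)
via_mapply <- mapply(myfun, 1:5, 2:6)

# Vectorize() builds a wrapper that calls mapply internally
vec_fun <- Vectorize(myfun)
via_vectorize <- vec_fun(1:5, 2:6)

# ifelse() evaluates the condition element by element
via_ifelse <- ifelse(1:5 > 4, 2:6, 1:5 + 2:6)

print(rbind(via_mapply, via_vectorize, via_ifelse))
```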

The ave function

survival <- data.frame(id = 1:10,
                       cancer = sample(c('lung', 'liver', 'colon'), 10, replace = TRUE),
                       treatment = sample(c('Surg', 'Chemo'), 10, replace = TRUE),
                       sur_days = sample(100:1000, 10))
survival
ave(survival$sur_days, survival$cancer)  # mean of sur_days within each level of cancer
ave(survival$sur_days, survival$cancer, FUN = sd)  # standard deviation of sur_days within each level of cancer
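Unlike `tapply()`, `ave()` returns one value per input row, which makes it handy for adding a group statistic as a new column. The sketch below uses a small fixed data frame (`toy`, hypothetical values) instead of random samples so the result is reproducible.

```r
# Hypothetical data: two patients per cancer type
toy <- data.frame(
  cancer   = c("lung", "liver", "lung", "colon", "liver", "colon"),
  sur_days = c(100, 200, 300, 400, 600, 800)
)

# Group mean repeated on every row of its group
toy$group_mean <- ave(toy$sur_days, toy$cancer)

# Deviation of each patient from their group mean
toy$centered <- toy$sur_days - toy$group_mean

print(toy)
```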

The by function

by(data = survival$sur_days, INDICES = survival$cancer, FUN = mean)  # mean within each level (concise output)
by(data = survival$sur_days, INDICES = list(survival$cancer, survival$treatment), FUN = mean)

The aggregate function

data(mtcars)
View(mtcars)
aggregate(x = mtcars, by = list(VS = mtcars$vs == 1, high = mtcars$mpg > 22), mean)
aggregate(x = mtcars[, 1:4], by = list(VS = mtcars$vs == 1, high = mtcars$mpg > 22), mean)
aggregate(iris, by = list(high_sp = iris$Sepal.Length > 5, high_sw = iris$Sepal.Width > 3.5), mean)
# Non-numeric columns (here the factor Species) trigger a warning; numeric columns are averaged directly
aggregate(. ~ Species, data = iris, mean)
by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp + hp, data = x))  # custom function: fit a linear regression within each group
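The formula interface of `aggregate()` can also summarize several variables by several grouping variables at once. As a sketch, the call below averages `mpg` and `hp` for each combination of `cyl` and `am` in `mtcars`; the choice of columns is only for illustration.

```r
# cbind() on the left-hand side lists the variables to summarize;
# the right-hand side lists the grouping variables
agg <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars, FUN = mean)
print(agg)
```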

The sweep function

# Works on arrays
my_array <- array(1:24, dim = c(3, 4, 2))
my_array
sweep(x = my_array, MARGIN = 1, STATS = 1, FUN = '+')  # add 1 to every element of each row
# MARGIN = 1 selects rows; the default FUN is subtraction ('-')
  • From cr.Leopard's course
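A common use of the default subtraction is centering, where `STATS` is a vector with one value per column. The sketch below centers each column of a matrix on its mean; `mat` and `centered` are illustrative names.

```r
mat <- matrix(1:24, nrow = 4, ncol = 6)

# Subtract each column's mean from that column (FUN defaults to "-")
centered <- sweep(mat, MARGIN = 2, STATS = colMeans(mat))

print(centered)
print(colMeans(centered))  # each column mean should now be zero
```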
