Sorting, Transforming, and Summarizing Data in R
R Study Notes 4 (Beginner)
- Sorting data
  - sort function
  - rank function
  - order function
- Data transformation
  - Converting between long and wide formats
    - stack function
    - tapply function
    - reshape function
    - reshape2 package
  - Converting variables to factors (discretizing continuous variables)
- Data summarization
  - The apply family
    - apply function
    - lapply function
    - sapply function
    - tapply function
    - mapply function
  - ave function
  - by function
  - aggregate function
  - sweep function
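This series collects my systematic notes on learning R and belongs to the "R Language Notes" column, which I keep updating in my spare time. Previous posts:
0. Downloading and installing R vs. RStudio errors
1. Vectors, matrices, arrays, and data frames in R
2. Conditionals, loops, and functions in R
3. Reading and exporting data in R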
Sorting data
sort function
```r
x <- sample(1:100, 10)
sort(x, decreasing = TRUE)  # decreasing = TRUE sorts largest first; the default is ascending
y <- c('python', 'ruby', 'java', 'r')
sort(y, decreasing = TRUE)
#[1] "ruby"   "r"      "python" "java"
```
rank function
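rank() also takes a ties.method argument that decides how tied values are ranked; the default, "average", is what produces decimal ranks. A minimal sketch on a made-up vector v:

```r
v <- c(10, 20, 20, 30)
rank(v)                         # default "average": 1.0 2.5 2.5 4.0
rank(v, ties.method = "min")    # ties share the smallest rank: 1 2 2 4
rank(v, ties.method = "first")  # ties broken by position: 1 2 3 4
```

ties.method = "min" gives the competition-style ranking often used for league tables.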
```r
rank(x)  # rank of each element, i.e. its position in sorted order
z <- c(1, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8)
rank(z)  # the result contains decimals: tied elements receive the mean of their ranks
```
order function
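Because order() returns positions rather than values, indexing a vector with it gives the same result as sort(), and each numeric key can be negated on its own to mix ascending and descending keys. A short sketch using a made-up vector w and the built-in iris data:

```r
w <- c(39, 97, 31, 83)
order(w)      # 3 1 4 2: the index of the smallest value comes first
w[order(w)]   # 31 39 83 97, the same as sort(w)

# Sepal.Length ascending, ties broken by Sepal.Width descending
sorted <- iris[order(iris$Sepal.Length, -iris$Sepal.Width), ]
head(sorted)
```

The minus-sign trick only works for numeric keys; a factor or character key can first be turned into a sortable number with xtfrm(), e.g. -xtfrm(iris$Species).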
```r
x  # one particular draw of x
#[1] 39 97 31 83 56  1 19 60 50 41
order(x)  # returns element indices, not values
#[1]  6  7  3  1 10  9  5  8  4  2
x[order(x)]  # indexing x with those indices yields the sorted vector
#[1]  1 19 31 39 41 50 56 60 83 97
head(iris)
head(iris[order(iris$Sepal.Length, decreasing = TRUE), ])  # rows by Sepal.Length, largest first
head(iris[order(-iris$Sepal.Length), ])                    # a minus sign also gives descending order
# sorting by several variables
head(iris[order(iris$Sepal.Length, iris$Sepal.Width), ])   # by Sepal.Length (ascending), ties broken by Sepal.Width
```
Data transformation
Converting between long and wide formats
stack function
```r
freshmen <- c(178, 180, 182, 180)
sophomores <- c(188, 172, 175, 172)
juniors <- c(167, 172, 177, 174)
data.frame(fr = freshmen, so = sophomores, ju = juniors)
# Result (wide format: one column per group):
#    fr  so  ju
# 1 178 188 167
# 2 180 172 172
# 3 182 175 177
# 4 180 172 174
height <- stack(list(fresh = freshmen, sopho = sophomores, jun = juniors))
height  # stack() piles the groups into long format: a values column plus a group label (ind)
# Result:
#    values   ind
# 1     178 fresh
# 2     180 fresh
# 3     182 fresh
# 4     180 fresh
# 5     188 sopho
# 6     172 sopho
# 7     175 sopho
# 8     172 sopho
# 9     167   jun
# 10    172   jun
# 11    177   jun
# 12    174   jun
```
tapply function
```r
tapply(height$values, height$ind, mean)  # with the data in long format, tapply computes a statistic per group
#  fresh  sopho    jun
# 180.00 176.75 172.50
```
reshape function
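A toy sketch of reshape() in both directions; the data frame df and its values are made up. direction = "wide" creates one conc.<time> column per value of timevar, and because reshape() stores its settings as an attribute on the result, reshape(wide, direction = "long") can undo it without repeating the arguments:

```r
df <- data.frame(id   = rep(1:2, each = 3),
                 time = rep(1:3, times = 2),
                 conc = c(5, 3, 1, 6, 4, 2))

# long -> wide: one row per id, columns conc.1, conc.2, conc.3
wide <- reshape(df, v.names = "conc", idvar = "id",
                timevar = "time", direction = "wide")
wide

# wide -> long again, using the settings stored on `wide`
long <- reshape(wide, direction = "long")
long
```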
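```r
View(Indometh)     # Indometh is a long-format data set (Subject, time, conc)
summary(Indometh)  # descriptive statistics for Indometh
# v.names: the value variable; idvar: the identifier variable; timevar: the variable whose values become columns
wide <- reshape(Indometh, v.names = 'conc', idvar = 'Subject',
                timevar = 'time', direction = 'wide')
View(wide)
# varying = list(2:12) stacks columns 2 to 12 back into one column;
# without a times argument the new time column is numbered 1 to 11
long <- reshape(wide, v.names = 'conc', idvar = 'Subject',
                varying = list(2:12), direction = 'long')
View(long)
```
reshape2 package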
- Load the reshape2 package
- Use the melt function
- Compute the mean of each measure
- Example: the tips (restaurant tipping) data set
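The four steps above might look like the following sketch. It assumes the reshape2 package is installed; tips ships with reshape2 (columns total_bill, tip, sex, smoker, day, time, size). Which columns are treated as identifiers and which as measures is a choice made here for illustration:

```r
library(reshape2)  # assumes reshape2 is installed
data(tips, package = "reshape2")

# melt: keep the grouping columns as identifiers and stack the two money columns
tips_long <- melt(tips,
                  id.vars = c("sex", "smoker", "day", "time"),
                  measure.vars = c("total_bill", "tip"))
head(tips_long)  # columns: sex, smoker, day, time, variable, value

# dcast: mean of each measured variable per sex
by_sex <- dcast(tips_long, sex ~ variable, fun.aggregate = mean)
by_sex

# the same means for every sex and smoker combination
dcast(tips_long, sex + smoker ~ variable, fun.aggregate = mean)
```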
Converting variables to factors (discretizing continuous variables)
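```r
age <- sample(20:80, 20)  # 20 distinct random ages between 20 and 80
age
```
- Method 1 (formula approach: TRUE = 1, FALSE = 0)
- Method 2 (the cut function)
- Method 3 (ifelse)
- Method 4 (the car package)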
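The four methods listed above can be sketched on a fixed age vector (used instead of sample() so the output is reproducible). The cut points 40 and 60 and the labels young/middle/old are assumptions for illustration; the car part runs only if that package is installed.

```r
age <- c(20, 35, 40, 59, 60, 80)

# Method 1: arithmetic on logical comparisons (TRUE counts as 1, FALSE as 0)
grp1 <- 1 + (age >= 40) + (age >= 60)  # 1, 2 or 3

# Method 2: cut() with explicit breaks; right = FALSE gives [20,40), [40,60), [60,80]
grp2 <- cut(age, breaks = c(20, 40, 60, 80), right = FALSE,
            include.lowest = TRUE, labels = c("young", "middle", "old"))

# Method 3: nested ifelse()
grp3 <- ifelse(age < 40, "young", ifelse(age < 60, "middle", "old"))

# Method 4: car::recode() with a recode specification string
if (requireNamespace("car", quietly = TRUE)) {
  grp4 <- car::recode(age, "lo:39='young'; 40:59='middle'; else='old'")
  print(grp4)
}

data.frame(age, grp1, grp2, grp3)
```

cut() is usually the tidiest choice because it returns a factor with ordered, labeled levels in one call.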
Data summarization
The apply family
apply function
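```r
mat <- matrix(1:24, nrow = 4, ncol = 6)
apply(mat, 1, sum)   # row sums; the second argument, MARGIN, is 1 for rows and 2 for columns
apply(mat, 1, mean)  # mean of each row
apply(mat, 2, mean)  # mean of each column
apply(iris[, 1:4], 2, mean)  # column means of iris columns 1 to 4
```
lapply function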
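```r
lapply(X = 1:5, FUN = log)  # applies log to each element of X and returns a list
# regress each of the first three iris columns on Sepal.Width
# (column 2 is Sepal.Width itself, so that fit is trivial)
lapply(iris[, 1:3], function(x) lm(x ~ Sepal.Width, data = iris))
```
- lapply always returns its results as a list, so it suits functions whose result is not a single value, such as linear regression fits.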
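Because lapply() returns a list of lm objects, a second pass can pull one piece out of each model. A sketch (the names models and coefs are just illustrative) that regresses Sepal.Length and Petal.Length on Sepal.Width and collects the coefficients with sapply():

```r
models <- lapply(iris[, c("Sepal.Length", "Petal.Length")],
                 function(y) lm(y ~ Sepal.Width, data = iris))
class(models)                   # "list": one lm object per column
coefs <- sapply(models, coef)   # 2 x 2 matrix: intercept and slope for each model
coefs
```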
sapply function
```r
sapply(1:5, log)  # like lapply, but simplifies the result to a vector or matrix when it can
sapply(1:5, function(x) x + 3)
```
tapply function
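```r
tapply(X = iris$Sepal.Length, INDEX = iris$Species, FUN = mean)  # split Sepal.Length by Species and take each group's mean
```
- tapply works on a vector (here a data frame column) grouped by one or more factors; it does not require a data frame.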
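Since tapply() only needs a vector and grouping factors of the same length, plain vectors work just as well, and a list of factors gives a cross table. A sketch with made-up vectors (scores, grp and sex are hypothetical):

```r
scores <- c(80, 90, 70, 60, 85, 75)
grp    <- c("A", "A", "B", "B", "A", "B")
sex    <- c("F", "M", "F", "M", "M", "F")

by_grp     <- tapply(scores, grp, mean)             # one mean per group
by_grp_sex <- tapply(scores, list(grp, sex), mean)  # a group-by-sex table of means
by_grp
by_grp_sex
```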
mapply function
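```r
myfun <- function(x, y) {
  if (x > 4) return(y) else return(x + y)
}
myfun(1:5, 2:6)
# Error in if (x > 4) return(y) else return(x + y) : the condition has length > 1
# (if() is not vectorized; R versions before 4.2 only warn and use the first element)
mapply(myfun, 1:5, 2:6)  # mapply calls myfun once per pair of elements, so it works elementwise
#[1] 3 5 7 9 6
```
ave function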
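Unlike tapply(), ave() returns a vector as long as its input, with each element replaced by the summary of its own group, which makes it handy for adding a group statistic as a new column. A sketch on made-up values (days and type are hypothetical):

```r
days <- c(100, 300, 200, 400, 600)
type <- c("lung", "lung", "liver", "liver", "colon")
ave(days, type)             # 200 200 300 300 600: each element gets its group mean
ave(days, type, FUN = max)  # 300 300 400 400 600
```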
```r
survival <- data.frame(id = 1:10,
                       cancer = sample(c('lung', 'liver', 'colon'), 10, replace = TRUE),
                       treatment = sample(c('Surg', 'Chemo'), 10, replace = TRUE),
                       sur_days = sample(100:1000, 10))
survival
ave(survival$sur_days, survival$cancer)            # each row gets the mean sur_days of its cancer group
ave(survival$sur_days, survival$cancer, FUN = sd)  # each row gets the standard deviation of its group
```
by function
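by() passes each subset of a data frame, split by a factor, to FUN as a whole data frame, so FUN can use several columns at once. A sketch with the built-in mtcars data (the choice of columns is arbitrary):

```r
# column means of mpg and hp for each number of cylinders (4, 6, 8)
res <- by(mtcars[, c("mpg", "hp")], mtcars$cyl, colMeans)
res
```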
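```r
# group means of sur_days by cancer type, printed as a labeled list
by(data = survival, INDICES = survival$cancer, FUN = function(d) mean(d$sur_days))
# group means for every cancer and treatment combination
by(data = survival, INDICES = list(survival$cancer, survival$treatment),
   FUN = function(d) mean(d$sur_days))
```
aggregate function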
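aggregate() also has a formula interface: the left side names the variable(s) to summarize and the right side the grouping variables. A sketch with the built-in mtcars data:

```r
# mean mpg for each combination of cylinder count and transmission (am: 0 = automatic, 1 = manual)
res <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
res

# several response variables at once with cbind()
aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
```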
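```r
data(mtcars)
View(mtcars)
# means of every column, grouped by vs == 1 and mpg > 22
aggregate(x = mtcars, by = list(VS = mtcars$vs == 1, high = mtcars$mpg > 22), FUN = mean)
aggregate(x = mtcars[, 1:4], by = list(VS = mtcars$vs == 1, high = mtcars$mpg > 22), FUN = mean)
# Species is a factor: mean() warns and returns NA for it; numeric columns are summarized normally
aggregate(iris, by = list(high_sp = iris$Sepal.Length > 5, high_sw = iris$Sepal.Width > 3.5), FUN = mean)
aggregate(. ~ Species, data = iris, FUN = mean)  # formula interface: mean of every other column per Species
by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp + hp, data = x))  # by() with a custom function: one regression per cyl group
```
sweep function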
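A common use of sweep() is removing a per-row or per-column statistic. With its default FUN = "-", this sketch centers each column of a small made-up matrix on its column mean:

```r
m <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 3)  # columns: 1 2 3 and 4 6 8
centered <- sweep(m, MARGIN = 2, STATS = colMeans(m))  # subtract each column's mean
centered
colMeans(centered)  # 0 0
```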
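```r
# on an array
my_array <- array(1:24, dim = c(3, 4, 2))
my_array
sweep(x = my_array, MARGIN = 1, STATS = 1, FUN = '+')  # add 1 to every element; the single STATS value is recycled across the rows
# MARGIN = 1 selects rows; FUN defaults to subtraction, "-"
```
- Credit: Leopard's course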