SAS Data Import: The Ultimate Summary (Part 1)
Reading data files into SAS: DATA step / PROC IMPORT
1. Reading a SAS data set into SAS
  data sasuser.saslin;
    set "F:\sas1.sas7bdat";
  run;
  proc contents data=sasuser.saslin;
  run;
2. Importing other file types into SAS: PROC IMPORT / reading other file types directly
  proc import datafile="c:\data\hsb2.sav" out=work.hsb2;
  run;
  proc contents data=hsb2;
  run;
Importing data: SAS recognizes the file type to be imported by its file extension.
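Record length limits
In some operating environments, SAS assumes that a record in an external file is at most 256 characters long (counting every character on the line, including blanks). If you expect longer records, set the LRECL=n option on the INFILE statement.

Reading raw data separated by spaces
If the values in the raw data are separated by at least one space, you can read them with list input. List input is very convenient, but it has several limitations:
(1) You cannot skip over values.
(2) Every missing value must be marked with a period.
(3) Character values cannot contain embedded spaces and, by default, cannot be longer than 8 characters.
(4) Nonstandard values such as dates cannot be read directly.
Example:
  INPUT Name $ Age Height;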
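As a minimal sketch of list input, the step below reads three in-stream records. The data set name work.people and the values are made up for illustration, and the LENGTH statement is an addition to the example above that lifts the default 8-character limit. Periods stand in for missing values.

```sas
/* Minimal sketch: list input from in-stream data.            */
/* work.people and its values are hypothetical.               */
data work.people;
   length Name $ 12;          /* allow names longer than the default 8 */
   input Name $ Age Height;   /* values are separated by blanks        */
   datalines;
Alice 34 165.5
Bartholomew . 180
Chen 29 .
;
run;

proc print data=work.people;
run;
```

Without the LENGTH statement, the second name would be cut off after 8 characters.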
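Reading data arranged in columns
Some raw data files have no spaces or other delimiters between values, so they cannot be read with list input. If each variable's values occupy the same positions in every record, you can use column input instead. Column input requires every value to be either character or standard numeric (digits, a decimal point, a plus or minus sign, or E for scientific notation; no commas and no dates).
Compared with list input, column input has these advantages:
(1) Values do not need to be separated by spaces.
(2) Blanks can represent missing values.
(3) Character values can contain embedded spaces.
(4) You can skip over values you do not need.
Example:
  INPUT Name $ 1-10 Age 11-13 Height 14-18;

Reading nonstandard data with informats
Informats take three general forms:
  Character: $informatw.
  Numeric:   informatw.d
  Date:      informatw. (for example, DATEw.)
(1) Character informats
  $CHARw.     reads character data and keeps leading and trailing blanks
  $HEXw.      converts hexadecimal data to character data
  $w.         reads character data and trims leading blanks
(2) Date, time, and datetime informats
  DATEw.      reads dates in the form ddmmmyy or ddmmmyyyy
  DATETIMEw.  reads datetimes in the form ddmmmyy hh:mm:ss.ss
  DDMMYYw.    reads dates in the form ddmmyy or ddmmyyyy
  JULIANw.    reads Julian dates in the form yyddd or yyyyddd
  MMDDYYw.    reads dates in the form mmddyy or mmddyyyy
  TIMEw.      reads times in the form hh:mm:ss.ss
(3) Numeric informats
  COMMAw.d    reads numeric data, removing embedded commas and dollar signs and converting parentheses to a minus sign
  HEXw.       converts hexadecimal data to numeric (integer or floating-point) values
  IBw.d       reads integer binary data
  PERCENTw.   converts percentages to ordinary numbers
  w.d         reads standard numeric data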
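To make the informat list concrete, here is a small sketch that combines $w., MMDDYYw., and COMMAw.d in one formatted INPUT statement. The data set name work.sales, the variable names, and the values are hypothetical, and the in-stream records are aligned to fixed columns (Name in 1-10, SaleDate in 12-21, Amount in 23-31).

```sas
/* Minimal sketch: formatted input with informats.            */
/* work.sales and its values are hypothetical.                */
data work.sales;
   input Name $10.            /* columns 1-10, may contain blanks   */
         +1 SaleDate mmddyy10. /* columns 12-21, stored as SAS date */
         +1 Amount comma9.;    /* columns 23-31, strips $ and ,     */
   format SaleDate date9. Amount comma10.;
   datalines;
Ann Lee    01/15/2020   $12,345
Bob        12/03/2019     (250)
;
run;

proc print data=work.sales;
run;
```

The FORMAT statement only affects how the values are displayed; the COMMA informat stores the second amount as a negative number because it is enclosed in parentheses.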
  INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
        (Score1 Score2 Score3 Score4 Score5) (4.1);

Mixing input styles
Controlling where SAS reads: column pointers
  +n   moves the column pointer forward n columns from its current position; to move backward, write +(-n)
  @n   moves the column pointer to column n
Example:
  INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
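The mixed INPUT statement above can be tried with in-stream data. In this sketch the data set name work.parks and both records are invented for illustration; the park names sit in columns 1-22 and the acreage is right-aligned in columns 40-48, so that @40 followed by COMMA9. picks it up.

```sas
/* Minimal sketch: column, list, and formatted input in one statement. */
/* work.parks and its values are hypothetical.                         */
data work.parks;
   input ParkName $ 1-22      /* column input: columns 1-22          */
         State $ Year         /* list input: next two blank-separated values */
         @40 Acreage comma9.; /* jump to column 40, read 9 columns   */
   datalines;
Cedar Hollow          XX 1950          1,234,567
Stony Brook Preserve  YY/ZZ 1987             130
;
run;

proc print data=work.parks;
run;
```

Using @40 rather than list input for Acreage matters here because the value contains commas, which standard numeric input cannot read.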
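Reading messy data
When you do not know which column a value starts in, but you do know that it always follows a particular character or string, use the @'character' column pointer.
For example, suppose you need the text that follows Breed: in this record:
  My dog Sam Breed: Rottweiler Vet Bills: $478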
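(1) SAS statement: INPUT @'Breed: ' DogBreed $;
    Value read: Rottweil
    SAS reads the text after Breed: into DogBreed and stops at the first blank. Because no length is given, the value is truncated to the default 8 characters.
(2) SAS statement: INPUT @'Breed:' DogBreed $20.;
    Value read: Rottweiler Vet Bill
    SAS reads the 20 columns after Breed: into DogBreed, blanks included.
(3) SAS statement: INPUT @'Breed:' DogBreed :$20.;
    Value read: Rottweiler
    SAS reads the text after Breed: into DogBreed with a length of 20, but stops reading at the first blank.

Reading several raw data lines into one observation
Use the line pointers:
  /    go to the next line
  #n   go to line n
  INPUT City $ State $ / NormalHigh NormalLow #3 RecordHigh RecordLow;

Reading several observations from one raw data line
End the INPUT statement with @@ to tell SAS to keep reading from the rest of the current line.
  INPUT City $ State $ NormalRain MeanDaysRain @@;

Reading only part of the raw data
You do not have to read every record in full and only then decide whether you need it. Read just enough variables to decide whether the observation is wanted, and end that INPUT statement with a trailing @ so that SAS holds the pointer on the current line. Then test the value with an IF statement and, if the record is one you want, use a second INPUT statement to read the rest.
  INPUT Type $ @;
  IF Type = 'surface' THEN DELETE;
  INPUT Name $ 9-38 AMTraffic PMTraffic;

@ versus @@
(1) Both hold the current data line.
(2) @ releases the held line when SAS begins the next iteration of the DATA step, whereas @@ keeps holding it across iterations.

INFILE options that control input
(1) FIRSTOBS=n: start reading at record n.
(2) OBS=n: stop reading after record n.
(3) Records shorter than the INPUT statement expects
When the pointer reaches the end of a record and the INPUT statement still has variables without values, SAS by default moves on to the next line to find them.
  MISSOVER: do not go to the next line; set the remaining variables to missing.
  TRUNCOVER: with column or formatted input, when some lines are shorter than others, TRUNCOVER keeps SAS from going to the next line.

Reading delimited files with the DATA step
Use the DLM= option or the DSD option on the INFILE statement to read raw files that use a particular character as the delimiter.
(1) The DLM= option (e.g. DLM='&'). For tab-delimited files, use DLM='09'X.
(2) The DSD option does three things:
  it ignores delimiters inside quoted values;
  it does not read the quotation marks as part of the data;
  it treats two consecutive delimiters as a missing value.

Reading delimited files with PROC IMPORT
What PROC IMPORT does:
(1) scans the data file and determines each variable's type (numeric or character);
(2) sets the length of character variables automatically;
(3) recognizes some date values;
(4) reads two consecutive delimiters as a missing value;
(5) reads values enclosed in quotation marks;
(6) assigns missing values to variables for which a record has no data.
  PROC IMPORT DATAFILE='filename' OUT=data-set;
SAS determines the file type from the file's extension. If the file does not have the proper extension, or if it is a DLM file, you must add the DBMS= option to PROC IMPORT. If a data set with the same name already exists in the SAS library, the REPLACE option overwrites it.
  PROC IMPORT DATAFILE='filename' OUT=data-set DBMS=identifier REPLACE;
By default, PROC IMPORT takes the variable names from the first line of the file. If the first line does not contain variable names, add a GETNAMES=NO statement after the PROC IMPORT statement.
For delimited files, the default delimiter is a space. For any other delimiter, specify it with the DELIMITER= statement.
  PROC IMPORT DATAFILE='filename' OUT=data-set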
        DBMS=DLM REPLACE;
    GETNAMES=NO;
    DELIMITER='delimiter-character';
  RUN;

Reading PC files with PROC IMPORT
  PROC IMPORT DATAFILE='filename' OUT=data-set
        DBMS=identifier REPLACE;

Listing the contents of a SAS data set
  PROC CONTENTS DATA=data-set;
PROC CONTENTS displays SAS's description of a data set, mainly:
(1) About the data set
  the data set name
  the number of observations
  the number of variables
  the creation date
(2) About each variable
  type
  length
  format
  informat
  label
Examples
1. Reading comma-delimited data: cars_novname.csv
  Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189
  Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172
  Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183
  Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186
  Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755 ","$39,014 ",3.5,6,225,18,24,3880,115,197

  proc import datafile="cars_novname.csv" out=mydata dbms=csv replace;
    getnames=no;
  run;
  proc contents data=mydata;
  run;
SAS creates default variable names VAR1-VARn when variable names are not present in the raw data file.
2. Reading tab-delimited data:
  proc import datafile="cars.txt" out=mydata dbms=tab replace;
    getnames=no;
  run;
3. Saving data sets permanently in a separate folder for each project:
  libname dis "c:\dissertation";
  proc import datafile="cars.txt" out=dis.mydata dbms=dlm replace;
    delimiter='09'x;
    getnames=yes;
  run;
4. Reading space-delimited data:
  proc import datafile="cars_sp.txt" out=mydata dbms=dlm replace;
    getnames=no;
  run;
5. The ultimate delimiter example: other kinds of delimiters
You can use delimiter= on the infile statement to tell SAS what delimiter you are using to separate variables in your raw data file. For example, below we have a raw data file that uses exclamation points (!) to separate the variables in the file.
  22!2930!4099
  17!3350!4749
  22!2640!3799
  20!3250!4816
  15!4080!7827
The example below shows how to read this file by using delimiter='!' on the infile statement.
  DATA cars;
    INFILE 'readdel1.txt' DELIMITER='!';
    INPUT mpg weight price;
  RUN;
  PROC PRINT DATA=cars;
  RUN;
As you can see in the output below, the data was read properly.
  OBS    MPG    WEIGHT    PRICE
    1     22      2930     4099
    2     17      3350     4749
    3     22      2640     3799
    4     20      3250     4816
    5     15      4080     7827
It is possible to use multiple delimiters. The example file below uses either exclamation points or plus signs as delimiters.
  22!2930!4099
  17+3350+4749
  22!2640!3799
  20+3250+4816
  15+4080!7827
By using delimiter='!+' on the infile statement, SAS will recognize both of these as valid delimiters.
  DATA cars;
    INFILE 'readdel2.txt' DELIMITER='!+';
    INPUT mpg weight price;
  RUN;
  PROC PRINT DATA=cars;
  RUN;
As you can see in the output below, the data was read properly.
  OBS    MPG    WEIGHT    PRICE
    1     22      2930     4099
    2     17      3350     4749
    3     22      2640     3799
    4     20      3250     4816
    5     15      4080     7827
Limitations of PROC IMPORT and things to watch for
PROC IMPORT does not know the formats of your variables, but it can guess them from what the beginning of your file looks like. Most of the time this guess is fine. If the values of a variable are longer later in the file than they are at the beginning, however, you might end up with some truncated values.
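One common way to reduce this truncation risk is the GUESSINGROWS= statement, which tells PROC IMPORT how many rows to scan before it settles on types and lengths. The sketch below reuses the cars_novname.csv file from the first example; the value 1000 is an arbitrary choice for illustration, and recent SAS releases also accept GUESSINGROWS=MAX.

```sas
/* Minimal sketch: scan more rows before guessing types and lengths. */
/* The row count 1000 is an arbitrary, illustrative value.           */
proc import datafile="cars_novname.csv" out=work.mydata dbms=csv replace;
   getnames=no;
   guessingrows=1000;   /* rows examined when guessing each column */
run;

proc contents data=work.mydata;
run;
```

Scanning more rows makes the import slower on large files, so the setting is a trade-off rather than a default to apply everywhere.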
Key syntax: INFILE options
For more complicated file layouts, refer to the INFILE options described below.

DLM=
The dlm= option specifies the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma-separated file, .csv file), and dlm='09'x indicates that tabs separate your variables (e.g., a tab-separated file).

DSD
The dsd option has two functions. First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50 SAS would treat this as 20 30 50, but with the dsd option SAS treats it as 20 30 . 50, which is probably what you intended. Second, it allows you to include the delimiter within quoted strings. For example, you would want to use the dsd option if you had a comma-separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.

FIRSTOBS=
This option tells SAS on what line you want it to start reading your raw data file. If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number where the data actually begin. For example, if you are reading a comma-separated or tab-separated file that has the variable names on the first line, use firstobs=2 to tell SAS to begin reading at the second line (so it ignores the first line with the names of the variables).

MISSOVER
This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data is supposed to have one observation for each line of raw data, this could cause errors throughout the rest of your data file. If you have a raw data file with one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.

OBS=
Indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to read in just the first 100 lines of data while you are testing your program. When you want to read the entire file, you can remove the obs= option entirely.

A typical infile statement for reading a comma-delimited file that contains the variable names in the first line of data would be:
  INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2;
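To see these options working together without an external file, the sketch below applies the same option set to in-stream data. The data set name work.scores, the variable names, and the values are hypothetical. The first record is a header line that FIRSTOBS=2 skips, the third record has an empty field that DSD reads as missing, and the fourth record is short, so MISSOVER sets score3 to missing instead of letting SAS move on to the next line.

```sas
/* Minimal sketch: DLM=, DSD, MISSOVER, and FIRSTOBS= on in-stream data. */
/* work.scores and its values are hypothetical.                          */
data work.scores;
   infile datalines dlm=',' dsd missover firstobs=2;
   input id score1 score2 score3;
   datalines;
id,score1,score2,score3
1,90,85,77
2,88,,91
3,70,65
4,60,75,80
;
run;

proc print data=work.scores;
run;
```

Removing MISSOVER from the INFILE statement and rerunning the step is a quick way to see how a single short record can disturb the observations that follow it.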
Reading data with missing values, or values that contain the delimiter
  DATA cars2;
    length make $ 20;
    INFILE 'readdsd.txt' DELIMITER=',' DSD;
    INPUT make mpg weight price;
  RUN;
  PROC PRINT DATA=cars2;
  RUN;
Raw data (readdsd2.txt):
  48,'Bill Clinton',210
  50,'George Bush, Jr.',180
  DATA guys2;
    length name $ 20;
    INFILE 'readdsd2.txt' DELIMITER=',' DSD;
    INPUT age name weight;
  RUN;
  PROC PRINT DATA=guys2;
  RUN;
The classic case: start reading at a given line
  DATA cars2;
    length nf 8;
    INFILE 'F:\cars1.csv' DELIMITER=',' dsd MISSOVER firstobs=2;
    INPUT nf zh hh xb cs IHA fj;
  RUN;
  PROC PRINT DATA=cars2;
  RUN;