
Python: reading a file without a default delimiter (millions of records) into a DataFrame

Published: 2024/9/30


Python: what is the most efficient way to read a file that has no default delimiter (millions of records) and load it into a pandas DataFrame? The file is "file_sd.txt":

A123456MESTUDIANTE 000-12

A123457MPROFESOR   003103

I128734MPROGRAMADOR00-111

A129863FARQUITECTO 00-456

# Fields and position:

# - Activity Indicator : indAct -> 01 Character

# - Person Code : codPer -> 06 Characters

# - Gender (M / F) : sex -> 01 Character

# - Occupation : occupation -> 11 Characters

# - Amount(User format): amount -> 06 Characters (Convert to Number)
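The layout above can be turned into slice offsets by accumulating the field widths. The following is a minimal sketch for a single record, not the asker's code: the names FIELDS, field_slices, parse_amount and parse_record are hypothetical, and the rule that the digits after "-" form a negative amount is inferred from the sample rows.

```python
from itertools import accumulate

# Field names and widths copied from the layout above.
FIELDS = [
    ("indAct", 1),
    ("codPer", 6),
    ("sex", 1),
    ("occupation", 11),
    ("amount", 6),
]


def field_slices(fields):
    """Map each field name to a slice built from cumulative widths."""
    widths = [width for _, width in fields]
    stops = list(accumulate(widths))
    starts = [0] + stops[:-1]
    return {name: slice(a, b) for (name, _), a, b in zip(fields, starts, stops)}


def parse_amount(text):
    """Assumed rule: digits after '-' are the magnitude of a negative amount."""
    pos = text.find("-")
    return int(text) if pos < 0 else -int(text[pos + 1:])


def parse_record(line, slices):
    """Cut one fixed-width line into a dict of stripped field values."""
    record = {name: line[s].strip() for name, s in slices.items()}
    record["amount"] = parse_amount(record["amount"])
    return record


SLICES = field_slices(FIELDS)
record = parse_record("A123456MESTUDIANTE 000-12", SLICES)
print(record)
```

The offsets this produces (0-1, 1-7, 7-8, 8-19, 19-25) are the ones any position-based reader of this file needs.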

I'm not sure whether the approach below is the best option:

import pandas as pd
import numpy as np

def stoI(cad):
    # Convert an amount such as "000-12" to -12:
    # the digits after "-" are the magnitude of a negative number.
    pos = cad.find("-")
    if pos < 0:
        return int(cad)
    return int(cad[pos+1:]) * -1

# Read the file as a single text column (column 0)
data = pd.read_csv(r'D:\file_sd.txt', header=None)

# Cut each field out of the line by character position
data_sep = pd.DataFrame(
    {
        'indAct': data[0].str.slice(0, 1),
        'codPer': data[0].str.slice(1, 7),
        'sexo': data[0].str.slice(7, 8),
        'ocupac': data[0].str.slice(8, 19),
        'monto': np.vectorize(stoI)(data[0].str.slice(19, 25))
    })

print(data_sep)

  indAct  codPer  sexo  ocupac       monto
0 A       123456  M     ESTUDIANTE     -12
1 A       123457  M     PROFESOR      3103
2 I       128734  M     PROGRAMADOR   -111
3 A       129863  F     ARQUITECTO    -456

Timing this solution on 7 million rows gives:

%timeit df_slice()

11.1 s ± 166 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
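One alternative worth timing against the slicing version is pandas.read_fwf, which reads fixed-width columns directly. The sketch below is not from the original question and has not been benchmarked here. The function name read_fixed_width and the in-memory SAMPLE data are hypothetical, and the sign rule for the amount is again inferred from the sample rows. It also swaps np.vectorize, which still calls the Python function once per element, for pandas string operations on the whole column.

```python
import io

import pandas as pd

# In-memory stand-in for file_sd.txt (hypothetical test data, 25 chars per row).
SAMPLE = (
    "A123456MESTUDIANTE 000-12\n"
    "A123457MPROFESOR   003103\n"
    "I128734MPROGRAMADOR00-111\n"
    "A129863FARQUITECTO 00-456\n"
)


def read_fixed_width(source):
    """Read the fixed-width layout and convert the amount column to integers."""
    df = pd.read_fwf(
        source,
        widths=[1, 6, 1, 11, 6],  # field widths from the layout
        names=["indAct", "codPer", "sexo", "ocupac", "monto"],
        header=None,
        dtype=str,  # keep codes such as codPer as text
    )
    raw = df["monto"]
    # Rows containing "-" are negative; the magnitude is the text after it.
    negative = raw.str.contains("-", regex=False)
    magnitude = raw.str.split("-").str[-1].astype("int64")
    df["monto"] = magnitude.where(~negative, -magnitude)
    return df


df = read_fixed_width(io.StringIO(SAMPLE))
print(df)
```

Passing a file path instead of the StringIO object reads from disk. Whether this beats the 11.1 s figure depends on the data and the pandas version, so it should be measured on the real file.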

Source: StackOverflow, /questions/59383835/python-efficiency-when-reading-a-file-without-a-default-delimiter-with-millions
