Introduction to Python Data Analysis

Published: 2023/12/15 python 23 豆豆

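This is the first article in an introductory series on Python data analysis. It covers the following:

1. Data source (the data in this example are the Zhilian Zhaopin job postings scraped in the previous article): reading from a database, writing the data to a CSV file, and reading the CSV file back.

2. Data processing: a series of operations on the data loaded into memory. Processing in general includes cleaning, filtering, grouping, counting, sorting, computing averages and so on; this article touches on only a few of them.

3. Visual analysis of the processed data.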

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
@Author  : xiaofeng
@Time    : 2018/12/19 15:23
@Desc    : Less interests, More interest. (Introduction to data analysis)
@Project : python_appliction
@FileName: analysis1.py
@Software: PyCharm
@Blog    : https://blog.csdn.net/zwx19921215
"""
import pymysql as db
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Display Chinese labels correctly (the SimHei font must be installed) and keep
# the minus sign displayable with that font. Set these before any plotting.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# MySQL config
mysql_config = {
    'host': '110.0.2.130',
    'user': 'test',
    'password': 'test',
    'database': 'xiaofeng',
    'charset': 'utf8',
}


def read_to_csv(page_no, page_size, path):
    """Read one page of rows from MySQL and write it to a local CSV file.

    :param page_no: page number (1-based; values <= 1 read the first page)
    :param page_size: number of rows per page
    :param path: path of the CSV file to write
    """
    database = db.connect(**mysql_config)
    try:
        offset = (page_no - 1) * page_size if page_no > 1 else 0
        # Bind offset and row count as parameters instead of concatenating SQL.
        sql = 'select * from zhilian limit %s, %s'
        df = pd.read_sql(sql, database, params=(offset, page_size))
        print(df)
    finally:
        database.close()
    # index=False keeps the DataFrame index out of the file.
    df.to_csv(path, encoding='gbk', index=False)


def read_from_csv(path):
    """Read the CSV file from a remote URL or a local path."""
    # remote address, e.g. data_url = 'https://xxxx/zhilian.csv'
    # local address,  e.g. data_url = 'G:/zhilian.csv'
    return pd.read_csv(path, encoding='gbk')


def simple_op(path):
    """Simple processing of the dataset (filtering, adding and selecting columns)."""
    df = read_from_csv(path)
    # top 10: first 10 rows
    print('--------------top 10-----------')
    print(df.head(10))
    # tail 10: last 10 rows
    print('------------tail 10------------')
    print(df.tail(10))
    # filter: select rows matching the given conditions
    print('-------------filter----------')
    print(df[df.JobID == 595950])
    print(df[(df.id >= 110) & (df.PublishDate == '2018-12-18')])
    # check: dtypes and non-null counts per column (info() prints and returns None)
    print('-------------check----------------')
    df.info()
    print('--------------data describe-----------')
    # count, mean, std, min, 25%, 50% (median), 75% and max of numeric columns
    print(df.describe())
    # add a new column: midpoint of the annual salary range
    df2 = df.copy()
    df2['AnnualSalaryAvg'] = (df['AnnualSalaryMax'] + df['AnnualSalaryMin']) / 2
    # keep and reorder the selected columns
    columns = ['JobID', 'JobTitle', 'CompanyName', 'AnnualSalaryMin',
               'AnnualSalaryMax', 'AnnualSalaryAvg']
    print(df2[columns])


def visualized_analysis(path):
    """Visual analysis: number of postings per city."""
    df = read_from_csv(path)
    # non-null AnnualSalaryMax values per city, used as the number of postings
    df_post_count = df.groupby('JobLactionStr')['AnnualSalaryMax'].count().to_frame().reset_index()
    # subplots(1, 1): a single subplot; figsize=(25, 8) is the figure width and height
    # f, [ax1, ax2] = plt.subplots(1, 2, figsize=(25, 8)) would create 1x2 subplots
    f, ax2 = plt.subplots(1, 1, figsize=(25, 8))
    sns.barplot(x='JobLactionStr', y='AnnualSalaryMax', palette='Greens_d', data=df_post_count, ax=ax2)
    ax2.set_title('各城市职位数量对比', fontsize=15)  # "Number of postings by city"
    ax2.set_xlabel('城市')  # "City"
    ax2.set_ylabel('数量')  # "Count"
    plt.show()


if __name__ == '__main__':
    path = 'G:/zhilian.csv'
    read_to_csv(0, 100, path)
    simple_op(path)
    visualized_analysis(path)
```
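The page_no-to-offset arithmetic in read_to_csv can be checked without a MySQL server. This is a minimal sketch assuming an in-memory SQLite database stands in for MySQL (SQLite also accepts the `LIMIT offset, count` form); the helpers page_offset and fetch_page and the 25 rows are hypothetical. It binds the offset and row count as query parameters rather than concatenating them into the SQL string; with pymysql the placeholder would be %s instead of SQLite's ?.

```python
import sqlite3


def page_offset(page_no: int, page_size: int) -> int:
    """Row offset for a 1-based page number; page numbers <= 1 map to offset 0."""
    return (page_no - 1) * page_size if page_no > 1 else 0


# Assumption: in-memory SQLite stands in for the MySQL zhilian table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zhilian (id INTEGER PRIMARY KEY, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO zhilian (id, JobTitle) VALUES (?, ?)",
    [(i, f"job-{i}") for i in range(1, 26)],  # 25 made-up rows
)


def fetch_page(page_no: int, page_size: int) -> list:
    # "LIMIT offset, count": the first value is the offset, the second the row count.
    sql = "SELECT id FROM zhilian ORDER BY id LIMIT ?, ?"
    params = (page_offset(page_no, page_size), page_size)
    return [row[0] for row in conn.execute(sql, params)]


print(fetch_page(1, 10))  # ids 1..10
print(fetch_page(3, 10))  # ids 21..25, a partial last page
```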

[Figure: console output]
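Much of the console output comes from info() and describe(). A small sketch on made-up salary values shows what each returns: info() writes its report (dtypes and non-null counts per column) and returns None, so wrapping it in print() adds a stray None, while describe() returns a DataFrame of summary statistics. Capturing the report with io.StringIO is done here only so it can be inspected.

```python
import io

import pandas as pd

# Made-up values; one AnnualSalaryMin is missing on purpose.
df = pd.DataFrame({
    "AnnualSalaryMin": [100000, 120000, None, 150000],
    "AnnualSalaryMax": [200000, 180000, 160000, 250000],
})

# info() prints its report and returns None; buf redirects the report into a buffer.
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max.
stats = df.describe()

print(report)
print(stats)
```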

[Figure: the CSV file written locally]
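One detail of the CSV step is worth checking: by default DataFrame.to_csv also writes the row index, which read_csv then returns as an extra "Unnamed: 0" column. The sketch below uses two made-up rows and a temporary directory instead of G:/zhilian.csv, keeps the article's gbk encoding, and compares the round trip with and without index=False.

```python
import os
import tempfile

import pandas as pd

# Two made-up rows; the Chinese company names are encodable in GBK.
df = pd.DataFrame({"JobID": [595950, 595951], "CompanyName": ["示例公司", "测试公司"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "zhilian.csv")
    # Default: the row index is written as an extra, unnamed first column.
    df.to_csv(path, encoding="gbk")
    with_index = pd.read_csv(path, encoding="gbk")
    # index=False writes only the real columns.
    df.to_csv(path, encoding="gbk", index=False)
    without_index = pd.read_csv(path, encoding="gbk")

print(list(with_index.columns))     # ['Unnamed: 0', 'JobID', 'CompanyName']
print(list(without_index.columns))  # ['JobID', 'CompanyName']
```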

[Figure: visualization result]
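The bar chart's per-city figures come from grouping on JobLactionStr and calling count() on AnnualSalaryMax, which counts non-null salaries rather than rows. A minimal sketch on made-up data contrasts it with size(), which counts every posting in the group; the PostCount column name is hypothetical.

```python
import pandas as pd

# Made-up postings; one Shanghai posting has no salary.
df = pd.DataFrame({
    "JobLactionStr": ["Beijing", "Shanghai", "Beijing", "Beijing", "Shanghai"],
    "AnnualSalaryMax": [200000, None, 160000, 180000, 150000],
})

# count() skips the missing salary; size() counts every row in each group.
by_count = df.groupby("JobLactionStr")["AnnualSalaryMax"].count()
by_size = df.groupby("JobLactionStr").size()

# Turn the per-city sizes into a two-column frame, largest first.
post_counts = (
    by_size.rename("PostCount")
    .reset_index()
    .sort_values("PostCount", ascending=False)
)

print(by_count.to_dict())  # {'Beijing': 3, 'Shanghai': 1}
print(by_size.to_dict())   # {'Beijing': 3, 'Shanghai': 2}
print(post_counts)
```

The post_counts frame could then be passed to sns.barplot with x='JobLactionStr' and y='PostCount'. Which of count() or size() is appropriate depends on whether postings with a missing salary should count toward a city.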

Note: since this is the first article in an introductory series, it does not explore the data in depth; more in-depth analysis will follow in later parts of the series.
