

Using Sqoop to Import MySQL Data into Hadoop

Published: 2025/3/12

Installing and configuring Hadoop is not covered here.

Installing Sqoop is also straightforward.

Once Sqoop is installed, you can test whether it can connect to MySQL like this (note: the MySQL JDBC driver jar must be placed under SQOOP_HOME/lib):

sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231

If the command prints the list of databases on the MySQL server, Sqoop is working properly.

Next, let's import the MySQL data into Hadoop.

I prepared an ID-card data table with 3 million rows.
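The original post showed the table only as a screenshot. Judging from the queries later in the article, a minimal sketch of what the schema might look like is below; the column names `borth` and `address` come from the queries, while everything else (column types, the `id` and `name` columns) is an assumption:

```sql
-- Hypothetical schema for test_sfz (the real one was not shown).
-- `borth` appears to hold the birth date as a 'yyyy-mm-dd' string,
-- since the queries extract the year with SUBSTRING(borth, 1, 4).
CREATE TABLE test_sfz (
  id      INT PRIMARY KEY,    -- assumed surrogate key (queries filter on id)
  name    VARCHAR(50),        -- assumed
  borth   VARCHAR(10),        -- birth date string, e.g. '1989-12-31'
  address VARCHAR(100)        -- e.g. starts with a province name, filtered with LIKE
);
```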

First start Hive (simply run the `hive` command), then use Sqoop to import the data into Hive:

sqoop import --connect jdbc:mysql://192.168.1.109:3306/hadoop --username root --password 19891231 --table test_sfz --hive-import

Sqoop launches a MapReduce job to carry out the import.

The import finished in 2 minutes 20 seconds, which is respectable.
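The command above relies on Sqoop's defaults. For larger tables the import can be parallelized; here is a sketch of a tuned variant. The flags used (`--split-by`, `-m`, `--hive-table`) are standard Sqoop options, but choosing `id` as the split column and 4 mappers are assumptions about this particular table and machine:

```shell
# Sketch of a tuned variant of the same import.
# --split-by tells Sqoop which column to range-partition the rows on;
# -m 4 runs four parallel map tasks; --hive-table names the target table.
sqoop import \
  --connect jdbc:mysql://192.168.1.109:3306/hadoop \
  --username root --password 19891231 \
  --table test_sfz \
  --split-by id \
  -m 4 \
  --hive-import \
  --hive-table test_sfz
```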

The newly imported table is now visible in Hive.

Let's test the data with a quick SQL query:

select * from test_sfz where id < 10;

Hive took nearly 25 seconds for this query, which is indeed slow (MySQL answers it almost instantly). But keep in mind that Hive runs each query as a job on Hadoop, so the fixed job-startup overhead naturally adds time.
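As an aside, queries like this need not be typed at the interactive prompt. The Hive CLI can run a query string directly with `hive -e`, or a script file with `hive -f`, which is convenient for repeated timing runs (the script path below is just an illustrative example):

```shell
# Run a single query non-interactively ...
hive -e "select * from test_sfz where id < 10;"
# ... or run a whole file of queries (example path).
hive -f /tmp/queries.hql
```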

Next, we'll run some more complex queries against this data.

My setup: Hadoop runs in pseudo-distributed mode inside a virtual machine whose OS is 64-bit Ubuntu 12.04.

TEST 1: Computing the average age

Test data: 3.008 million rows

1. Average age in Guangdong

MySQL: select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';

Time: 0.877 s

Hive: select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz where address like '廣東%';

Time: 25.012 s

2. Cities ranked by average age, highest to lowest

MySQL: select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;

Time: 2.949 s

Hive: select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz GROUP BY address order by ageAvge desc;

Time: 51.29 s

Notice that as the queries get more complex, Hive's elapsed time grows more slowly than MySQL's.
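Incidentally, the sum/count expression in these queries is just a hand-rolled average; both MySQL and Hive support AVG(), so the Guangdong query could equivalently be written as follows (a sketch with the same semantics assumed; the date '2014-10-01' is pinned instead of NOW(), as the original Hive queries do, so repeated runs are comparable):

```sql
-- Equivalent form of the Guangdong average-age query using AVG().
select avg(year('2014-10-01') - SUBSTRING(borth, 1, 4)) as ageAvge
from test_sfz
where address like '廣東%';
```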

TEST 2

Test data: 12 million rows

MySQL engine: MyISAM (chosen to speed up queries)

The table was imported into Hive the same way as before.

1. Average age in Guangdong

MySQL: select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';

Time: 5.642 s

Hive: select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';

Time: 168.259 s

2. Cities ranked by average age, highest to lowest

MySQL: select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;

Time: 11.964 s

Hive: select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;

Time: 311.714 s

TEST 3

Test data: 20 million rows

MySQL engine: MyISAM (chosen to speed up queries)

The table was imported into Hive the same way as before. (This import took very little time! Possibly because my host was busy with other resource-intensive work during the TEST 2 import.)

1. Average age in Guangdong

MySQL: select (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';

Time: 6.605 s

Hive: select (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 where address like '廣東%';

Time: 188.206 s

2. Cities ranked by average age, highest to lowest

MySQL: select address, (sum(year(NOW()) - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;

Time: 19.926 s

Hive: select address, (sum(year('2014-10-01') - SUBSTRING(borth,1,4))/count(*)) as ageAvge from test_sfz2 GROUP BY address order by ageAvge desc;

Time: 411.816 s

Summary

For bulk transfer, Sqoop moved millions of MySQL rows into Hive in a couple of minutes. On this pseudo-distributed single-VM setup, MySQL answered every test query one to two orders of magnitude faster than Hive, but Hive's runtime grew more slowly as the data volume and query complexity increased.
