Running Python programs on Hadoop
Data source:
http://www.nber.org/patents/acite75_99.zip
1. First, upload the test data to HDFS and list the directory to confirm it is there:
[root@localhost:/usr/local/hadoop/hadoop-0.19.2]# bin/hadoop fs -ls /user/root/test-in
Found 5 items
-rw-r--r--   1 root supergroup       101 2010-10-24 14:39 /user/root/test-in/NOTICE.txt
-rw-r--r--   1 root supergroup      1366 2010-10-24 14:39 /user/root/test-in/README.txt
-rw-r--r--   1 root supergroup 264075431 2010-10-24 19:23 /user/root/test-in/cite75_99.txt
-rw-r--r--   1 root supergroup        22 2010-10-24 14:39 /user/root/test-in/file1.txt
-rw-r--r--   1 root supergroup        28 2010-10-24 14:39 /user/root/test-in/file2.txt
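The upload command itself is not shown in the transcript. As a minimal sketch, the step could be scripted from Python by building a `hadoop fs -put` command. The helper name `build_put_command` is hypothetical, and the local file `cite75_99.txt` is assumed to sit in the current directory. The sketch only prints the command; passing the list to `subprocess.check_call` would run it on a machine with Hadoop installed.

```python
def build_put_command(local_path, hdfs_dir, hadoop_bin="bin/hadoop"):
    """Return the argument list for copying a local file into HDFS.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [hadoop_bin, "fs", "-put", local_path, hdfs_dir]


# Assumption: cite75_99.txt was unzipped into the current directory.
command = build_put_command("cite75_99.txt", "/user/root/test-in")
print(" ".join(command))
```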
2. Create the Python program, RandomSample.py:
#!/usr/bin/env python
import random
import sys

# Keep each input line with a probability of sys.argv[1] percent.
for line in sys.stdin:
    if random.randint(1, 100) <= int(sys.argv[1]):
        print(line.strip())
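Run the script as the mapper of a Hadoop Streaming job. The argument 10 keeps roughly 10% of the input lines:
[root@localhost:/usr/local/hadoop/hadoop-0.19.2]# bin/hadoop jar contrib/streaming/hadoop-0.19.2-streaming.jar -input test-in/cite75_99.txt -output testoutput -mapper 'RandomSample.py 10' -file RandomSample.py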
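The mapper's sampling logic can be tried locally before submitting a job. The sketch below wraps the same test in a function so it can be run on an in-memory list. The name `sample_lines`, the seeded generator, and the made-up input records are illustration-only assumptions, not part of the original script.

```python
import random


def sample_lines(lines, percent, rng=None):
    """Yield each stripped line with a probability of percent/100.

    Hypothetical helper that mirrors the test used in RandomSample.py.
    """
    if rng is None:
        rng = random.Random()
    for line in lines:
        # A draw from 1..100 that is at or below percent keeps the line.
        if rng.randint(1, 100) <= percent:
            yield line.strip()


# Made-up records in the same "citing,cited" shape as cite75_99.txt.
fake_input = ["%d,%d\n" % (i, i + 1) for i in range(10000)]
kept = list(sample_lines(fake_input, 10, random.Random(0)))
print("kept %d of %d lines" % (len(kept), len(fake_input)))
```

The count printed should be close to, but rarely exactly, 10% of the input, because each line is kept or dropped independently.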
[root@localhost:/usr/local/hadoop/hadoop-0.19.2]# bin/hadoop fs -ls /user/root/
Found 4 items
drwxr-xr-x   - root supergroup          0 2010-10-24 19:25 /user/root/output
drwxr-xr-x   - root supergroup          0 2010-10-24 19:23 /user/root/test-in
drwxr-xr-x   - root supergroup          0 2010-10-24 14:41 /user/root/test-out
drwxr-xr-x   - root supergroup          0 2010-10-24 22:12 /user/root/testoutput
[root@localhost:/usr/local/hadoop/hadoop-0.19.2]# bin/hadoop fs -ls /user/root/testoutput
Found 2 items
drwxr-xr-x   - root supergroup          0 2010-10-24 22:08 /user/root/testoutput/_logs
-rw-r--r--   1 root supergroup   28075087 2010-10-24 22:12 /user/root/testoutput/part-00000
[root@localhost:/usr/local/hadoop/hadoop-0.19.2]# bin/hadoop fs -cat /user/root/testoutput/part-00000 | head
3858242,3319261
3858243,3156927
3858243,3681785
3858243,3684611
3858248,3641592
3858253,2331472
3858254,2869143
3858256,3413665
3858262,3557750
3858264,3530488
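As a quick sanity check, the two file sizes reported by the listings above can be compared. The sketch below shows the output is about 10.6% of the input by size, in line with the requested 10% sample. The byte ratio is only a rough proxy for the fraction of lines kept, since line lengths vary and the job's output records need not be byte-for-byte identical to the input lines.

```python
# File sizes in bytes, copied from the hadoop fs -ls listings.
input_bytes = 264075431   # /user/root/test-in/cite75_99.txt
output_bytes = 28075087   # /user/root/testoutput/part-00000

# float() keeps the division exact under both Python 2 and Python 3.
ratio = output_bytes / float(input_bytes)
print("output is %.1f%% of the input by size" % (ratio * 100))
```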