Hadoop job count_Hadoop job task allocation
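1. Why this matters
Hadoop provides several configuration parameters that let admins and users set memory limits flexibly. Some parameters have default values; others are cluster-specific and exist to support memory-intensive jobs.
When setting up a cluster, the admin can start by choosing appropriate default values. Other parameters can then be set according to the cluster's hardware (for example, the total physical and virtual memory available to tasks, the number of slots configured on each slave, and the needs of other processes running on the slave) and the type of jobs it runs (for example, memory-intensive tasks).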
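2. Memory monitoring
(1) Purpose of monitoring task memory
The goal is to prevent a MapReduce task from consuming memory beyond a limit, which would affect other processes running on the same slave, including other tasks and daemons such as the DataNode or TaskTracker.
(2) Virtual memory and physical memory
Hadoop can monitor a node's virtual memory and physical memory independently of each other. In streaming applications, however, the program loads libraries in order to run the task, so its virtual memory usage is high. In such cases, monitoring physical memory gives a more accurate picture.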
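To make the virtual/physical distinction concrete, here is a minimal sketch, assuming a Linux node, that reads a process's virtual size (`VmSize`) and resident physical memory (`VmRSS`) from `/proc/<pid>/status`. It is not Hadoop's own monitoring code, and the function names are hypothetical. For a streaming task that maps large libraries, `VmSize` is typically much larger than `VmRSS`, which is why physical memory is the more telling number there.

```python
from typing import Dict


def parse_proc_status(text: str) -> Dict[str, int]:
    """Extract VmSize (virtual) and VmRSS (physical) in kB from the text of
    a Linux /proc/<pid>/status file. Missing fields are simply omitted
    (kernel threads, for example, have neither)."""
    usage: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "   204800 kB"; keep only the number.
            usage[key] = int(value.split()[0])
    return usage


def read_process_memory(pid: int) -> Dict[str, int]:
    """Hypothetical helper: read memory figures of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())


sample = "Name:\tpython\nVmSize:\t  204800 kB\nVmRSS:\t   51200 kB\n"
print(parse_proc_status(sample))  # {'VmSize': 204800, 'VmRSS': 51200}
```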
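(3) Hadoop lets a job specify the maximum amount of memory it expects to need. Through resource-aware scheduling and monitoring, Hadoop tries to ensure that the number of tasks running on a slave satisfies both of the following limits:
(a) an individual job's memory requirement
(b) the total amount of memory available for all MapReduce tasks
(4) TaskTracker monitoring of tasks
(a) Periodic monitoring
Step one: check that the cumulative virtual and physical memory used by a task and its child processes does not exceed the specified amount. Virtual memory is checked first, then physical memory. If a limit is exceeded, the task and its child processes are killed, and the task is marked as failed.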
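Step one can be sketched as follows. This is a minimal illustration, not the TaskTracker's implementation: `TaskUsage`, `check_task`, and `step_one` are hypothetical names, memory figures are assumed to already include each task's child processes, and a limit of `None` stands for "this kind of monitoring is disabled". For simplicity every task shares one pair of limits here, whereas in Hadoop the limit follows each job's own memory request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskUsage:
    """Hypothetical snapshot of one task's memory, including its child processes."""
    task_id: str
    vmem_mb: int
    pmem_mb: int


def check_task(task: TaskUsage, vmem_limit_mb: Optional[int],
               pmem_limit_mb: Optional[int]) -> Optional[str]:
    """Return why the task must be failed, or None if it is within limits.
    Virtual memory is checked before physical memory."""
    if vmem_limit_mb is not None and task.vmem_mb > vmem_limit_mb:
        return "virtual memory limit exceeded"
    if pmem_limit_mb is not None and task.pmem_mb > pmem_limit_mb:
        return "physical memory limit exceeded"
    return None


def step_one(tasks: List[TaskUsage], vmem_limit_mb: Optional[int],
             pmem_limit_mb: Optional[int]) -> Dict[str, str]:
    """Map task_id -> reason for every task to kill and mark FAILED."""
    failed: Dict[str, str] = {}
    for task in tasks:
        reason = check_task(task, vmem_limit_mb, pmem_limit_mb)
        if reason is not None:
            failed[task.task_id] = reason
    return failed


tasks = [TaskUsage("t1", 1500, 400), TaskUsage("t2", 900, 1200),
         TaskUsage("t3", 800, 300)]
print(step_one(tasks, vmem_limit_mb=1024, pmem_limit_mb=1024))
# {'t1': 'virtual memory limit exceeded', 't2': 'physical memory limit exceeded'}
```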
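Step two: check the cumulative virtual and physical memory used by all running tasks on the node and their child processes. If this exceeds the limit, the TaskTracker kills enough tasks to bring cumulative usage back under the limit. If the virtual memory limit is exceeded, the tasks that have made the least progress are killed; if the physical memory limit is exceeded, the tasks using the most physical memory are killed. Tasks killed this way are marked as killed (not failed).
(5) Resource-aware scheduling
Resource-aware scheduling ensures that, before a task is scheduled on a slave, the slave is able to satisfy the task's memory requirement.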
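The victim selection in step two can be sketched as a greedy loop. This is a hypothetical illustration (`RunningTask`, `step_two`), not Hadoop's code; it assumes per-task memory figures already include child processes and, as an assumption mirroring step one, checks virtual memory before physical memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    """Hypothetical snapshot of a running task (memory includes children)."""
    task_id: str
    vmem_mb: int
    pmem_mb: int
    progress: float  # fraction complete, 0.0 to 1.0


def step_two(tasks: List[RunningTask], total_vmem_mb: Optional[int],
             total_pmem_mb: Optional[int]) -> List[str]:
    """Return ids of tasks to kill (and mark KILLED), in kill order, so that
    the cumulative usage of all running tasks falls under the node limits."""
    alive = list(tasks)
    killed: List[str] = []

    # Virtual memory first: victims are the tasks with the least progress.
    if total_vmem_mb is not None:
        alive.sort(key=lambda t: t.progress)
        while alive and sum(t.vmem_mb for t in alive) > total_vmem_mb:
            killed.append(alive.pop(0).task_id)

    # Then physical memory: victims are the tasks using the most of it.
    if total_pmem_mb is not None:
        alive.sort(key=lambda t: t.pmem_mb, reverse=True)
        while alive and sum(t.pmem_mb for t in alive) > total_pmem_mb:
            killed.append(alive.pop(0).task_id)

    return killed


tasks = [RunningTask("a", 1000, 600, 0.9), RunningTask("b", 1000, 300, 0.1),
         RunningTask("c", 1000, 500, 0.5)]
print(step_two(tasks, total_vmem_mb=2500, total_pmem_mb=1000))  # ['b', 'a']
```

Recomputing the sum on every iteration keeps the sketch short; a real monitor would update a running total instead.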
The Capacity Scheduler takes jobs' virtual memory requirements into account when scheduling them. See
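The admission check behind resource-aware scheduling can be illustrated with a simple first-fit placement. This is a hypothetical sketch (`Slave`, `assign`), not the Capacity Scheduler's algorithm: a task is placed only on a slave whose remaining task memory covers the task's requirement, and that memory is then reserved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    """Hypothetical view of a slave's memory budget for MapReduce tasks."""
    name: str
    task_memory_mb: int     # memory available to all MapReduce tasks
    committed_mb: int = 0   # memory already promised to running tasks

    def free_mb(self) -> int:
        return self.task_memory_mb - self.committed_mb


def assign(task_memory_mb: int, slaves: List[Slave]) -> Optional[str]:
    """First fit: place the task on the first slave that can satisfy its
    memory requirement and reserve that memory; None if no slave can."""
    for slave in slaves:
        if slave.free_mb() >= task_memory_mb:
            slave.committed_mb += task_memory_mb
            return slave.name
    return None


slaves = [Slave("s1", 2048, committed_mb=1536), Slave("s2", 4096)]
print(assign(1024, slaves))  # s2 (s1 has only 512 MB free)
print(assign(512, slaves))   # s1
print(assign(4096, slaves))  # None
```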
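(7) Cluster-specific memory configuration
These parameters belong to the JobTracker and TaskTracker and cannot be modified by any job. They should also be set to the same values on every slave.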
mapreduce.cluster.{map|reduce}memory.mb: These options define the default amount of virtual memory that should be allocated for MapReduce tasks running in the cluster. They typically match the default values set for the options mapreduce.{map|reduce}.memory.mb. They help in the calculation of the total amount of virtual memory available for MapReduce tasks on a slave, using the following equation:
Total virtual memory for all MapReduce tasks = (mapreduce.cluster.mapmemory.mb * mapreduce.tasktracker.map.tasks.maximum) + (mapreduce.cluster.reducememory.mb * mapreduce.tasktracker.reduce.tasks.maximum)
Typically, reduce tasks require more memory than map tasks. Hence a higher value is recommended for mapreduce.cluster.reducememory.mb. The value is specified in MB. To set a value of 2GB for reduce tasks, set mapreduce.cluster.reducememory.mb to 2048.
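The equation above is plain arithmetic over four cluster settings. A minimal sketch, assuming the configuration is held in a plain Python dict keyed by the property names from the text; the example values (4 map slots at 1024 MB, 2 reduce slots at 2 GB) are illustrative, not defaults.

```python
def total_task_vmem_mb(conf: dict) -> int:
    """Total virtual memory (MB) for all MapReduce tasks on one slave.
    `conf` is a hypothetical dict of property name -> value."""
    map_part = (conf["mapreduce.cluster.mapmemory.mb"]
                * conf["mapreduce.tasktracker.map.tasks.maximum"])
    reduce_part = (conf["mapreduce.cluster.reducememory.mb"]
                   * conf["mapreduce.tasktracker.reduce.tasks.maximum"])
    return map_part + reduce_part


conf = {
    "mapreduce.cluster.mapmemory.mb": 1024,
    "mapreduce.tasktracker.map.tasks.maximum": 4,
    "mapreduce.cluster.reducememory.mb": 2 * 1024,  # 2 GB expressed in MB
    "mapreduce.tasktracker.reduce.tasks.maximum": 2,
}
print(total_task_vmem_mb(conf))  # 8192
```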
mapreduce.jobtracker.max{map|reduce}memory.mb: These options define the maximum amount of virtual memory that can be requested by jobs using the parameters mapreduce.{map|reduce}.memory.mb. The system will reject any submitted job that requests more memory than these limits. Typically, the values for these options should be set to satisfy the following constraint:
mapreduce.jobtracker.maxmapmemory.mb = mapreduce.cluster.mapmemory.mb * mapreduce.tasktracker.map.tasks.maximum
mapreduce.jobtracker.maxreducememory.mb = mapreduce.cluster.reducememory.mb * mapreduce.tasktracker.reduce.tasks.maximum
The value is specified in MB. If mapreduce.cluster.reducememory.mb is set to 2GB and each slave has 2 reduce slots configured, the value for mapreduce.jobtracker.maxreducememory.mb should be set to 4096.
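The two constraints and the rejection rule can be sketched together. This is a hypothetical illustration (`max_job_memory_mb`, `job_accepted`) using a plain dict for configuration; the reduce values reproduce the 2 GB × 2 slots example from the text, while the map values are assumed.

```python
from typing import Tuple


def max_job_memory_mb(conf: dict) -> Tuple[int, int]:
    """Recommended JobTracker limits (MB) for map and reduce task requests."""
    max_map = (conf["mapreduce.cluster.mapmemory.mb"]
               * conf["mapreduce.tasktracker.map.tasks.maximum"])
    max_reduce = (conf["mapreduce.cluster.reducememory.mb"]
                  * conf["mapreduce.tasktracker.reduce.tasks.maximum"])
    return max_map, max_reduce


def job_accepted(map_request_mb: int, reduce_request_mb: int,
                 max_map_mb: int, max_reduce_mb: int) -> bool:
    """A job is rejected if either request (its mapreduce.{map|reduce}.memory.mb)
    exceeds the corresponding limit."""
    return map_request_mb <= max_map_mb and reduce_request_mb <= max_reduce_mb


conf = {
    "mapreduce.cluster.mapmemory.mb": 1024,          # assumed
    "mapreduce.tasktracker.map.tasks.maximum": 4,    # assumed
    "mapreduce.cluster.reducememory.mb": 2048,       # 2 GB, as in the text
    "mapreduce.tasktracker.reduce.tasks.maximum": 2, # 2 reduce slots
}
max_map, max_reduce = max_job_memory_mb(conf)
print(max_map, max_reduce)                            # 4096 4096
print(job_accepted(2048, 4096, max_map, max_reduce))  # True
print(job_accepted(2048, 6144, max_map, max_reduce))  # False
```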
mapreduce.tasktracker.reserved.physicalmemory.mb: This option defines the amount of physical memory that is marked for system and daemon processes. Using this, the amount of physical memory available for MapReduce tasks is calculated using the following equation:
Total physical memory for all MapReduce tasks = Total physical memory available on the system - mapreduce.tasktracker.reserved.physicalmemory.mb
The value is specified in MB. To set this value to 2GB, specify the value as 2048.
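The physical-memory equation can be sketched the same way. As an assumption for illustration, the machine's total physical memory is read with POSIX `os.sysconf("SC_PAGE_SIZE")` and `os.sysconf("SC_PHYS_PAGES")` (available on Linux); the function names are hypothetical, and the 16 GB figure in the usage line is only an example.

```python
import os
from typing import Optional


def system_physical_memory_mb() -> int:
    """Total physical memory of this machine in MB, via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)


def task_physical_memory_mb(reserved_mb: int,
                            system_mb: Optional[int] = None) -> int:
    """Physical memory (MB) left for MapReduce tasks after reserving
    mapreduce.tasktracker.reserved.physicalmemory.mb for system/daemons."""
    if system_mb is None:
        system_mb = system_physical_memory_mb()
    return system_mb - reserved_mb


print(task_physical_memory_mb(2048, system_mb=16 * 1024))  # 14336
```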
mapreduce.tasktracker.taskmemorymanager.monitoringinterval: This option defines the time the TaskTracker waits between two cycles of memory monitoring. The value is specified in milliseconds.
Note: The virtual memory monitoring function is only enabled if the variables mapreduce.cluster.{map|reduce}memory.mb and mapreduce.jobtracker.max{map|reduce}memory.mb are set to values greater than zero. Likewise, the physical memory monitoring function is only enabled if the variable mapreduce.tasktracker.reserved.physicalmemory.mb is set to a value greater than zero.
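The enabling rules in the note can be expressed as two small predicates. This is a hypothetical sketch over a plain dict; treating an absent key as "not set" (and therefore disabled) is an assumption of the sketch.

```python
VMEM_KEYS = (
    "mapreduce.cluster.mapmemory.mb",
    "mapreduce.cluster.reducememory.mb",
    "mapreduce.jobtracker.maxmapmemory.mb",
    "mapreduce.jobtracker.maxreducememory.mb",
)
PMEM_KEY = "mapreduce.tasktracker.reserved.physicalmemory.mb"


def vmem_monitoring_enabled(conf: dict) -> bool:
    # All four settings must be > 0; an absent key is assumed to mean disabled.
    return all(conf.get(key, -1) > 0 for key in VMEM_KEYS)


def pmem_monitoring_enabled(conf: dict) -> bool:
    # The reserved physical memory must be > 0; absent is assumed disabled.
    return conf.get(PMEM_KEY, -1) > 0


conf = {
    "mapreduce.cluster.mapmemory.mb": 1024,
    "mapreduce.cluster.reducememory.mb": 2048,
    "mapreduce.jobtracker.maxmapmemory.mb": 4096,
    "mapreduce.jobtracker.maxreducememory.mb": 0,  # not > 0, so vmem is off
    PMEM_KEY: 2048,
}
print(vmem_monitoring_enabled(conf), pmem_monitoring_enabled(conf))  # False True
```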
Reposted from http://blog.csdn.net/amaowolf/article/details/7188504