Life Expectancy Dataset
Original text:
Life Expectancy (WHO)
Statistical Analysis on factors influencing Life Expectancy
Although many past studies have examined factors affecting life expectancy using demographic variables, income composition and mortality rates, the effects of immunization and the human development index were not taken into account. In addition, some past research applied multiple linear regression to a data set covering only one year for all countries. This motivates addressing both gaps by formulating a regression model based on a mixed effects model and multiple linear regression, using data from 2000 to 2015 for all countries. Important immunizations such as Hepatitis B, Polio and Diphtheria are also considered. In a nutshell, this study focuses on immunization factors, mortality factors, economic factors, social factors and other health-related factors. Since the observations in this dataset are organized by country, it becomes easier for a country to identify the predicting factor that contributes to a lower life expectancy. This helps suggest which area a country should prioritize in order to efficiently improve the life expectancy of its population.
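As a rough illustration of the two model families named above, the equations below write them out for life expectancy $y_{it}$ of country $i$ in year $t$ with $p$ predictors $x_{k,it}$. The notation, and in particular the choice of a single random intercept $u_i$ per country, is an assumption made for this sketch; the original description does not state the random-effects structure that was used.

```latex
% Multiple linear regression on pooled country-year observations
y_{it} = \beta_0 + \sum_{k=1}^{p} \beta_k x_{k,it} + \varepsilon_{it}

% Mixed-effects variant (assumed form): random intercept u_i for each country
y_{it} = \beta_0 + \sum_{k=1}^{p} \beta_k x_{k,it} + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

The only difference between the two is the term $u_i$, which lets repeated yearly observations from the same country share a country-specific offset instead of being treated as independent.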
The project relies on the accuracy of the data. The Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of health status and many other related factors for all countries. The datasets are made available to the public for health data analysis. The dataset on life expectancy and health factors for 193 countries was collected from the WHO data repository website, and the corresponding economic data was collected from the United Nations website. Among all categories of health-related factors, only the more representative critical factors were chosen. It has been observed that in the past 15 years there has been huge development in the health sector, resulting in improved human mortality rates, especially in developing nations, in comparison to the past 30 years. Therefore, this project considers data from 2000 to 2015 for 193 countries for further analysis.
The individual data files were merged into a single dataset. Initial visual inspection of the data showed some missing values. As the datasets were from WHO, we found no evident errors. Missing data was handled in R using the missmap command. The result indicated that most of the missing data was for population, Hepatitis B and GDP. The missing data came from lesser-known countries such as Vanuatu, Tonga, Togo and Cabo Verde. Finding all data for these countries was difficult, so we decided to exclude them from the final model dataset. The final merged file (final dataset) consists of 22 columns and 2938 rows, which means 20 predicting variables. All predicting variables were then divided into several broad categories: immunization-related factors, mortality factors, economic factors and social factors.
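The missing-value step described above can be sketched in a few lines. This is not the authors' R workflow: it swaps in plain Python with the standard `csv` module, and the sample rows, the fictional country names and the helper names (`missing_counts`, `drop_incomplete_countries`) are all made up for illustration. It counts empty cells per column and then excludes every country that has at least one missing value.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature of the merged file. Countries and values are
# invented for illustration and do not come from the WHO data.
SAMPLE_CSV = """Country,Year,Life expectancy,Population,Hepatitis B,GDP
Freedonia,2014,70.1,1000000,95,5000
Freedonia,2015,70.5,1010000,96,5100
Borduria,2014,61.2,,80,
Borduria,2015,61.9,,,1200
Sylvania,2014,75.0,500000,,9000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def is_missing(value):
    """Treat absent or blank cells as missing."""
    return value is None or value.strip() == ""


def missing_counts(rows):
    """Count missing cells per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                counts[column] += 1
    return counts


def drop_incomplete_countries(rows, key="Country"):
    """Remove every row of any country that has at least one missing cell."""
    incomplete = {
        row[key] for row in rows if any(is_missing(v) for v in row.values())
    }
    return [row for row in rows if row[key] not in incomplete]


rows = load_rows(SAMPLE_CSV)
counts = missing_counts(rows)
complete = drop_incomplete_countries(rows)

for column, n in counts.most_common():
    print(f"{column}: {n} missing")
print("countries kept:", sorted({row["Country"] for row in complete}))
```

Dropping whole countries mirrors the exclusion decision described in the text. It keeps each remaining country's yearly series intact, at the cost of discarding the usable rows of the excluded countries.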
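You can download the dataset from the official site; I have also shared a copy on Baidu Netdisk. Follow my WeChat official account and reply "202203" to get the download link.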