Initializing three-level area data (province, city, county) in Java
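Using the Jsoup crawler to fetch nationwide area data (province, city, county, town, village)
I recently started a new project that needs province, city, and county data preloaded into the database, so I picked up a scraping library and pulled the data from the National Bureau of Statistics administrative division code site. I won't go into how it works here; the goal is simply a working implementation.
- First, add the Jsoup dependency. The database and Spring dependencies are assumed to be in place already.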
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.2</version>
</dependency>
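package com.yckj.common.entity;

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

// Entity mapped to the "area" table; one row per province, city, or county.
@Data
@AllArgsConstructor
@NoArgsConstructor
@TableName(value = "area")
public class Area implements Serializable {

    // Six-digit division code
    @TableId(value = "id")
    private String id;

    // Code of the parent area; "root" for provinces
    @TableField(value = "parentId")
    private String parentId;

    @TableField(value = "areaName")
    private String areaName;

    // 1 = province, 2 = city, 3 = county
    @TableField(value = "level")
    private Integer level;
}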
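The service code later in this post stores the first six digits of each code as the `id` and hard-codes the province ids. As a minimal sketch of that convention, the helper below trims a 12-digit code to six digits and derives a province id from a province link. The class `AreaCodes` and both method names are hypothetical, and the sketch assumes that province links are named after a two-digit prefix (for example `13.html`), so check the live pages before relying on it.

```java
// Hypothetical helper, not part of the original post.
final class AreaCodes {

    private AreaCodes() {
    }

    // Trims a full code such as "130100000000" to its six-digit form "130100".
    static String toSixDigits(String fullCode) {
        if (fullCode == null || fullCode.length() < 6) {
            throw new IllegalArgumentException("Expected at least 6 digits, got: " + fullCode);
        }
        return fullCode.substring(0, 6);
    }

    // Builds a province id from a link such as "13.html" or ".../2020/13.html".
    // Assumption: the file name is a two-digit province prefix followed by ".html".
    static String provinceIdFromHref(String href) {
        if (href == null) {
            throw new IllegalArgumentException("href must not be null");
        }
        String fileName = href.substring(href.lastIndexOf('/') + 1);
        if (!fileName.matches("\\d{2}\\.html")) {
            throw new IllegalArgumentException("Unexpected province link: " + href);
        }
        // Pad the two-digit prefix to a six-digit id.
        return fileName.substring(0, 2) + "0000";
    }
}
```

If the assumption holds, `provinceIdFromHref` could stand in for a name-to-code lookup table, since it does not depend on how the province name is spelled.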
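- Next, write the code that crawls http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2020/ and stores the results in the database. The site lists every province, city, county, town, and village in the country, but my project only needs provinces, cities, and counties. Adjust the code to match your own requirements.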
package com.yckj.appauth.service.impl;

import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.yckj.appauth.mapper.InitAreaMapper;
import com.yckj.appauth.service.InitAreaService;
import com.yckj.common.entity.Area;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

@Service
public class InitAreaServiceImpl extends ServiceImpl<InitAreaMapper, Area> implements InitAreaService {

    @Autowired
    private InitAreaMapper initAreaMapper;

    // Maps an area level to the CSS class of its table rows on the site.
    private static Map<Integer, String> cssMap = new HashMap<>();

    static {
        cssMap.put(1, "provincetr");
        cssMap.put(2, "citytr");
        cssMap.put(3, "countytr");
    }

    public void initArea() {
        int level = 1;
        Document connect = connect("http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2020/");
        if (connect == null) {
            // The index page could not be fetched, so there is nothing to parse.
            return;
        }
        Elements rowProvince = connect.select("tr." + cssMap.get(level));
        System.out.println("code****parentCode****name****level");
        for (Element provinceElement : rowProvince) {
            Elements select = provinceElement.select("a");
            for (Element province : select) {
                // The link text is the province name; text() drops the trailing <br>.
                String areaName = province.text().trim();
                String areaCode = "";
                // Names must match the site's text exactly (Simplified Chinese).
                switch (areaName) {
                    case "北京市": areaCode = "110000"; break;
                    case "天津市": areaCode = "120000"; break;
                    case "河北省": areaCode = "130000"; break;
                    case "山西省": areaCode = "140000"; break;
                    case "内蒙古自治区": areaCode = "150000"; break;
                    case "辽宁省": areaCode = "210000"; break;
                    case "吉林省": areaCode = "220000"; break;
                    case "黑龙江省": areaCode = "230000"; break;
                    case "上海市": areaCode = "310000"; break;
                    case "江苏省": areaCode = "320000"; break;
                    case "浙江省": areaCode = "330000"; break;
                    case "安徽省": areaCode = "340000"; break;
                    case "福建省": areaCode = "350000"; break;
                    case "江西省": areaCode = "360000"; break;
                    case "山东省": areaCode = "370000"; break;
                    case "河南省": areaCode = "410000"; break;
                    case "湖北省": areaCode = "420000"; break;
                    case "湖南省": areaCode = "430000"; break;
                    case "广东省": areaCode = "440000"; break;
                    case "广西壮族自治区": areaCode = "450000"; break;
                    case "海南省": areaCode = "460000"; break;
                    case "重庆市": areaCode = "500000"; break;
                    case "四川省": areaCode = "510000"; break;
                    case "贵州省": areaCode = "520000"; break;
                    case "云南省": areaCode = "530000"; break;
                    case "西藏自治区": areaCode = "540000"; break;
                    case "陕西省": areaCode = "610000"; break;
                    case "甘肃省": areaCode = "620000"; break;
                    case "青海省": areaCode = "630000"; break;
                    case "宁夏回族自治区": areaCode = "640000"; break;
                    case "新疆维吾尔自治区": areaCode = "650000"; break;
                }
                if (areaCode.isEmpty()) {
                    // Unknown name: skip it rather than insert a row with an empty id.
                    continue;
                }
                Area area = new Area();
                area.setId(areaCode);
                area.setParentId("root");
                area.setAreaName(areaName);
                area.setLevel(1);
                initAreaMapper.insert(area);
                System.out.println(areaCode + "****root****" + areaName + "****" + 1);
                parseNextLevel(areaCode, province, level + 1);
            }
        }
        System.out.println("Done");
    }

    // Fetches the page behind parentElement's link and stores every row at the given level.
    private void parseNextLevel(String parentId, Element parentElement, int level) {
        if (!cssMap.containsKey(level)) {
            // Below county level: stop here instead of fetching pages we never parse.
            return;
        }
        try {
            // Pause between requests to avoid hammering the site.
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }
        Document doc = connect(parentElement.attr("abs:href"));
        if (doc != null) {
            Elements rows = doc.select("tr." + cssMap.get(level));
            for (Element element : rows) {
                printInfo(parentId, element, level);
                String code = element.select("td").first().text();
                Elements select = element.select("a");
                if (select.size() != 0) {
                    // The first six digits of this row's code become the children's parentId.
                    parseNextLevel(code.substring(0, 6), select.last(), level + 1);
                }
            }
        }
    }

    // Saves one row (first cell = code, last cell = name) and logs it.
    private void printInfo(String parentId, Element element, int level) {
        String code = element.select("td").first().text();
        Area area = new Area();
        area.setId(code.substring(0, 6));
        area.setParentId(parentId);
        area.setAreaName(element.select("td").last().text());
        area.setLevel(level);
        initAreaMapper.insert(area);
        System.out.println(area.getId() + "****" + area.getParentId() + "****" + area.getAreaName() + "****" + level);
    }

    // Downloads and parses a page; returns null if the request fails.
    private static Document connect(String url) {
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("The input url('" + url + "') is invalid!");
        }
        try {
            return Jsoup.connect(url).timeout(100 * 1000).get();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
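The service above stops at counties, while the site also lists towns and villages. One way to make the depth adjustable is to keep the level-to-CSS-class mapping next to a maximum level, as in the minimal sketch below. The class `LevelSelectors` and its methods are hypothetical, and the row classes `towntr` and `villagetr` for levels 4 and 5 are assumptions about the site's markup that should be checked against the live pages.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the original service.
final class LevelSelectors {

    // Levels 1 to 3 come from the post; 4 and 5 are assumed class names.
    private static final Map<Integer, String> CSS_BY_LEVEL = Map.of(
            1, "provincetr",
            2, "citytr",
            3, "countytr",
            4, "towntr",
            5, "villagetr");

    private final int maxLevel;

    LevelSelectors(int maxLevel) {
        if (maxLevel < 1 || maxLevel > CSS_BY_LEVEL.size()) {
            throw new IllegalArgumentException("maxLevel must be between 1 and 5, got: " + maxLevel);
        }
        this.maxLevel = maxLevel;
    }

    // Returns the row selector for a level, or empty when the level is out of range.
    Optional<String> selectorFor(int level) {
        if (level < 1 || level > maxLevel) {
            return Optional.empty();
        }
        return Optional.of("tr." + CSS_BY_LEVEL.get(level));
    }

    // True when rows at this level still have children worth fetching.
    boolean shouldDescend(int level) {
        return level < maxLevel;
    }
}
```

With `new LevelSelectors(3)` the crawl would cover the same three levels as the post. Going deeper would also need a different id scheme, because town and village codes share their first six digits with the parent county, so `substring(0, 6)` would produce duplicate ids.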
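Thanks: this code is adapted from an earlier author's work. The original post is titled "Fetching nationwide area data with Jsoup (province, city, county, town, village)".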