ElasticSearch (Notes)

Published: 2023/12/3 · Programming Q&A · 豆豆

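Introduction

  • This tutorial is based on Elasticsearch 7.6.1. Note that the ES 7 syntax differs considerably from the ES 6 API. When the tutorial was published, the latest release was ES 7.6.2 (updated 2020-04-01);
  • ES is a tool for full-text search:
    • SQL: fuzzy matching with like '%keyword%' is very slow on large data sets, and adding an index helps little, because a pattern with a leading % cannot use an ordinary B-tree index;
    • Elasticsearch: a search engine (think of the search boxes on Baidu, GitHub, Taobao)
    • Topics covered:
      • the IK analyzer
      • operating ES through its RESTful API
      • CRUD
      • integrating ES with Spring Boot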

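    Doug Cutting, creator of the Lucene library

  • Lucene: a library written in Java for adding full-text search to small and medium-sized applications;
  • Nutch: a web search application built on the Lucene core; its scope is broader than Lucene's, since it is a complete search application rather than a library
  • Big data has to solve two problems, storage and computation (MapReduce):
    • In 2004 Doug Cutting built a distributed file system (NDFS) modeled on Google's GFS;
    • In 2005 he implemented the MapReduce algorithm in the Nutch search engine;
    • After joining Yahoo, he combined MapReduce and NDFS to create Hadoop, which made him known as the father of Hadoop;
    • Google's BigTable design was later brought into the Hadoop ecosystem as well (as HBase)
  • Back to the topic:
    • Lucene is an information retrieval toolkit (a jar), not a complete search engine system;
    • Lucene provides the index structure, tools for reading and writing indexes, sorting, search rules, and utility classes;
    • How Lucene and ES relate:
      • ES wraps and extends Lucene; it is fairly easy to pick up, easier than Redis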

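    Elasticsearch overview

  • A distributed, highly scalable full-text search engine;
  • Near-real-time search;
  • RESTful: it is accessed with GET, POST, PUT and DELETE;
  • Supports complex data analysis; part of the ELK stack (Elasticsearch + Logstash + Kibana)
  • Elasticsearch vs Solr

  • While indexing, Solr suffers from I/O blocking and its query performance drops; Elasticsearch has a clear advantage in that situation;
  • Compared with traditional approaches, Elasticsearch is often reported to be around 50 times faster;
  • Elasticsearch works right after unzipping; Solr needs configuration
  • Solr relies on ZooKeeper for distributed management; Elasticsearch has distribution built in
  • Solr supports more data formats (JSON, XML, CSV); Elasticsearch only supports JSON
  • Solr offers more features out of the box
  • Solr is faster at searching existing data but slows down while the index is updated (e.g. inserts and deletes); Elasticsearch is slower on plain search but faster for real-time search, which is why it is used for search at Facebook, Sina and others
  • Solr is the solution for traditional search applications; Elasticsearch suits newer real-time search applications
  • Solr is more mature; Elasticsearch evolves faster;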
  • Environment setup (matching versions)

    • These notes follow the 狂神說 (Kuangshen) course; the version is 7.6.x

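    Getting started

    • JDK 1.8 or later, a client, and a UI tool
    • The versions of all components must match.

    Download

    Download from the official website

    On Windows, just unzip the archive and it is ready to use

    Directory layout:

    bin: startup scripts
    config: configuration files
        log4j2: logging configuration
        jvm.options: JVM settings
        elasticsearch.yml: ES configuration, e.g. the default port 9200
    lib: related jars
    modules: feature modules
    plugins: plugins, e.g. the IK analyzer

    Start it (bin\elasticsearch.bat on Windows), then visit localhost:9200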

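    Visual UI: elasticsearch-head

    Download the es-head plugin from GitHub:

    https://github.com/mobz/elasticsearch-head

    npm install
    npm run start    # starts the plugin at localhost:9100

    Fixing cross-origin (CORS) errors

    Edit elasticsearch.yml so that head (port 9100) is allowed to call ES (port 9200):

    # allow cross-origin requests
    http.cors.enabled: true
    http.cors.allow-origin: "*"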

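    Kibana: log analysis and a console for commands

    • ELK: a log analysis stack
    • Note: download the same version as ES; the UI can be switched to Chinese in the configuration file
    • Default address: localhost:5601

    Chinese localization

    Set it in the configuration file XXX.yml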

    Core ES concepts


    • ES is document-oriented: everything is JSON

    • Comparison

      Relational database    Elasticsearch
      database               index (indices)
      table                  type (deprecated; ES 7 indices have a single type, _doc)
      row                    document
      column                 field
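    • Physical design

      • Behind the scenes, each index is split into several shards, and each shard can move between servers in the cluster;
    • Logical design

      • Document: the smallest unit of indexing and search;
        • self-contained: key:value pairs
        • hierarchical: a document can contain nested documents (JSON objects)
      • Type: a logical container for documents
      • Index: comparable to a database
    • Inverted index

      • ES uses an inverted index, with Lucene's inverted index underneath, for fast full-text retrieval: instead of scanning every document for a word, it looks the word up and directly gets the list of documents that contain it.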
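As a rough illustration of the idea (a toy, not Lucene's actual data structures, which add term dictionaries, compression, positions and much more), the sketch below maps each term to the sorted ids of the documents containing it. A single-term lookup is one map access, and an AND query is the intersection of posting lists. The tokenizer (lowercase, split on anything that is not a letter or digit) and the class name `InvertedIndex` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of ids of the documents containing that term.
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize (lowercase, split on non-letter/non-digit characters) and record each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of documents containing the term (a copy; empty if the term is unknown).
    SortedSet<Integer> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    // AND query: intersect the posting lists of all terms.
    SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = search(term);
            if (result == null) {
                result = docs;
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "study every day, good good up");
        index.add(2, "to forever, study every day, good good up");
        index.add(3, "forever");
        System.out.println(index.search("forever"));        // [2, 3]
        System.out.println(index.searchAll("study", "to")); // [2]
    }
}
```

Because the posting lists are kept sorted, a real engine can intersect them with a linear merge; `retainAll` is used here only for brevity.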


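    IK analyzer plugin

    • What is the IK analyzer:

      • it splits a sentence into words (tokens)

      • recommended for Chinese text

      • two segmentation algorithms: ik_smart (coarsest split, fewest tokens) and ik_max_word (finest-grained split)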

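    4.1 Download and install

    Download: https://github.com/medcl/elasticsearch-analysis-ik/releases (pick the release that matches your ES version)

    Unzip it into a new folder named "ik" under the Elasticsearch plugins directory;

    Restart ES and watch the log: the ik plugin is loaded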

    ik_smart

    Input:

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "我是社會主義接班人"
    }

    Output:

    {
      "tokens" : [
        {"token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0},
        {"token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1},
        {"token" : "社會主義", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2},
        {"token" : "接班人", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 3}
      ]
    }

    ik_max_word

    Input:

    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "我是社會主義接班人"
    }

    Output:

    {
      "tokens" : [
        {"token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0},
        {"token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1},
        {"token" : "社會主義", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2},
        {"token" : "社會", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3},
        {"token" : "主義", "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 4},
        {"token" : "接班人", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 5},
        {"token" : "接班", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 6},
        {"token" : "人", "start_offset" : 8, "end_offset" : 9, "type" : "CN_CHAR", "position" : 7}
      ]
    }

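    User-defined dictionary

    When special words (such as personal names) are not recognized and get split apart, you can add them to a custom dictionary.

    Restart ES and Kibana, then test again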
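The difference between the two IK modes, and why a custom dictionary helps, can be illustrated with a toy dictionary-based segmenter. This is a simplified sketch, not IK's actual algorithm: `maxMatch` does forward maximum matching (the longest dictionary word at each position, otherwise a single character), roughly in the spirit of ik_smart; `allWords` emits every dictionary word starting at each position, roughly in the spirit of ik_max_word. The class name, the 4-character word limit and the dictionary contents are assumptions, chosen so that the output mirrors the two samples above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy dictionary-based Chinese segmenter (NOT the real IK implementation).
class ToySegmenter {
    static final int MAX_WORD_LEN = 4;

    // Forward maximum matching: take the longest dictionary word starting at i,
    // or a single character if none starts there ("fewest cuts").
    static List<String> maxMatch(String text, Set<String> dict) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String token = text.substring(i, i + 1);
            for (int end = Math.min(text.length(), i + MAX_WORD_LEN); end > i + 1; end--) {
                String candidate = text.substring(i, end);
                if (dict.contains(candidate)) {
                    token = candidate;
                    break;
                }
            }
            tokens.add(token);
            i += token.length();
        }
        return tokens;
    }

    // Finest-grained: every dictionary word starting at every position, longest first.
    static List<String> allWords(String text, Set<String> dict) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int end = Math.min(text.length(), i + MAX_WORD_LEN); end > i; end--) {
                String candidate = text.substring(i, end);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "我", "是", "人", "社會主義", "社會", "主義", "接班人", "接班"));
        String text = "我是社會主義接班人";
        System.out.println(maxMatch(text, dict)); // [我, 是, 社會主義, 接班人]
        System.out.println(allWords(text, dict)); // [我, 是, 社會主義, 社會, 主義, 接班人, 接班, 人]

        // A name missing from the dictionary falls apart into single characters;
        // adding it (which is what a custom dictionary does) keeps it whole.
        System.out.println(maxMatch("我是龐世宗", dict)); // [我, 是, 龐, 世, 宗]
        dict.add("龐世宗");
        System.out.println(maxMatch("我是龐世宗", dict)); // [我, 是, 龐世宗]
    }
}
```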

    REST style

    5.1 Introduction

    REST is an architectural style: a set of constraints and principles. An architecture that follows them is called RESTful.

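    Basic operations

    method   URL                                                description
    PUT      localhost:9200/index_name/type_name/doc_id         create a document (with the given id)
    POST     localhost:9200/index_name/type_name                create a document (random id)
    POST     localhost:9200/index_name/type_name/doc_id/_update update a document
    DELETE   localhost:9200/index_name/type_name/doc_id         delete a document
    GET      localhost:9200/index_name/type_name/doc_id         get a document by id
    POST     localhost:9200/index_name/type_name/_search        query all documents

    Port 9200 is the ES HTTP port (9100 is only the head plugin). In ES 7 the type is always _doc, and the typeless update path is POST localhost:9200/index_name/_update/doc_id.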
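For reference, the same operations can be issued from plain Java without any ES client library. The sketch below only builds `java.net.http.HttpRequest` objects for the typeless ES 7.x paths; note that `java.net.http` requires Java 11+, newer than the JDK 1.8 used elsewhere in these notes. The class name `EsRestRequests` and the `http://localhost:9200` base URL are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (does not send) HTTP requests for basic document CRUD against ES 7.x typeless endpoints.
class EsRestRequests {
    private final String base; // e.g. "http://localhost:9200": the ES HTTP port, not head's 9100

    EsRestRequests(String base) {
        this.base = base;
    }

    private HttpRequest.Builder json(String path) {
        return HttpRequest.newBuilder(URI.create(base + path))
                .header("Content-Type", "application/json");
    }

    // PUT /index/_doc/id : create or replace a document with a given id
    HttpRequest put(String index, String id, String sourceJson) {
        return json("/" + index + "/_doc/" + id).PUT(HttpRequest.BodyPublishers.ofString(sourceJson)).build();
    }

    // POST /index/_doc : create a document with an auto-generated id
    HttpRequest create(String index, String sourceJson) {
        return json("/" + index + "/_doc").POST(HttpRequest.BodyPublishers.ofString(sourceJson)).build();
    }

    // POST /index/_update/id : partial update, the fields are wrapped in {"doc": ...}
    HttpRequest update(String index, String id, String partialJson) {
        String body = "{\"doc\":" + partialJson + "}";
        return json("/" + index + "/_update/" + id).POST(HttpRequest.BodyPublishers.ofString(body)).build();
    }

    // GET /index/_doc/id
    HttpRequest get(String index, String id) {
        return HttpRequest.newBuilder(URI.create(base + "/" + index + "/_doc/" + id)).GET().build();
    }

    // DELETE /index/_doc/id
    HttpRequest delete(String index, String id) {
        return HttpRequest.newBuilder(URI.create(base + "/" + index + "/_doc/" + id)).DELETE().build();
    }

    // POST /index/_search with a query DSL body
    HttpRequest search(String index, String queryJson) {
        return json("/" + index + "/_search").POST(HttpRequest.BodyPublishers.ofString(queryJson)).build();
    }
}
```

To actually send one of these, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.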

    5.2 Testing

    • 1. Create an index: PUT /index_name/type_name/id
    • The type defaults to _doc

    Data types

  • Basic data types
    • String: text, keyword
    • Numeric: long, integer, short, byte, double, float, half_float, scaled_float
    • Date: date
    • Boolean: boolean
    • Binary: binary
  • Specifying field types
  • Creating the mapping rules

    PUT /test2
    {
      "mappings": {
        "properties": {
          "name": {"type": "text"},
          "age": {"type": "long"},
          "birthday": {"type": "date"}
        }
      }
    }

    Output:

    {
      "acknowledged" : true,
      "shards_acknowledged" : true,
      "index" : "test2"
    }

    If you don't specify types, ES assigns default types automatically (dynamic mapping)

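    Viewing an index

    GET test2

    • Viewing cluster information

      GET _cat (lists the available _cat endpoints, e.g. GET _cat/indices?v or GET _cat/health)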

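    Updating

    1. The old way: PUT the whole document again
    2. The current way:

    POST /test1/_doc/1/_update
    {
      "doc": {"name": "龐世宗"}
    }

    ES 7 also accepts, and prefers, the typeless form POST /test1/_update/1

    Deleting an index

    DELETE test1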

    Basic document operations (important)

    Basic operations

    Adding data

    PUT /psz/user/1
    {
      "name": "psz",
      "age": 22,
      "desc": "偶像派程序員",
      "tags": ["暖", "帥"]
    }

    Getting data

    GET psz/user/1

    Output:

    {
      "_index" : "psz",
      "_type" : "user",
      "_id" : "1",
      "_version" : 1,
      "_seq_no" : 0,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "psz",
        "age" : 22,
        "desc" : "偶像派程序員",
        "tags" : ["暖", "帥"]
      }
    }

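    Updating data with PUT

    For updates, POST .../_update is recommended:

    • Not recommended: PUT replaces the whole document, so every field you leave out (age, desc, tags) is lost
    PUT psz/user/1
    {
      "name": "龐龐胖"
    }
    • Recommended: _update merges the given fields into the document, and the other fields are kept
    POST psz/user/1/_update
    {
      "doc": {"name": "龐龐胖"}
    }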
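The difference between the two update styles can be modeled with plain maps. This is a simplified sketch with a hypothetical `UpdateSemantics` class: it performs only a shallow merge, whereas the real `_update` API also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// PUT (full replace) vs POST .../_update with {"doc": ...} (partial merge), modeled on maps.
class UpdateSemantics {
    // PUT index/_doc/id : the new body becomes the whole document; existing fields are discarded.
    static Map<String, Object> replace(Map<String, Object> existing, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST index/_update/id {"doc": partial} : given fields overwrite, all others are kept.
    // Shallow only: ES would also merge nested objects recursively.
    static Map<String, Object> partialUpdate(Map<String, Object> existing, Map<String, Object> partial) {
        Map<String, Object> merged = new LinkedHashMap<>(existing);
        merged.putAll(partial);
        return merged;
    }
}
```

With a stored document {name, age, desc}, `replace` with {name} leaves only name, while `partialUpdate` changes name and keeps age and desc.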

    Simple search with GET

    GET psz/user/1

    A simple conditional query (query string), matched according to the default mapping:

    GET psz/user/_search?q=name:龐世宗

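    Complex queries

    The query parameters go in a JSON body:

    GET psz/user/_search
    {
      "query": {
        "match": {"name": "龐世宗"}
      },
      "_source": ["name", "age"],
      "sort": [
        {"age": {"order": "desc"}}
      ],
      "from": 0,
      "size": 1
    }

    • match: match documents on name
    • _source: filter the result so that only name and age are returned
    • sort: sort by age, descending
    • from: offset of the first hit to return (starts at 0); size: how many hits to return
    • Later, when working with ES from Java, the builder classes and methods map directly onto these keys
    • Front-end pagination: /search/{current}/{pagesize}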
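Note that `from` is a zero-based document offset, not a page number, so a 1-based page from a URL like /search/{current}/{pagesize} has to be converted first. A minimal sketch (the `Paging` class is hypothetical) of that conversion, plus what from/size select from an already sorted hit list:

```java
import java.util.ArrayList;
import java.util.List;

// Converts a 1-based page number into the ES "from" offset and simulates from/size on a sorted list.
class Paging {
    // Page N starts at (N - 1) * pageSize; out-of-range inputs are clamped to 1.
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);
        int size = Math.max(pageSize, 1);
        return (page - 1) * size;
    }

    // What from/size return, simulated on an already sorted in-memory list.
    static <T> List<T> page(List<T> sortedHits, int from, int size) {
        int start = Math.min(from, sortedHits.size());
        int end = Math.min(start + size, sortedHits.size());
        return new ArrayList<>(sortedHits.subList(start, end));
    }
}
```

For example, `Paging.from(3, 10)` gives 20. By default ES rejects searches where from + size exceeds 10,000 (the `index.max_result_window` setting), so deep paging needs a different approach.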

    Boolean queries

    must (like AND in MySQL): all conditions must match

    GET psz/user/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"name": "龐世宗"}},
            {"match": {"age": 22}}
          ]
        }
      }
    }

    should (like OR in MySQL): at least one condition should match

    GET psz/user/_search
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"name": "龐世宗"}},
            {"match": {"age": 22}}
          ]
        }
      }
    }

    must_not (like NOT in MySQL): documents matching these conditions are excluded

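    Filters

    GET psz/user/_search
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"name": "龐世宗"}}
          ],
          "filter": [
            {"range": {"age": {"gt": 20}}}
          ]
        }
      }
    }

    The filter keeps only documents whose age is greater than 20; range also supports gte, lt and lte.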
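How the four clause types decide whether a document matches can be sketched with predicates. This hypothetical `BoolQuery` ignores relevance scoring (in ES, must and should also contribute to the score, while filter and must_not do not) and replaces `match` with plain equality. One subtlety it does model: by default a should clause is only required when the bool query has no must or filter clauses, so in the filter example above every document with age > 20 matches, and the name clause only affects ranking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Matching logic of a bool query, scoring ignored.
class BoolQuery<T> {
    final List<Predicate<T>> must = new ArrayList<>();    // all must match (AND)
    final List<Predicate<T>> filter = new ArrayList<>();  // all must match, no scoring
    final List<Predicate<T>> mustNot = new ArrayList<>(); // none may match (NOT)
    final List<Predicate<T>> should = new ArrayList<>();  // OR, see matches()

    boolean matches(T doc) {
        for (Predicate<T> p : must) {
            if (!p.test(doc)) return false;
        }
        for (Predicate<T> p : filter) {
            if (!p.test(doc)) return false;
        }
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) return false;
        }
        // Default minimum_should_match: 1 if there are should clauses but no must/filter
        // clauses; otherwise 0, and should clauses then only influence the score.
        if (!should.isEmpty() && must.isEmpty() && filter.isEmpty()) {
            for (Predicate<T> p : should) {
                if (p.test(doc)) return true;
            }
            return false;
        }
        return true;
    }
}

// Minimal document type for the example.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```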

    Multi-condition queries

    Exact queries

    • The term query looks up the given term directly in the inverted index, for an exact match.

    About analysis:

    term: exact lookup; the query string is not analyzed

    match: the query string is first analyzed with the analyzer

    About field types:

    text: analyzed (split into tokens) by the analyzer

    keyword: not split; indexed as a whole
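A toy model makes the interaction between query type and field type easier to see. In this hypothetical `TermVsMatch` class the "analyzer" (lowercase, split on whitespace) is an assumption standing in for the standard analyzer: a text field is indexed as analyzed tokens, a keyword field as one exact value; term compares the raw query string, while match analyzes it first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of term vs match queries on text vs keyword fields.
class TermVsMatch {
    static List<String> analyze(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // text field: the value is analyzed into tokens at index time
    static Set<String> indexText(String value) {
        return new HashSet<>(analyze(value));
    }

    // keyword field: the whole value is indexed as one exact token
    static Set<String> indexKeyword(String value) {
        return Collections.singleton(value);
    }

    // term: the query is NOT analyzed; it must equal an indexed token exactly
    static boolean term(Set<String> indexed, String query) {
        return indexed.contains(query);
    }

    // match: the query is analyzed with the field's analyzer, and any token may hit (OR).
    // On a keyword field the query stays whole, so match behaves like term.
    static boolean match(Set<String> indexed, String query, boolean textField) {
        if (!textField) return indexed.contains(query);
        for (String t : analyze(query)) {
            if (indexed.contains(t)) return true;
        }
        return false;
    }
}
```

With a text field holding "Java Programming", a term query for "Java" finds nothing (only "java" was indexed), whereas match "JAVA book" hits; on a keyword field only the exact string "Java Programming" matches.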

    Highlighting

    GET psz/user/_search
    {
      "query": {
        "match": {"name": "龐世宗"}
      },
      "_source": ["name", "age"],
      "sort": [
        {"age": {"order": "desc"}}
      ],
      "highlight": {
        "pre_tags": "<P>",
        "post_tags": "</P>",
        "fields": {
          "name": {}
        }
      }
    }

    pre_tags and post_tags set custom highlight tags; fields selects which fields are highlighted.
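The shape of a highlighted result can be mimicked with a few lines of string handling. This hypothetical `ToyHighlighter` wraps each space-separated word that matches a lowercase query term in the configured tags; ES's real highlighters work on analyzed tokens and character offsets, so treat this only as an illustration of what pre_tags and post_tags do.

```java
import java.util.Set;

// Toy highlighter: wrap matching words in pre/post tags.
class ToyHighlighter {
    static String highlight(String text, Set<String> queryTerms, String preTag, String postTag) {
        StringBuilder sb = new StringBuilder();
        String[] words = text.split(" ", -1);
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sb.append(' ');
            String w = words[i];
            // compare case-insensitively, but keep the original spelling in the output
            sb.append(queryTerms.contains(w.toLowerCase()) ? preTag + w + postTag : w);
        }
        return sb.toString();
    }
}
```

For example, highlighting "Learn Java the hard way" for the term "java" with `<P>`/`</P>` yields "Learn <P>Java</P> the hard way".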

    Integrating with Spring Boot

    Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html

    Creating a new module (the new way)

    1. Find the native dependency

    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.6.1</version>
    </dependency>

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.6.1</elasticsearch.version>
    </properties>

    2. Find the object

    Initialization

    A RestHighLevelClient instance needs a REST low-level client builder to be built as follows:

    package com.kuang.config;

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ElasticSearchClientConfig {

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            // one HttpHost per node; keep only the first entry for a single local node
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(
                            new HttpHost("localhost", 9200, "http"),
                            new HttpHost("localhost", 9201, "http")));
            return client;
        }
    }

    The high-level client will internally create the low-level client used to perform requests based on the provided builder. That low-level client maintains a pool of connections and starts some threads so you should close the high-level client when you are well and truly done with it and it will in turn close the internal low-level client to free those resources. This can be done through the close:

    client.close();

    In the rest of this documentation about the Java High Level Client, the RestHighLevelClient instance will be referenced as client.

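    3. Analyze the methods in the class

    The versions must match! Spring Boot's default ES version is 6.8.1; change it to match your local installation:

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.6.1</elasticsearch.version>
    </properties>

    Java configuration class

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration // plays the role of an XML bean definition file
    public class EsConfig {

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            // careful with the port: it must be the ES HTTP port (9200 by default)
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
            return client;
        }
    }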

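    Index API operations

    1. Creating an index

    import java.io.IOException;

    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;
    import org.junit.jupiter.api.Test;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.boot.test.context.SpringBootTest;

    @SpringBootTest
    class EsApplicationTests {

        @Autowired
        @Qualifier("restHighLevelClient")
        private RestHighLevelClient restHighLevelClient;

        // create an index with a CreateIndexRequest
        @Test
        void testCreateIndex() throws IOException {
            // 1. build the create-index request (index names must be lowercase)
            CreateIndexRequest request = new CreateIndexRequest("index_name");
            // 2. execute it through the indices client and get the response
            CreateIndexResponse createIndexResponse =
                    restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
            System.out.println(createIndexResponse.isAcknowledged());
        }
    }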

    2. Checking whether an index exists

    // import: org.elasticsearch.client.indices.GetIndexRequest
    @Test
    void testExistIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("index_name");
        boolean exist = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exist);
    }

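    3. Deleting an index

    // imports: org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest,
    //          org.elasticsearch.action.support.master.AcknowledgedResponse
    @Test
    void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("index_name");
        AcknowledgedResponse delete = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());
    }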

    Document API operations

    package com.kuang.pojo;

    import lombok.AllArgsConstructor;
    import lombok.Data;
    import lombok.NoArgsConstructor;
    import org.springframework.stereotype.Component;

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    @Component
    public class User {
        private String name;
        private int age;
    }

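    1. Testing document creation

    Add the dependency:

    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.16</version>
    </dependency>

    // imports: com.alibaba.fastjson.JSON, org.elasticsearch.action.index.IndexRequest,
    //          org.elasticsearch.action.index.IndexResponse, org.elasticsearch.common.xcontent.XContentType,
    //          static org.elasticsearch.common.unit.TimeValue.timeValueSeconds
    // test: add a document
    @Test
    void testAddDocument() throws IOException {
        // create the object
        User user = new User("psz", 22);
        // equivalent of PUT /ppp/_doc/1
        IndexRequest request = new IndexRequest("ppp");
        request.id("1");
        request.timeout(timeValueSeconds(1));
        // put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // the client sends the request and returns the response
        IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status()); // CREATED on first insert, OK when overwriting
    }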

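    2. Checking whether a document exists

    // imports: org.elasticsearch.action.get.GetRequest,
    //          org.elasticsearch.search.fetch.subphase.FetchSourceContext
    // check whether a document exists, like GET /index/_doc/1
    @Test
    void testIsExists() throws IOException {
        GetRequest getRequest = new GetRequest("ppp", "1");
        // don't return the _source; we only need to know whether the document exists
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }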

    3. Getting a document

    // import: org.elasticsearch.action.get.GetResponse
    // get a document's content
    @Test
    void getDocument() throws IOException {
        GetRequest getRequest = new GetRequest("ppp", "1");
        GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString()); // the _source only
        System.out.println(getResponse);                     // the full response
    }

    Output:

    {"age":22,"name":"psz"}
    {"_index":"ppp","_type":"_doc","_id":"1","_version":2,"_seq_no":1,"_primary_term":1,"found":true,"_source":{"age":22,"name":"psz"}}

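    4. Updating a document

    // imports: org.elasticsearch.action.update.UpdateRequest, org.elasticsearch.action.update.UpdateResponse
    // update a document
    @Test
    void updateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("ppp", "1");
        updateRequest.timeout("1s");
        // pass the object as JSON; only the given fields are changed
        User user = new User("new name", 21);
        updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
        // send the request and get the response
        UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(updateResponse);
    }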

    5. Deleting a document

    // imports: org.elasticsearch.action.delete.DeleteRequest, org.elasticsearch.action.delete.DeleteResponse
    // delete a document
    @Test
    void deleteDocument() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest("ppp", "1");
        deleteRequest.timeout("1s");
        DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println(deleteResponse);
    }

    Bulk operations

    • Real projects almost always need to write data in bulk
    • If you omit the id, a random id is generated

    // imports: org.elasticsearch.action.bulk.BulkRequest, org.elasticsearch.action.bulk.BulkResponse,
    //          java.util.ArrayList
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s"); // increase this for large batches
        ArrayList<User> userList = new ArrayList<>();
        userList.add(new User("psz", 11));
        userList.add(new User("psz2", 12));
        userList.add(new User("psz3", 13));
        userList.add(new User("psz4", 14));
        userList.add(new User("psz5", 15));
        for (int i = 0; i < userList.size(); i++) {
            bulkRequest.add(new IndexRequest("ppp")
                    .id(String.valueOf(i + 1)) // omit .id(...) to let ES generate ids
                    .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
        }
        // send the request and get the response
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulkResponse.hasFailures()); // false means every item succeeded
    }
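Over REST, a bulk request is sent to `POST _bulk` as newline-delimited JSON: for each document, one action/metadata line followed by one source line, and the body must end with a newline. The sketch below (a hypothetical `BulkBody` helper) builds that body by hand for the same "ids 1..n" scheme as the test above. It does no JSON escaping, so the index name and source strings are assumed to be valid JSON already.

```java
import java.util.List;

// Builds the NDJSON body of a bulk "index" request: action line + source line per document.
class BulkBody {
    static String indexActions(String index, List<String> sourcesJson) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sourcesJson.size(); i++) {
            // action/metadata line; ids are "1", "2", ... as in testBulkRequest
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            // document source line
            sb.append(sourcesJson.get(i)).append('\n');
        }
        return sb.toString(); // ends with '\n', as the bulk API requires
    }
}
```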

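    Search

    // imports: org.elasticsearch.action.search.SearchRequest, org.elasticsearch.action.search.SearchResponse,
    //          org.elasticsearch.common.unit.TimeValue, org.elasticsearch.index.query.QueryBuilders,
    //          org.elasticsearch.index.query.TermQueryBuilder, org.elasticsearch.search.SearchHit,
    //          org.elasticsearch.search.builder.SearchSourceBuilder, java.util.concurrent.TimeUnit
    /*
     * Querying:
     *   search request: SearchRequest
     *   query building: SearchSourceBuilder
     */
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("ppp");
        // build the search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // query conditions come from the QueryBuilders helper, e.g. an exact term query
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "psz");
        sourceBuilder.query(termQueryBuilder);
        // query timeout
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // highlighting would be configured here: sourceBuilder.highlighter(...)
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            System.out.println(hit.getSourceAsString());
        }
    }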

    Project setup

    1. Start ES and head-master, and create the index with head-master

    This step is optional: the index is created automatically when data is first added.

    2. Add the dependencies the Spring Boot project needs

    Note: the elasticsearch version must match your local installation, so also set a custom version property in the pom

    <!-- Jsoup, for parsing web pages -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>
    <!-- Alibaba fastjson, for JSON conversion -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.73</version>
    </dependency>
    <!-- Spring Data Elasticsearch starter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <!-- Thymeleaf templates -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>

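    3. Static resources used by the project (modified)

    • Link: https://pan.baidu.com/s/1X1kwMHsDvML-0rBEJnUOdA
    • Extraction code: qjqy

    4. Add the Spring Boot configuration (application.yml)

    # change the port to 9090
    server:
      port: 9090

    # disable the Thymeleaf cache
    spring:
      thymeleaf:
        cache: false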

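    5. Overall project structure

    6. Add the static resources to the project

    7. Add the ES client configuration class to Spring Boot

    ElasticSearchClientConfig.java

    package com.wu.config;

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ElasticSearchClientConfig {

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
            return client;
        }
    }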

    Crawling JD data with Jsoup

    Crawling the data

    1. Open the JD website and search for java

    2. Press F12 to inspect the page and find the element that holds the books

    3. Create HtmlParseUtil.java in the utils package and test the crawl

    // imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
    //          org.jsoup.select.Elements, java.io.IOException, java.net.URL
    // test run
    public static void main(String[] args) throws IOException {
        // request URL
        String url = "https://search.jd.com/Search?keyword=java";
        // parse the page (the Document Jsoup returns is the browser's Document object); 30 s timeout
        Document document = Jsoup.parse(new URL(url), 30000);
        // look up by id; the DOM methods you know from JavaScript are available here too
        Element element = document.getElementById("J_goodsList");
        if (element == null) {
            System.out.println("J_goodsList not found (page changed or request blocked)");
            return;
        }
        // all li elements
        Elements elements = element.getElementsByTag("li");
        // counter
        int c = 0;
        // el is each li tag
        for (Element el : elements) {
            c++;
            // reading src directly does not work because JD lazy-loads images; use data-lazy-img
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            // price: only the text of the first match
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            String shopName = el.getElementsByClass("p-shop").eq(0).text();
            System.out.println("========================================");
            System.out.println(img);
            System.out.println(price);
            System.out.println(title);
            System.out.println(shopName);
        }
        System.out.println(c);
    }
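The printed image addresses may need a little clean-up before they can be used in an `<img>` tag. The helper below is a hypothetical sketch, assuming that lazy-loaded images carry a protocol-relative URL (starting with `//`) in `data-lazy-img` and that `src` can serve as a fallback; check the current page markup before relying on either assumption.

```java
// Hypothetical helper: prefer the lazy-load attribute, fall back to src,
// and turn protocol-relative URLs ("//host/path") into https URLs.
class ImageUrls {
    static String normalize(String lazyImg, String src) {
        String url = (lazyImg != null && !lazyImg.trim().isEmpty())
                ? lazyImg.trim()
                : (src == null ? "" : src.trim());
        if (url.startsWith("//")) {
            return "https:" + url;
        }
        return url;
    }
}
```

In the loop above it could be called as `ImageUrls.normalize(el.getElementsByTag("img").eq(0).attr("data-lazy-img"), el.getElementsByTag("img").eq(0).attr("src"))`; Jsoup's `attr` returns an empty string when the attribute is missing.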

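    Test results

    The results look correct; next, wrap this into a utility class

    4. Create a pojo entity class

    Entity class Content.java

    package com.wu.pojo;

    import lombok.AllArgsConstructor;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public class Content {
        private String img;
        private String price;
        private String title;
        private String shopName;
        // add more fields as needed
    }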

    Utility class HtmlParseUtil.java

    package com.wu.utils;

    import com.wu.pojo.Content;
    import java.io.IOException;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.List;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.springframework.stereotype.Component;

    @Component
    public class HtmlParseUtil {

        public List<Content> parseJD(String keyword) throws IOException {
            List<Content> list = new ArrayList<>();
            // encode the keyword so that non-ASCII (e.g. Chinese) keywords form a valid URL
            String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
            Document document = Jsoup.parse(new URL(url), 30000);
            Element element = document.getElementById("J_goodsList");
            if (element == null) {
                return list; // page layout changed or the request was blocked
            }
            Elements elements = element.getElementsByTag("li");
            for (Element el : elements) {
                String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
                String price = el.getElementsByClass("p-price").eq(0).text();
                String title = el.getElementsByClass("p-name").eq(0).text();
                String shopName = el.getElementsByClass("p-shopnum").eq(0).text();
                list.add(new Content(img, price, title, shopName));
            }
            return list;
        }
    }

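    5. Service layer (no interface here)

    ContentService.java

    First, a method that puts the crawled data into ES

    package com.wu.service;

    import com.alibaba.fastjson.JSON;
    import com.wu.pojo.Content;
    import com.wu.utils.HtmlParseUtil;
    import java.io.IOException;
    import java.util.List;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;

    // business logic
    @Service
    public class ContentService {

        // inject the client
        @Autowired
        @Qualifier("restHighLevelClient")
        private RestHighLevelClient client;

        // 1. parse the data and put it into ES
        public boolean parseContent(String keyword) throws IOException {
            List<Content> contents = new HtmlParseUtil().parseJD(keyword);
            if (contents.isEmpty()) {
                return false; // a bulk request with no items would be rejected
            }
            // put the crawled data into ES
            BulkRequest request = new BulkRequest();
            request.timeout("2m");
            for (Content content : contents) {
                request.add(new IndexRequest("jd_goods")
                        .source(JSON.toJSONString(content), XContentType.JSON));
            }
            BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
            return !bulk.hasFailures();
        }
    }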

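    6. Create the controller in the controller package

    ContentController.java

    package com.wu.controller;

    import com.wu.service.ContentService;
    import java.io.IOException;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    // request handling
    @RestController
    public class ContentController {

        @Autowired
        private ContentService contentService;

        @GetMapping("/parse/{keyword}")
        public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
            return contentService.parseContent(keyword);
        }
    }

    7. Start the Spring Boot project and open this URL to crawl the data into ES

    http://127.0.0.1:9090/parse/java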


    Implementing search

    1. Add to ContentService.java

    // additional imports: java.util.ArrayList, java.util.Map, java.util.concurrent.TimeUnit,
    //   org.elasticsearch.action.search.SearchRequest, org.elasticsearch.action.search.SearchResponse,
    //   org.elasticsearch.common.unit.TimeValue, org.elasticsearch.index.query.QueryBuilders,
    //   org.elasticsearch.index.query.TermQueryBuilder, org.elasticsearch.search.SearchHit,
    //   org.elasticsearch.search.builder.SearchSourceBuilder
    // 2. fetch the data and implement basic search
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo < 1) {
            pageNo = 1;
        }
        if (pageSize < 1) {
            pageSize = 1;
        }
        // conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // paging: from is a document offset, so page N starts at (N - 1) * pageSize
        sourceBuilder.from((pageNo - 1) * pageSize).size(pageSize);
        // exact match: the keyword is not analyzed, so it must equal an indexed token (e.g. lowercase "java")
        TermQueryBuilder termQuery = QueryBuilders.termQuery("title", keyword);
        sourceBuilder.query(termQuery);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // run the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        // parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            list.add(hit.getSourceAsMap());
        }
        return list;
    }

    2. Add the search request to ContentController

    // additional imports: java.util.List, java.util.Map
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws IOException {
        List<Map<String, Object>> list = contentService.searchPage(keyword, pageNo, pageSize);
        return list;
    }

    3. Visit http://127.0.0.1:9090/search/java/1/10

    OK, crawling and searching both work; next comes the interaction with the front end

    Front-end interaction

    1. Receiving the data on the front end

    index.html

    1. Receive the data with Vue

    <script>
        new Vue({
            el: '#app',
            data: {
                keyword: '', // search keyword
                results: []  // search results
            },
            methods: {
                searchKey() {
                    var keyword = this.keyword;
                    axios.get('search/' + keyword + '/1/210').then(response => {
                        this.results = response.data; // bind the data
                    });
                }
            }
        });
    </script>

    2. Pass the data to the page with Vue

    2. Visit 127.0.0.1:9090 and search for java

    OK, it works.

    Keyword highlighting

    1. Just change the search method in ContentService.java

    // additional imports: org.elasticsearch.common.text.Text,
    //   org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder,
    //   org.elasticsearch.search.fetch.subphase.highlight.HighlightField
    // 3. fetch the data and implement search with highlighting
    public List<Map<String, Object>> searchPagehighlighter(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo < 1) {
            pageNo = 1;
        }
        if (pageSize < 1) {
            pageSize = 1;
        }
        // conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // paging: from is a document offset, not a page number
        sourceBuilder.from((pageNo - 1) * pageSize).size(pageSize);
        // exact match
        TermQueryBuilder termQuery = QueryBuilders.termQuery("title", keyword);
        // ============================ highlighting ============================
        HighlightBuilder highlightBuilder = new HighlightBuilder(); // highlight builder
        highlightBuilder.field("title");                            // field to highlight
        highlightBuilder.requireFieldMatch(false);                  // highlight even if the query matched another field
        highlightBuilder.preTags("<span style='color:red'>");       // prefix
        highlightBuilder.postTags("</span>");                       // suffix
        sourceBuilder.highlighter(highlightBuilder);                // attach it to the source builder
        sourceBuilder.query(termQuery);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // run the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        // parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields(); // highlighted fields
            HighlightField title = highlightFields.get("title");                   // the field we asked for
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();                 // the original result
            // replace the original title with the highlighted fragments
            if (title != null) {
                StringBuilder newTitle = new StringBuilder();
                for (Text text : title.fragments()) {
                    newTitle.append(text.string());
                }
                sourceAsMap.put("title", newTitle.toString());
            }
            list.add(sourceAsMap);
        }
        return list;
    }
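The post-processing step at the end of that method can be isolated and tested without a cluster. In this sketch, plain Java types stand in for `SearchHit`, `HighlightField` and `Text` (an assumption made so the logic runs on its own), and the hypothetical `HighlightMerge.merge` returns a new map instead of modifying the hit's source map in place.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Join a field's highlight fragments and use them in place of the original value.
class HighlightMerge {
    static Map<String, Object> merge(Map<String, Object> source, String field, List<String> fragments) {
        Map<String, Object> result = new HashMap<>(source); // leave the input map unchanged
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }
}
```

When a hit has no highlight for the field (the `title == null` case above), the original value is kept.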

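    2. Change the search request in the controller

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws IOException {
        List<Map<String, Object>> list = contentService.searchPagehighlighter(keyword, pageNo, pageSize);
        return list;
    }

    3. The problem

    The highlighted field now carries the prefix and suffix tags, but the page shows them as literal text, which is not the result we want

    4. The fix

    Vue offers a convenient solution here: render the field as HTML (v-html) instead of as plain text

    5. Done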
