Parsing HTML with a third-party library
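The key to HTML parsing is to look carefully at the nodes and decide whether to select them by tag (DIV) or by class. Once the structure is clear, parsing a well-formed page is no problem at all.
If the page is not well-formed, how you handle it depends on the specific case.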
First, convert the NSData into an NSString:

```objc
NSString * str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
```

Then convert the NSString into a DocumentRoot. DocumentRoot is a class provided by the third-party library; the data has to be in this form before the library can parse it further:

```objc
DocumentRoot * document = [Element parseHTML:str];
```

Get all the DIVs:

```objc
NSArray * elements = [document selectElements:@"div"];
```
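The first step above, initWithData:encoding:, returns nil when the bytes are not valid UTF-8, and that nil would then be handed straight to the parser. Below is a minimal sketch of a more defensive conversion, assuming ARC. It tries UTF-8 first and then GB18030; the GB18030 fallback is an assumption about older Chinese sites, not something the article does, and DecodeHTMLData is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode downloaded HTML bytes into an NSString.
// Tries UTF-8 first (the encoding the article uses), then falls back to
// GB18030 (assumption: some older Chinese pages are served as GBK/GB2312).
// Returns nil if data is nil or neither encoding can decode it. Assumes ARC.
static NSString *DecodeHTMLData(NSData *data) {
    if (data == nil) {
        return nil;
    }
    NSString *str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    if (str != nil) {
        return str;
    }
    NSStringEncoding gb18030 =
        CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
    return [[NSString alloc] initWithData:data encoding:gb18030];
}
```

Its result could stand in for str, with a nil check before calling [Element parseHTML:...].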
The DIV block in question looks like this:

```html
<div class="item">
  <div class="pic">
    <a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml">
      <img src="http://resource.weiphone.com/resource/h027/h73/img201308021306430.jpg" alt="" height="100" width="158" />
    </a>
  </div>
  <div class="head">
    <h3><a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml">Coming soon: BlackBerry sends out BBM for iOS beta invites</a></h3>
    <div class="meta">
      <span class="timer" title="Publish time">2013/08/02 13:05</span>
      <span class="line">|</span>
      <a href="http://bbs.weiphone.com/u.php?uid=798198" class="author" title="Author"> 黄晓闷</a>
      <span class="line">|</span>
      <a href="javascript:void(0);" class="link" title="Article source">weiphone</a>
      <div class="funs">
        <span class="view" title="View count">2689</span>
        <span class="line">|</span>
        <a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml#comment" class="cmt" title="Comment count">5</a>
      </div>
    </div>
  </div>
  <div class="desc">
    <p>WeiPhone, August 2: BlackBerry has sent BBM beta invitations to iOS users, a sign that the service's official launch has entered its final stage.</p>
  </div>
</div>
```
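For each child element childEl of the DIV whose class is item, collect all the DIVs under it into childSecond:

```objc
NSArray * childSecond = [childEl selectElements:@"div"];
```

Select the element wrapped by a given tag, here the first `<a>` inside secondEl (its text can then be read with contentsSource):

```objc
Element * child = [secondEl selectElement:@"a"];
```

Read a value from inside the tag's angle brackets, i.e. an attribute such as href (news is the News object being filled in):

```objc
news.detailURL = [child.attributes objectForKey:@"href"];
```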
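Because the code calls objectForKey: on child.attributes, the attributes appear to come back as an NSDictionary. The sketch below assumes that (and ARC), uses a literal dictionary in place of child.attributes, reads href with a nil guard, and resolves it against the list page URL in case a page uses relative links. Relative links are an assumption; the sample above uses absolute ones. ResolveHref is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper. `attributes` plays the role of child.attributes and
// `baseURL` is the page the HTML was downloaded from. Returns nil if href is
// missing, empty, or not a valid URL; otherwise returns an absolute URL string
// (relative hrefs are resolved against baseURL, absolute ones are unchanged).
static NSString *ResolveHref(NSDictionary *attributes, NSURL *baseURL) {
    NSString *href = [attributes objectForKey:@"href"];
    if (href.length == 0) {  // also covers a missing key (nil.length == 0)
        return nil;
    }
    NSURL *url = [NSURL URLWithString:href relativeToURL:baseURL];
    return url.absoluteString;
}
```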
Below is a complete parsing method. It parses the DIVs whose class is item on the page http://www.weiphone.com/iPhone/news/index_0.shtml.
The method is called once the download has finished, with the downloaded data passed in as an NSData parameter.
Data model:
```objc
//
//  News.h
//  LookNewsProject
//
//  Created by ibokan on 13-08-01.
//  Copyright (c) 2013 laomaoshiba. All rights reserved.
//

#import <Foundation/Foundation.h>

@interface News : NSObject

// Title, publish time, detail link, image link, view count, evaluations,
// category, author, summary, source
@property (copy, nonatomic) NSString * title, * publishTime, * detailURL, * imgURL,
    * viewTimes, * evaluateTimes, * category, * author, * intro, * origin;

@end
```

Parsing method:
```objc
// Requires News.h and the third-party parser's headers to be imported in this .m file.
// Manual reference counting (MRC), as in the original project.
// Parses every <div class="item"> on the list page and returns the News
// objects (autoreleased) so the caller can use them.
- (NSArray *)analyNews:(NSData *)data
{
    // Decode the Chinese text (the page is assumed to be UTF-8)
    NSString * str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    if (str == nil) {
        return [NSArray array];  // not valid UTF-8: nothing to parse
    }
    // Parse the HTML into a document tree
    DocumentRoot * document = [Element parseHTML:str];
    // Split the page by div
    NSArray * elements = [document selectElements:@"div"];
    // Array that stores the parsed news
    NSMutableArray * newArr = [[NSMutableArray alloc] init];
    for (Element * element in elements) {
        // Only a div with class "item" holds one news entry
        if (![[element attribute:@"class"] isEqualToString:@"item"]) {
            continue;
        }
        News * news = [[News alloc] init];
        for (Element * childEl in [element childElements]) {
            NSArray * childSecond = [childEl selectElements:@"div"];
            for (Element * secondEl in childSecond) {
                NSString * cls = [secondEl attribute:@"class"];
                if ([cls isEqualToString:@"pic"]) {
                    Element * child = [secondEl selectElement:@"a"];
                    // Detail link: href of the <a>
                    news.detailURL = [child.attributes objectForKey:@"href"];
                    // Image link: src of the <img> inside the <a>
                    news.imgURL = [[child selectElement:@"img"].attributes objectForKey:@"src"];
                } else if ([cls isEqualToString:@"head"]) {
                    // Title: text of the first <a> (the one inside <h3>)
                    news.title = [[secondEl selectElement:@"a"] contentsSource];
                } else if ([cls isEqualToString:@"meta"]) {
                    // Author: text of the first <a>
                    news.author = [[secondEl selectElement:@"a"] contentsSource];
                    // Publish time: text of the first <span>
                    news.publishTime = [[secondEl selectElement:@"span"] contentsSource];
                    // Source: text of the second <a>; check the count so a missing link cannot crash
                    NSArray * links = [secondEl selectElements:@"a"];
                    if ([links count] > 1) {
                        news.origin = [[links objectAtIndex:1] contentsSource];
                    }
                } else if ([cls isEqualToString:@"funs"]) {
                    // View count: text of the first <span>
                    news.viewTimes = [[secondEl selectElement:@"span"] contentsSource];
                    // Comment count: text of the first <a>
                    news.evaluateTimes = [[secondEl selectElement:@"a"] contentsSource];
                } else if ([cls isEqualToString:@"desc"]) {
                    // Summary: text of the <p>
                    news.intro = [[secondEl selectElement:@"p"] contentsSource];
                }
            }
        }
        [newArr addObject:news];
        [news release];  // the array now owns the object
    }
    [str release];
    return [newArr autorelease];
}
```
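The method above fills each News property separately inside an if/else chain keyed on the class attribute. An alternative, table-driven sketch (not the article's approach) collects the extracted strings in a dictionary keyed by property name and applies them with one Key-Value Coding call, setValuesForKeysWithDictionary:. It assumes ARC and redeclares a trimmed copy of News so that it compiles on its own; NewsFromFields is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Trimmed copy of the article's News model (only the fields used here).
@interface News : NSObject
@property (copy, nonatomic) NSString *title, *publishTime, *viewTimes, *evaluateTimes, *origin;
@end

@implementation News
@end

// Hypothetical helper: build a News object from a dictionary whose keys are
// property names. setValuesForKeysWithDictionary: is standard KVC; it calls
// the (copying) setters and throws NSUndefinedKeyException for unknown keys.
static News *NewsFromFields(NSDictionary *fields) {
    News *news = [[News alloc] init];
    [news setValuesForKeysWithDictionary:fields];
    return news;
}
```

The trade-off is that a misspelled key surfaces as an NSUndefinedKeyException at run time rather than as a compile-time error.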