Deduplicating tens of millions of rows: MySQL deduplication at 300+ million rows

Published: 2025/3/19 | Database | 44 豆豆
I had roughly 360 million rows that needed deduplicating. Because the data volume was so large, I took the following approach:

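Load the data into one big table with LOAD DATA INFILE, with no deduplication and no constraints of any kind. Then split the data into a few dozen small tables and deduplicate each small table by comparing it against the big table. For each deduplicated small table, hash a column to get a two-digit number, create new tables named after those numbers, and insert the deduplicated rows into the table whose name carries the matching hash number.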

1. The stored procedure for deduplication:

DELIMITER //

/* tblname selects the small table phone_<tblname> at run time */
CREATE PROCEDURE create_imsi(IN tblname varchar(200))
begin
    declare done int default 0;
    declare v_imsi varchar(200);

    /* Cursor over the imsi values exposed by the view sqlstr */
    declare cur_l cursor for select imsi from sqlstr;

    /* Handler: set done once the cursor has no more rows */
    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' set done=1;

    drop view if exists sqlstr;

    /* View: imsi values that occur more than once in the small table
       and also exist in the big table tbl_new */
    set @tbl = CONCAT("create view sqlstr as select a.imsi from tbl_new a,(select imsi from phone_",tblname," group by imsi having count(imsi) > 1) b where a.imsi = b.imsi group by a.imsi");

    /* Execute the CREATE VIEW statement */
    PREPARE stmt FROM @tbl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    OPEN cur_l;
    FETCH cur_l INTO v_imsi;
    while (done <> 1) do
        /* Compare against the big table and delete the duplicated rows from the small table.
           imsi contains letters, so the value must be quoted as a string literal. */
        set @del = CONCAT("delete from phone_",tblname," where imsi='",v_imsi,"'");
        PREPARE stmt1 FROM @del;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
        FETCH cur_l INTO v_imsi;
    end while;
    close cur_l;
end//

DELIMITER ;
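To make the logic of create_imsi easier to follow, here is a minimal in-memory sketch in Python. It is not the author's code: the function name, the tuple layout, and the sample values are assumptions made for illustration, with a list standing in for phone_<tblname> and a set standing in for tbl_new.

```python
from collections import Counter

def dedupe_small_table(small_rows, big_imsis):
    """Mimic create_imsi on in-memory data (illustrative only).

    small_rows: list of (imsi, number) tuples standing in for phone_<tblname>.
    big_imsis:  set of imsi values standing in for tbl_new.
    """
    counts = Counter(imsi for imsi, _ in small_rows)
    # Equivalent of the view sqlstr: imsi values that are duplicated in the
    # small table and also exist in the big table.
    to_delete = {imsi for imsi, n in counts.items() if n > 1 and imsi in big_imsis}
    # Equivalent of the cursor loop: delete every row carrying such an imsi.
    return [row for row in small_rows if row[0] not in to_delete]

# Hypothetical sample data.
small = [("000cdc41b2a02518", "1"), ("000cdc41b2a02518", "2"), ("46000aa", "3")]
big = {"000cdc41b2a02518", "46000aa"}
remaining = dedupe_small_table(small, big)
print(remaining)
```

Read this way, the procedure removes every copy of a duplicated imsi from the small table rather than keeping one of them; presumably the copy in the big table is the one that is meant to survive.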

2. Insert into the new tables according to the hash:

DELIMITER //

CREATE PROCEDURE insert_imsi(IN tblname varchar(20))
begin
    declare done int default 0;
    declare done1 int default 0;
    declare v_e varchar(2000);

    /* One row per distinct bucket number, so each target table is filled only once */
    declare cur_l cursor for select distinct split from sqlstr;

    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' set done=1;   /* cursor exhausted */
    DECLARE CONTINUE HANDLER FOR 1146 set done1=3;              /* table does not exist */
    DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' set done1=1;  /* constraint violation, e.g. duplicate key */
    DECLARE CONTINUE HANDLER FOR SQLSTATE '42000' set done1=2;  /* syntax error or access violation */
    DECLARE CONTINUE HANDLER FOR SQLSTATE 'HY000' set done1=3;  /* general error */

    drop view if exists sqlstr;

    /* split = last two hex digits of md5(imsi), converted to decimal, modulo 100 */
    set @sqlstx = CONCAT("create view sqlstr as SELECT imsi,number,ctype,mod(conv(right(md5(imsi),2),16,10),100) split from imsi_phone_",tblname);
    PREPARE stmt1 FROM @sqlstx;
    EXECUTE stmt1;
    DEALLOCATE PREPARE stmt1;

    OPEN cur_l;
    /* Fetch before testing done, so the last bucket is not processed twice */
    FETCH cur_l INTO v_e;
    WHILE done <> 1 DO
        /* Copy every row of this bucket into its table imsi_<split> */
        set @ins = concat("insert into imsi_",v_e,"(imsi,number,ctype) select imsi,number,ctype from sqlstr where split = '",v_e,"'");
        PREPARE stmt3 FROM @ins;
        EXECUTE stmt3;
        DEALLOCATE PREPARE stmt3;
        FETCH cur_l INTO v_e;
    END WHILE;
    close cur_l;
end//

DELIMITER ;
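The bucket expression mod(conv(right(md5(imsi),2),16,10),100) can be reproduced outside MySQL, which helps when checking which imsi_<n> table a value should land in. The sketch below is a Python equivalent under the assumption that imsi is plain ASCII (so the bytes MySQL hashes match the UTF-8 bytes used here); the helper names split_bucket and target_table are made up for illustration.

```python
import hashlib
from collections import Counter

def split_bucket(imsi: str) -> int:
    """Python equivalent of mod(conv(right(md5(imsi),2),16,10),100)."""
    digest = hashlib.md5(imsi.encode("utf-8")).hexdigest()
    # Last two hex digits give a value in 0..255; modulo 100 gives 0..99.
    return int(digest[-2:], 16) % 100

def target_table(imsi: str) -> str:
    """Name of the table a row would be routed to (no zero padding)."""
    return f"imsi_{split_bucket(imsi)}"

# How many of the 256 possible last-byte values fall into each bucket.
bucket_sizes = Counter(value % 100 for value in range(256))

print(target_table("abc"))
print(bucket_sizes[0], bucket_sizes[99])
```

Because 256 is not a multiple of 100, buckets 0 to 55 each receive three of the 256 possible byte values while buckets 56 to 99 receive two, so the lower-numbered tables can be expected to hold roughly 1.5 times as many rows as the higher-numbered ones.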

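Errors encountered:

1. ERROR 1243 (HY000) at line 1: Unknown prepared statement handler (stmt3) given to EXECUTE

2. ERROR 1054 (42S22) at line 1: Unknown column '000cdc41b2a02518' in 'where clause'

Both came from the value after the equals sign (=) not being wrapped in single quotes. The column contains letters, so an unquoted value is parsed as a column name rather than a string; the failed PREPARE then presumably leaves no statement for EXECUTE to run, which would explain the first error. With the quotes added, the statement reads: set @dat = concat("insert into imsi_",v_e,"(imsi,number,ctype) select imsi,number,ctype from imsi_phone_",tblname," where imsi='",v_imsi,"'");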
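The quoting problem is easy to reproduce on a small scale. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for MySQL (the exact error text differs between the two engines), with a hypothetical table and sample row. It contrasts unquoted string concatenation with a bound placeholder, which sidesteps manual quoting altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and row, for illustration only.
conn.execute("CREATE TABLE imsi_phone (imsi TEXT, number TEXT, ctype TEXT)")
conn.execute("INSERT INTO imsi_phone VALUES ('000cdc41b2a02518', '13800000000', 'a')")

v_imsi = "000cdc41b2a02518"

# Unquoted concatenation: the value is parsed as SQL syntax, not as a string.
try:
    conn.execute("SELECT number FROM imsi_phone WHERE imsi=" + v_imsi)
    unquoted_failed = False
except sqlite3.Error:
    unquoted_failed = True

# Bound placeholder: the driver passes the value as data, so no quotes are needed.
rows = conn.execute(
    "SELECT number FROM imsi_phone WHERE imsi = ?", (v_imsi,)
).fetchall()

print(unquoted_failed, rows)
```

MySQL prepared statements support the same idea: a ? placeholder in the statement text with the value supplied through EXECUTE ... USING.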

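Parameter tuning:

The tables use the InnoDB engine, so these optimizations target InnoDB:

1. Set innodb_flush_log_at_trx_commit to 0 to reduce log flushing.

2. set sql_log_bin=0 to temporarily stop writing the binary log.

3. Set sync_binlog to 0 to reduce flushing.

4. innodb_buffer_pool_size: set it as large as possible.

5. set foreign_key_checks=0 to skip foreign key checks.

6. Drop unnecessary indexes; if the data contains duplicates, a primary key is still required.

7. innodb_change_buffer_max_size: the upper limit is 50; I set it to 40 here, because the load is all inserts and benefits from a larger insert buffer.

8. binlog_cache_size: if the binary log has to stay enabled, set this as large as possible.

9. innodb_flush_method: the flush mode; set it to O_DIRECT.

10. innodb_io_capacity: controls dirty-page flushing; set it according to your disks.

11. innodb_log_buffer_size: set it as large as possible.

12. unique_checks: turn the check off with set unique_checks=0;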

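13. alter table tablename disable keys; tells the table to stop maintaining its indexes, if it has any. Note that DISABLE KEYS only affects non-unique indexes on MyISAM tables; on InnoDB tables it has no effect.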
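The session-level switches in the list (items 2, 5 and 12) only help if they are set on the same connection that runs the load and are restored afterwards. As a rough sketch, the Python helper below simply assembles the statements in order. The helper name and the file path are hypothetical, restoring each switch to 1 assumes the server defaults were in effect beforehand, and the server-level settings from the list are left out because they belong in the server configuration.

```python
from typing import List

# Session-level switches taken from the list above (items 2, 5 and 12).
SESSION_SWITCHES = ["sql_log_bin", "foreign_key_checks", "unique_checks"]

def bulk_load_statements(load_sql: str) -> List[str]:
    """Return the statements to run, in order, on a single connection."""
    before = [f"SET {name}=0" for name in SESSION_SWITCHES]
    # Restore in reverse order; assumes each switch was 1 before the load.
    after = [f"SET {name}=1" for name in reversed(SESSION_SWITCHES)]
    return before + [load_sql] + after

# '/path/to/data.txt' is a placeholder path, not taken from the article.
for stmt in bulk_load_statements(
    "LOAD DATA INFILE '/path/to/data.txt' INTO TABLE tbl_new"
):
    print(stmt + ";")
```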
