Code for a join using random numbers and an expanded table
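import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.Row;

import scala.Tuple2;

/**
 * Join using random numbers and an expanded table.
 * The statements below belong inside a method. StringUtils and Constants are
 * the project's own helper classes. The Iterable return type of
 * PairFlatMapFunction.call matches the Spark 1.x Java API.
 */

// Expand the user table: emit every row once per prefix 0..9.
JavaPairRDD<String, Row> expandedRDD = userid2InfoRDD.flatMapToPair(
  new PairFlatMapFunction<Tuple2<Long, Row>, String, Row>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Iterable<Tuple2<String, Row>> call(Tuple2<Long, Row> tuple) throws Exception {
      List<Tuple2<String, Row>> list = new ArrayList<Tuple2<String, Row>>();
      for (int i = 0; i < 10; i++) {
        // The prefix must be the loop index i. A constant 0 would match only
        // the records salted with 0 and would duplicate each of those 10 times.
        list.add(new Tuple2<String, Row>(i + "_" + tuple._1, tuple._2));
      }
      return list;
    }
  });

// Salt the skewed side: prepend a random prefix in [0, 10) to every key.
JavaPairRDD<String, String> mappedRDD = userid2PartAggrInfoRDD.mapToPair(
  new PairFunction<Tuple2<Long, String>, String, String>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Tuple2<String, String> call(Tuple2<Long, String> tuple) throws Exception {
      Random random = new Random();
      int prefix = random.nextInt(10);
      return new Tuple2<String, String>(prefix + "_" + tuple._1, tuple._2);
    }
  });

// Join on the salted keys.
JavaPairRDD<String, Tuple2<String, Row>> joinedRDD = mappedRDD.join(expandedRDD);

// Append the user fields and re-key the result by session id.
JavaPairRDD<String, String> finalRDD = joinedRDD.mapToPair(
  new PairFunction<Tuple2<String, Tuple2<String, Row>>, String, String>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Tuple2<String, String> call(Tuple2<String, Tuple2<String, Row>> tuple)
        throws Exception {
      String partAggrInfo = tuple._2._1;
      Row userInfoRow = tuple._2._2;

      String sessionid = StringUtils.getFieldFromConcatString(
        partAggrInfo, "\\|", Constants.FIELD_SESSION_ID);

      int age = userInfoRow.getInt(3);
      String professional = userInfoRow.getString(4);
      String city = userInfoRow.getString(5);
      String sex = userInfoRow.getString(6);

      String fullAggrInfo = partAggrInfo + "|"
        + Constants.FIELD_AGE + "=" + age + "|"
        + Constants.FIELD_PROFESSIONAL + "=" + professional + "|"
        + Constants.FIELD_CITY + "=" + city + "|"
        + Constants.FIELD_SEX + "=" + sex;

      return new Tuple2<String, String>(sessionid, fullAggrInfo);
    }
  });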
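
The idea behind the Spark code above can be tried without a cluster. The following is a minimal plain-Java sketch, not the author's implementation: it replicates the user side under every prefix from 0 to 9, gives each record on the skewed side one random prefix, and looks the salted key up in the expanded map. The class name `SaltedJoinSketch`, the method names, and the sample data are all hypothetical and chosen only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Plain-Java illustration of a salted join; hypothetical names, no Spark dependency. */
final class SaltedJoinSketch {

    /** Number of prefixes; mirrors the loop bound and nextInt(10) in the Spark code. */
    static final int SALT_BUCKETS = 10;

    /** Expanded side: store each (userId, info) pair once per prefix 0..SALT_BUCKETS-1. */
    static Map<String, String> expand(Map<Long, String> userInfo) {
        Map<String, String> expanded = new HashMap<>();
        for (Map.Entry<Long, String> e : userInfo.entrySet()) {
            for (int i = 0; i < SALT_BUCKETS; i++) {
                expanded.put(i + "_" + e.getKey(), e.getValue());
            }
        }
        return expanded;
    }

    /** Skewed side: salt each key with one random prefix, then look it up (inner join). */
    static List<String> saltedJoin(List<Map.Entry<Long, String>> sessions,
                                   Map<Long, String> userInfo,
                                   Random random) {
        Map<String, String> expanded = expand(userInfo);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<Long, String> s : sessions) {
            String saltedKey = random.nextInt(SALT_BUCKETS) + "_" + s.getKey();
            String info = expanded.get(saltedKey);
            if (info != null) {
                // Drop the prefix again: the output uses the original key.
                joined.add(s.getKey() + ":" + s.getValue() + "|" + info);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Long, String> userInfo = new HashMap<>();
        userInfo.put(1L, "age=30");
        userInfo.put(2L, "age=41");

        List<Map.Entry<Long, String>> sessions = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            sessions.add(new AbstractMap.SimpleEntry<>(1L, "session" + i)); // hot key
        }
        sessions.add(new AbstractMap.SimpleEntry<>(2L, "session5"));

        for (String row : saltedJoin(sessions, userInfo, new Random())) {
            System.out.println(row);
        }
    }
}
```

Because the expanded side holds every prefix for every user id, each record on the skewed side matches exactly once whichever prefix it draws, so the result has the same rows as an unsalted inner join. In Spark the benefit is that the records of one hot key are spread over up to 10 distinct join keys, at the cost of a 10 times larger expanded side.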
Reposted from: https://www.cnblogs.com/gentle-awen/p/10144893.html