

Mixed Java/Scala Development of Spark Applications with Maven


Developers who know Java well often run into a problem when building Spark applications: Spark's Java API documentation is incomplete, or a Java counterpart to a given interface simply isn't provided. If you can write the Spark-facing code directly in Scala inside a Java project, while handling the project's other needs in Java, developing Spark projects becomes considerably easier. Below we explore how to set up this Java + Scala + Spark + Maven development environment.


1. Download the Scala SDK


Download the SDK directly from http://www.scala-lang.org/download/ (the latest stable release at the time of writing was 2.11.7) and simply unpack it.

(Later, when you create a .scala source file in IntelliJ IDEA, the IDE will detect it and prompt you to configure a Scala SDK; just point it at the directory where you unpacked the SDK.)

You can also configure the Scala SDK manually: IDEA => File => Project Structure... => Libraries => + ...

2. Install the Scala plugin for IntelliJ IDEA


In the IDE's plugin manager, search for Scala and install it. If you cannot get online, or the connection is too slow, you can also download the plugin zip manually from http://plugins.jetbrains.com/plugin/?idea&id=1347. When downloading manually, pay special attention to the version number: it must match your IntelliJ IDEA version, or the plugin cannot be installed. After downloading, click "Install plugin from disk..." in the plugin manager and select the plugin zip.

3. Integrating with Maven

To package the project with Maven, you need to configure the scala-maven-plugin in the pom file. In addition, since this is Spark development and the jar needs to be packaged as an executable Java archive, you also have to configure the maven-assembly-plugin and maven-shade-plugin and set mainClass to the fully qualified name of your entry class. Note that the pom pins scala.version to 2.10.5, matching the _2.10 Spark artifacts that Spark 1.6 is built against. After some experimentation, here is a working pom file; to reuse it, simply add or remove dependencies as needed.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>my-project-groupid</groupId>
    <artifactId>sparkTest</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>sparkTest</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hbase.version>0.98.3</hbase.version>
        <spark.version>1.6.0</spark.version>
        <jdk.version>1.7</jdk.version>
        <scala.version>2.10.5</scala.version>
    </properties>

    <repositories>
        <repository>
            <id>repo1.maven.org</id>
            <url>http://repo1.maven.org/maven2</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>repository.jboss.org</id>
            <url>http://repository.jboss.org/nexus/content/groups/public/</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>cloudhopper</id>
            <name>Repository for Cloudhopper</name>
            <url>http://maven.cloudhopper.com/repos/third-party/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>mvnr</id>
            <name>Repository maven</name>
            <url>http://mvnrepository.com/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>scala</id>
            <name>Scala Tools</name>
            <url>https://mvnrepository.com/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>scala</id>
            <name>Scala Tools</name>
            <url>https://mvnrepository.com/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>javax.mail</groupId>
            <artifactId>javax.mail-api</artifactId>
            <version>1.4.7</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-graphx_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.30</version>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>14.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>p6spy</groupId>
            <artifactId>p6spy</artifactId>
            <version>1.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-math3</artifactId>
            <version>3.3</version>
        </dependency>
        <dependency>
            <groupId>org.jdom</groupId>
            <artifactId>jdom</artifactId>
            <version>2.0.2</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>0.98.6-hadoop2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
            <version>0.98.6-hadoop2</version>
            <type>pom</type>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>0.98.6-hadoop2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>0.98.6-hadoop2</version>
        </dependency>
        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <version>6.8.8</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.jaxrs</groupId>
            <artifactId>jackson-jaxrs-json-provider</artifactId>
            <version>2.4.4</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.4.4</version>
        </dependency>
        <dependency>
            <groupId>net.sf.json-lib</groupId>
            <artifactId>json-lib</artifactId>
            <version>2.4</version>
            <classifier>jdk15</classifier>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- The packaged project must be submitted to Spark via spark-submit;
                 do not run the jar with java -jar -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>rrkd.dt.sparkTest.HelloWorld</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>assembly</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>${jdk.version}</source>
                    <target>${jdk.version}</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.1</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>allinone</shadedClassifierName>
                            <artifactSet>
                                <includes>
                                    <include>*:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>rrkd.dt.sparkTest.HelloWorld</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!-- scala-maven-plugin: compiles the Scala sources and handles
                 cross-references between the Java and Scala code -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.0</version>
                <executions>
                    <execution>
                        <id>compile-scala</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>test-compile-scala</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

The build section is what really matters here; the rest needs little attention.

The project's directory structure largely follows Maven's default conventions; the only difference is an extra scala directory under src, mainly to keep the Java and Scala sources organized separately.
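A sketch of what the layout might look like (the module name follows the pom above; the file names match the code below):

sparkTest/
├── pom.xml
└── src/
    ├── main/
    │   ├── java/         <- Java sources (HelloWorld.java)
    │   ├── scala/        <- Scala sources (Hello.scala)
    │   └── resources/
    └── test/
        ├── java/
        └── scala/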

Create the HelloWorld class (HelloWorld.java) under the java directory:

package test;

/**
 * Created by L on 2017/1/5.
 */
public class HelloWorld {

    public static void main(String[] args) {
        System.out.print("test");
        // Methods of the Scala object Hello are callable from Java as static methods
        Hello.sayHello("scala");
        Hello.runSpark();
    }
}

Create the Hello object (Hello.scala) under the scala directory:

package test

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by L on 2017/1/5.
 */
object Hello {
  def sayHello(x: String): Unit = {
    println("hello," + x)
  }

  def runSpark() {
    val sparkConf = new SparkConf().setAppName("SparkKMeans").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
        (4L, ("peter", "student"))))
    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))
    // Define a default user in case there are relationships with a missing user
    val defaultUser = ("John Doe", "Missing")
    // Build the initial graph
    val graph = Graph(users, relationships, defaultUser)
    // Notice that there is a user 0 (for which we have no information) connected to users
    // 4 (peter) and 5 (franklin).
    graph.triplets.map(
      triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1
    ).collect.foreach(println(_))
    // Remove missing vertices as well as the edges connected to them
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
    // The valid subgraph will disconnect users 4 and 5 by removing user 0
    validGraph.vertices.collect.foreach(println(_))
    validGraph.triplets.map(
      triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1
    ).collect.foreach(println(_))
    sc.stop()
  }
}

In this way, the Scala code calls Spark's interfaces to run the Spark logic, and the Java code in turn calls into the Scala code.

4. Compiling and packaging the project with Maven

For a mixed Java/Scala project, to compile the Scala sources first and then the Java sources, you can compile and package with the following Maven command:

    mvn clean scala:compile assembly:assembly
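Since appendAssemblyId is set to false in the pom above, the assembled runnable jar should end up at target/sparkTest-1.0-SNAPSHOT.jar (name inferred from the pom's artifactId and version), with the shaded variant attached alongside it as target/sparkTest-1.0-SNAPSHOT-allinone.jar per the shadedClassifierName setting.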

5. Running the packaged Spark jar

During development you will typically run in local mode inside IDEA, then package with the command above. The packaged Spark jar must be taken to the Spark cluster and submitted via spark-submit.
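For example, a submission to a standalone cluster might look like the sketch below (the master URL and jar path are placeholders; the --class value must match the mainClass configured in the pom):

spark-submit \
    --class rrkd.dt.sparkTest.HelloWorld \
    --master spark://your-master:7077 \
    target/sparkTest-1.0-SNAPSHOT.jar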

--------------------- This article comes from the CSDN blog of 大愚若智_; the original post is at: https://blog.csdn.net/zbc1090549839/article/details/54290233?utm_source=copy
