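We often develop and schedule jobs in Spark on YARN mode, and various errors come up along the way.
This post collects those problems and the fix for each.
First, here is a script that submits a Spark job to YARN: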
    nohup /data_dev/software/spark-2.2.0-bin-2.6.0-cdh5.14.0/bin/spark-submit \
        --class log_analysis.Ktr_Log \
        --master yarn \
        --deploy-mode cluster \
        --driver-memory 1g \
        --executor-memory 2g \
        --executor-cores 2 \
        /data_dev/software/kettle_log_analysis.jar \
        > /data_dev/software/logs/SparkTest.log 2>&1 &
1. Spark cannot find the main class:
【ERROR yarn.ApplicationMaster: Uncaught exception: java.lang.ClassNotFoundException】
20/06/29 09:24:21 ERROR yarn.ApplicationMaster: Uncaught exception: java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:629)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:394)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
20/06/29 09:24:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: src/main/scala/log_analysis/Ktr_Log)
20/06/29 09:24:21 INFO util.ShutdownHookManager: Shutdown hook called
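Solution:
The exception says the main class cannot be found. Note that the class name in the log is a source path (src/main/scala/log_analysis/Ktr_Log), not a fully qualified class name, so first confirm that --class is given as log_analysis.Ktr_Log, as in the script above. In my case the code also still hard-coded .master("local[20]") when it was submitted to YARN, and the submission did not succeed until that line was commented out: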
    lazy val spark = SparkSession.builder()
      // .master("local[20]")
      .appName("ReadKettleKjb")
      .config("spark.debug.maxToStringFields", "400")
      .getOrCreate()
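Since the log above shows a source path where a class name was expected, a small check before submitting can catch that case. The following is only a sketch: `MainClassName`, `fromSourcePath`, `isQualifiedName` and the `src/main/scala/` source root are hypothetical names and assumptions for illustration, not part of Spark.

```scala
object MainClassName {
  // Hypothetical helper: turn a source path such as
  // "src/main/scala/log_analysis/Ktr_Log.scala" into "log_analysis.Ktr_Log".
  // The default source root is an assumption about the project layout.
  def fromSourcePath(path: String, sourceRoot: String = "src/main/scala/"): String =
    path.stripPrefix(sourceRoot).stripSuffix(".scala").replace('/', '.')

  // A value is usable for --class only if it is a dot-separated
  // sequence of Java identifiers (no slashes, no empty segments).
  def isQualifiedName(name: String): Boolean =
    name.nonEmpty && name.split("\\.", -1).forall { part =>
      part.nonEmpty &&
        Character.isJavaIdentifierStart(part.head) &&
        part.tail.forall(c => Character.isJavaIdentifierPart(c))
    }
}
```

With this, `MainClassName.isQualifiedName` rejects the value seen in the log, and `MainClassName.fromSourcePath` yields the form that spark-submit expects for --class.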
2. The input file cannot be found:
【ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at log_analysis.Ktr_Log.main(Ktr_Log.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://192.168.21.30:8020/data_dev/software/kettle_dev_1.log】
20/06/29 09:31:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
    at log_analysis.Ktr_Log.main(Ktr_Log.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://192.168.21.30:8020/data_dev/software/kettle_dev_1.log
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.ZippedWithIndexRDD.<init>(ZippedWithIndexRDD.scala:44)
    at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
    at org.apache.spark.rdd.RDD$$anonfun$zipWithIndex$1.apply(RDD.scala:1294)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.zipWithIndex(RDD.scala:1293)
    at utils.DataFrameUtils$.withLineNOColumn(DataFrameUtils.scala:24)
    at log_analysis.Ktr_Log$.<init>(Ktr_Log.scala:60)
    at log_analysis.Ktr_Log$.<clinit>(Ktr_Log.scala)
    ... 6 more
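Solution:
When pointing at a file on the server, give the absolute path. On YARN, a path without a scheme, such as /data_dev/software/kettle_dev_1.log, is resolved against the cluster's default file system (fs.defaultFS), which is why the log looks for it under hdfs://192.168.21.30:8020.
If the file is read from HDFS, check the server IP and port, the file path, and that the file has actually been uploaded there.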
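To make the resolution rule concrete, here is a simplified model of it using only `java.net.URI`. It is a sketch, not Hadoop's actual implementation: `PathResolution` and `qualify` are hypothetical names, and only absolute paths are handled.

```scala
import java.net.URI

object PathResolution {
  // Simplified model of how an input path is qualified:
  // a path that already carries a scheme (hdfs://, file://) is kept as-is,
  // anything else is placed on the default file system.
  // Assumes `path` is absolute; relative paths are out of scope here.
  def qualify(path: String, defaultFs: String): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) path
    else new URI(defaultFs).resolve(uri).toString
  }
}
```

Calling `PathResolution.qualify("/data_dev/software/kettle_dev_1.log", "hdfs://192.168.21.30:8020")` gives the HDFS URL from the error message, while a path written with an explicit `file://` scheme is left alone. Keep in mind that a `file://` path has to exist on every node that reads it.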
3. A dependency jar cannot be found:
【java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)】
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:549)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:301)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
    at log_analysis.ReadKettleKjb$.analysis_kjb(ReadKettleKjb.scala:117)
    at log_analysis.ReadKettleKjb$.get_kjb_result(ReadKettleKjb.scala:41)
    at log_analysis.Ktr_Log$.main(Ktr_Log.scala:77)
    at log_analysis.Ktr_Log.main(Ktr_Log.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.xml.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21$$anonfun$apply$12.apply(DataSource.scala:533)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$21.apply(DataSource.scala:533)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
    ... 14 more
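Solution:
My code tells Spark to read XML files through the Databricks spark-xml data source, so that dependency jar has to be included when the application is packaged in IDEA. The read in question: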
    val df = spark.sqlContext
      .read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .schema(customSchema)
      .load("data/books_complex.xml")
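For readers who build with sbt instead of IDEA artifacts, the same idea might look like the fragment below. This is a sketch under assumptions: the original project's build tool and versions are not stated, so the Scala, Spark and spark-xml versions here are guesses to be matched to your cluster, and producing a fat jar additionally needs a plugin such as sbt-assembly.

```scala
// build.sbt (sketch): all versions below are assumptions, match them to your cluster
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark is already installed on the cluster, so keep it out of the application jar
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  // spark-xml is not shipped with Spark, so it must travel with the application
  "com.databricks" %% "spark-xml" % "0.4.1"
)
```

If rebuilding the jar is not convenient, spark-submit can also ship the dependency at submit time through its `--jars` or `--packages` options.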
4. No master URL specified
【WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 1): java.lang.ExceptionInInitializerError
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)】
20/06/29 09:54:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 1): java.lang.ExceptionInInitializerError
    at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:73)
    at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:64)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:27)
    at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
    at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:29)
    at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:29)
    at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:30)
    at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)
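The error says that a master URL must be set when the SparkContext is initialized. But I clearly passed --master when submitting the application, so why does Spark say it is missing?
This is a mistake beginners make easily, and it comes from not fully understanding how Spark runs in distributed or pseudo-distributed mode. People who hit it have usually put the creation of the Spark instance, or data loading such as sc.textFile, outside the main function.
If you have checked that the code does not hard-code a master, pay special attention to this: the SparkSession must be created inside the main method.
A Spark application corresponds to one main function, which runs in the driver. The driver owns the SparkContext and is responsible for distributing tasks and data to the nodes. When the session is a field of an object, outside main, a task that touches that object initializes it again on the executor. The stack trace shows exactly this: the failure happens on executor 1, inside ReadKettleKjb$.<clinit>, where getOrCreate runs without the master setting that spark-submit gave to the driver. In local mode everything lives in one JVM, so the same code succeeds there and only fails on a cluster.
The correct approach is as follows: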
    def main(args: Array[String]): Unit = {
      val spark = SparkSession
        .builder()
        .config("spark.sql.shuffle.partitions", 300)
        // .master("yarn-cluster")
        // .master("local[10]")
        .appName("ktr")
        // .enableHiveSupport()
        .getOrCreate()
      val context = spark.sparkContext
      context.setLogLevel("WARN")
      // ... business logic ...
    }
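The underlying language rule can be seen without Spark at all: the body of a Scala `object` runs the first time anything on that object is used, in whichever JVM uses it. The sketch below only simulates the situation; `InitOrderDemo`, `JobWithField`, `JobWithMain` and the string standing in for a session are hypothetical and exist purely for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object InitOrderDemo {
  // Records side effects so the order of initialization is visible.
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Stand-in for an object that builds its session in a field, outside main.
  object JobWithField {
    // Runs on first access to anything in this object.
    events += "JobWithField initialized"
    // Placeholder for SparkSession.builder()...getOrCreate()
    val session: String = "fake-session"
    def parse(line: String): String = line.trim
  }

  // Stand-in for the recommended layout: the session is created only in main.
  object JobWithMain {
    def parse(line: String): String = line.trim
    def main(args: Array[String]): Unit = {
      events += "session created in main"
    }
  }
}
```

Calling `InitOrderDemo.JobWithField.parse` is enough to run the field initializer, whereas `InitOrderDemo.JobWithMain.parse` triggers nothing else. On a cluster, the first case corresponds to an executor task trying to build its own session.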
5. The Scala environment is not in effect
【WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 2): java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalStateException: Library directory '/data_dev/software/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas/nm-local-dir/usercache/root/appcache/application_1593315523377_0017/container_1593315523377_0017_01_000003/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
at …】
20/06/29 10:15:20 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 14, node01, executor 2): java.lang.ExceptionInInitializerError
    at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:74)
    at log_analysis.ReadKettleKjb$$anonfun$get_kjb_result$1.apply(ReadKettleKjb.scala:65)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Library directory '/data_dev/software/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas/nm-local-dir/usercache/root/appcache/application_1593315523377_0017/container_1593315523377_0017_01_000003/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
    at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
    at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:347)
    at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:526)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at log_analysis.ReadKettleKjb$.spark$lzycompute(ReadKettleKjb.scala:28)
    at log_analysis.ReadKettleKjb$.spark(ReadKettleKjb.scala:23)
    at log_analysis.ReadKettleKjb$.context$lzycompute(ReadKettleKjb.scala:30)
    at log_analysis.ReadKettleKjb$.context(ReadKettleKjb.scala:30)
    at log_analysis.ReadKettleKjb$.<init>(ReadKettleKjb.scala:31)
    at log_analysis.ReadKettleKjb$.<clinit>(ReadKettleKjb.scala)
    ... 14 more
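The error suggests that the Scala jars cannot be found. First check whether the Scala SDK is installed on the server:
    [root@node01 XXF_EDW_SS_1]# scala -version
    Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
If a version is printed as above, Scala is installed and its environment variables are in effect.
If it is not installed, bundle the Scala jars into the target jar when packaging the code in IDEA. Also note that the stack trace again passes through ReadKettleKjb$.<clinit> on an executor (executor 2), the same pattern as in the previous item, so it is worth confirming that the SparkSession is created inside main.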
6. Exception from container-launch. Exit code: 1 Stack trace: ExitCodeException exitCode=1:
The YARN error log is as follows:
The YARN main page shows the application status as failed, with these diagnostics:
Diagnostics: Exception from container-launch.
Container id: container_1574829788169_0011_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
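However, opening the job's log details from the YARN UI shows the status as succeeded.
None of the many documents I read solved this. In the end I went back to the code and found: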
    SparkConf sparkConf = new SparkConf()
            // .setMaster("local[2]")
            .setAppName("javaSparkWordcount");
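Commenting out setMaster("local[2]") solved the problem. A master set in code on the SparkConf takes precedence over the --master flag passed to spark-submit, so the hard-coded local master had been overriding the YARN setting.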
Reposted from: https://blog.csdn.net/u010051036/article/details/107017429