OutOfMemoryError while reading 400 thousand rows in Spark SQL [duplicate]
This question already has an answer here:
How to optimize partitioning when migrating data from JDBC source?
3 answers
I have some data in Postgres and am trying to read it into a Spark DataFrame, but I get the error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark on a machine with 8 GB of RAM.
Below is the code:
import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()
I am very new to the world of Spark. I tried the same thing with Python pandas, which worked without any issue, but with Spark I get the error below.
Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job
My end goal is to do some processing on large database tables using Spark. Any help would be great.
python apache-spark pyspark
asked Nov 12 '18 at 17:24 by Naresh, edited Nov 13 '18 at 8:12
marked as duplicate by user6910411, eliasah
Nov 13 '18 at 9:48
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Post your code, it will help us understand exactly what you are trying to do.
– Amar Gajbhiye
Nov 13 '18 at 5:57
@AmarGajbhiye I added the code, please have a look
– Naresh
Nov 13 '18 at 8:12
2 Answers
I didn't see your code, but just increase the executor memory, e.g. spark.python.worker.memory.
– LiJianing, answered Nov 13 '18 at 3:21
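A minimal sketch of how that setting might be applied when creating the context; the 6g value echoes LiJianing's follow-up comment below and is an assumption, not a tuned figure:
import findspark
findspark.init()
from pyspark import SparkConf, SparkContext
# spark.python.worker.memory caps the memory each Python worker uses
# during aggregation before spilling to disk.
conf = SparkConf().set("spark.python.worker.memory", "6g")
sc = SparkContext(conf=conf)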
I added the code, please have a look
– Naresh
Nov 13 '18 at 8:13
I am not very familiar with PySpark, but I think you'd better try setting spark.python.worker.memory=6g
– LiJianing
Nov 13 '18 at 11:32
I'm sorry, but it seems your RAM isn't enough. Also, Spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.
Kind regards
EDIT: As @LiJianing suggested, you can increase the Spark executor memory.
from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf=conf)
– Manrique, answered Nov 12 '18 at 17:30, edited Nov 13 '18 at 9:26
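Since the question was closed as a duplicate of a post on partitioning JDBC reads, a sketch of that approach may also help; the partition column "id" and the bound values are assumptions and should come from a numeric, roughly uniform column in the actual table:
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
# Splitting the read into several partitions keeps each task's slice of
# the table small; fetchsize bounds the rows buffered per round trip.
temp_df = sql_context.read.format('jdbc').options(
    url="jdbc:postgresql://localhost:5432/database",
    dbtable="table_name",
    user="user",
    password="password",
    driver="org.postgresql.Driver",
    partitionColumn="id",   # assumed numeric key column
    lowerBound="1",         # assumed min(id)
    upperBound="400000",    # assumed max(id)
    numPartitions="8",
    fetchsize="10000").load()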
Can you guide me on how I can move data from Postgres to the Spark environment? Any tutorial?
– Naresh
Nov 12 '18 at 17:34
You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…
– karma4917
Nov 12 '18 at 17:50
Assuming you are using Scala, I've found something you might like: stackoverflow.com/questions/24916852/…
– Manrique
Nov 12 '18 at 17:52
400K rows is nothing though.
– thebluephantom
Nov 12 '18 at 18:13
Setting executor memory to 8g also gives the error
– Naresh
Nov 13 '18 at 11:08