Out Of Memory Error while reading 400 thousand rows in Spark SQL [duplicate]










0
















This question already has an answer here:



  • How to optimize partitioning when migrating data from JDBC source?

    3 answers



I have some data on postgres and trying to read that data on spark dataframe but i get error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark with RAM of 8GB.



Below is the code



import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()


I very new to world of spark. I tried same with python pandas which worked without any issue but with spark i got error.



Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded

2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job


My end goal is to do some processing on large database tables using spark. Any help would be great.










share|improve this question















marked as duplicate by user6910411, eliasah apache-spark
Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 13 '18 at 9:48


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • Post your code, it will help us understand exactly what you are trying to do.

    – Amar Gajbhiye
    Nov 13 '18 at 5:57











  • @AmarGajbhiye i added code please have a look

    – Naresh
    Nov 13 '18 at 8:12















0
















This question already has an answer here:



  • How to optimize partitioning when migrating data from JDBC source?

    3 answers



I have some data on postgres and trying to read that data on spark dataframe but i get error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark with RAM of 8GB.



Below is the code



import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()


I very new to world of spark. I tried same with python pandas which worked without any issue but with spark i got error.



Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded

2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job


My end goal is to do some processing on large database tables using spark. Any help would be great.










share|improve this question















marked as duplicate by user6910411, eliasah apache-spark
Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 13 '18 at 9:48


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • Post your code, it will help us understand exactly what you are trying to do.

    – Amar Gajbhiye
    Nov 13 '18 at 5:57











  • @AmarGajbhiye i added code please have a look

    – Naresh
    Nov 13 '18 at 8:12













0












0








0









This question already has an answer here:



  • How to optimize partitioning when migrating data from JDBC source?

    3 answers



I have some data on postgres and trying to read that data on spark dataframe but i get error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark with RAM of 8GB.



Below is the code



import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()


I very new to world of spark. I tried same with python pandas which worked without any issue but with spark i got error.



Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded

2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job


My end goal is to do some processing on large database tables using spark. Any help would be great.










share|improve this question

















This question already has an answer here:



  • How to optimize partitioning when migrating data from JDBC source?

    3 answers



I have some data on postgres and trying to read that data on spark dataframe but i get error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark with RAM of 8GB.



Below is the code



import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()


I very new to world of spark. I tried same with python pandas which worked without any issue but with spark i got error.



Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded

2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job


My end goal is to do some processing on large database tables using spark. Any help would be great.





This question already has an answer here:



  • How to optimize partitioning when migrating data from JDBC source?

    3 answers







python apache-spark pyspark






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 8:12







Naresh

















asked Nov 12 '18 at 17:24









NareshNaresh

5461419




5461419




marked as duplicate by user6910411, eliasah apache-spark
Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 13 '18 at 9:48


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.









marked as duplicate by user6910411, eliasah apache-spark
Users with the  apache-spark badge can single-handedly close apache-spark questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 13 '18 at 9:48


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.














  • Post your code, it will help us understand exactly what you are trying to do.

    – Amar Gajbhiye
    Nov 13 '18 at 5:57











  • @AmarGajbhiye i added code please have a look

    – Naresh
    Nov 13 '18 at 8:12

















  • Post your code, it will help us understand exactly what you are trying to do.

    – Amar Gajbhiye
    Nov 13 '18 at 5:57











  • @AmarGajbhiye i added code please have a look

    – Naresh
    Nov 13 '18 at 8:12
















Post your code, it will help us understand exactly what you are trying to do.

– Amar Gajbhiye
Nov 13 '18 at 5:57





Post your code, it will help us understand exactly what you are trying to do.

– Amar Gajbhiye
Nov 13 '18 at 5:57













@AmarGajbhiye i added code please have a look

– Naresh
Nov 13 '18 at 8:12





@AmarGajbhiye i added code please have a look

– Naresh
Nov 13 '18 at 8:12












2 Answers
2






active

oldest

votes


















0














I didn't see your code, but just increase the memory of executor, eg. spark.python.worker.memory






share|improve this answer


















  • 1





    i added code..please have a look

    – Naresh
    Nov 13 '18 at 8:13











  • i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

    – LiJianing
    Nov 13 '18 at 11:32


















0














I'm sorry but it seems that your RAM isn't enough. Also, spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.



Kind regards



EDIT
As @LiJianing suggested, you can increase the spark executor memory.



from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf = conf)





share|improve this answer

























  • can you guide me how i can move data from Postgres to spark environment, any tutorial.

    – Naresh
    Nov 12 '18 at 17:34











  • You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

    – karma4917
    Nov 12 '18 at 17:50












  • Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

    – Manrique
    Nov 12 '18 at 17:52












  • 400K rows is nothing though.

    – thebluephantom
    Nov 12 '18 at 18:13











  • setting executer memory 8g also giving error

    – Naresh
    Nov 13 '18 at 11:08

















2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














I didn't see your code, but just increase the memory of executor, eg. spark.python.worker.memory






share|improve this answer


















  • 1





    i added code..please have a look

    – Naresh
    Nov 13 '18 at 8:13











  • i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

    – LiJianing
    Nov 13 '18 at 11:32















0














I didn't see your code, but just increase the memory of executor, eg. spark.python.worker.memory






share|improve this answer


















  • 1





    i added code..please have a look

    – Naresh
    Nov 13 '18 at 8:13











  • i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

    – LiJianing
    Nov 13 '18 at 11:32













0












0








0







I didn't see your code, but just increase the memory of executor, eg. spark.python.worker.memory






share|improve this answer













I didn't see your code, but just increase the memory of executor, eg. spark.python.worker.memory







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 13 '18 at 3:21









LiJianingLiJianing

373




373







  • 1





    i added code..please have a look

    – Naresh
    Nov 13 '18 at 8:13











  • i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

    – LiJianing
    Nov 13 '18 at 11:32












  • 1





    i added code..please have a look

    – Naresh
    Nov 13 '18 at 8:13











  • i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

    – LiJianing
    Nov 13 '18 at 11:32







1




1





i added code..please have a look

– Naresh
Nov 13 '18 at 8:13





i added code..please have a look

– Naresh
Nov 13 '18 at 8:13













i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

– LiJianing
Nov 13 '18 at 11:32





i am not very familar with pyspark, but i think you'd better try to set spark.python.worker.memory=6g

– LiJianing
Nov 13 '18 at 11:32













0














I'm sorry but it seems that your RAM isn't enough. Also, spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.



Kind regards



EDIT
As @LiJianing suggested, you can increase the spark executor memory.



from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf = conf)





share|improve this answer

























  • can you guide me how i can move data from Postgres to spark environment, any tutorial.

    – Naresh
    Nov 12 '18 at 17:34











  • You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

    – karma4917
    Nov 12 '18 at 17:50












  • Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

    – Manrique
    Nov 12 '18 at 17:52












  • 400K rows is nothing though.

    – thebluephantom
    Nov 12 '18 at 18:13











  • setting executer memory 8g also giving error

    – Naresh
    Nov 13 '18 at 11:08















0














I'm sorry but it seems that your RAM isn't enough. Also, spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.



Kind regards



EDIT
As @LiJianing suggested, you can increase the spark executor memory.



from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf = conf)





share|improve this answer

























  • can you guide me how i can move data from Postgres to spark environment, any tutorial.

    – Naresh
    Nov 12 '18 at 17:34











  • You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

    – karma4917
    Nov 12 '18 at 17:50












  • Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

    – Manrique
    Nov 12 '18 at 17:52












  • 400K rows is nothing though.

    – thebluephantom
    Nov 12 '18 at 18:13











  • setting executer memory 8g also giving error

    – Naresh
    Nov 13 '18 at 11:08













0












0








0







I'm sorry but it seems that your RAM isn't enough. Also, spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.



Kind regards



EDIT
As @LiJianing suggested, you can increase the spark executor memory.



from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf = conf)





share|improve this answer















I'm sorry but it seems that your RAM isn't enough. Also, spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.



Kind regards



EDIT
As @LiJianing suggested, you can increase the spark executor memory.



from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf = conf)






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 13 '18 at 9:26

























answered Nov 12 '18 at 17:30









ManriqueManrique

522214




522214












  • can you guide me how i can move data from Postgres to spark environment, any tutorial.

    – Naresh
    Nov 12 '18 at 17:34











  • You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

    – karma4917
    Nov 12 '18 at 17:50












  • Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

    – Manrique
    Nov 12 '18 at 17:52












  • 400K rows is nothing though.

    – thebluephantom
    Nov 12 '18 at 18:13











  • setting executer memory 8g also giving error

    – Naresh
    Nov 13 '18 at 11:08

















  • can you guide me how i can move data from Postgres to spark environment, any tutorial.

    – Naresh
    Nov 12 '18 at 17:34











  • You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

    – karma4917
    Nov 12 '18 at 17:50












  • Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

    – Manrique
    Nov 12 '18 at 17:52












  • 400K rows is nothing though.

    – thebluephantom
    Nov 12 '18 at 18:13











  • setting executer memory 8g also giving error

    – Naresh
    Nov 13 '18 at 11:08
















can you guide me how i can move data from Postgres to spark environment, any tutorial.

– Naresh
Nov 12 '18 at 17:34





can you guide me how i can move data from Postgres to spark environment, any tutorial.

– Naresh
Nov 12 '18 at 17:34













You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

– karma4917
Nov 12 '18 at 17:50






You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…

– karma4917
Nov 12 '18 at 17:50














Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

– Manrique
Nov 12 '18 at 17:52






Assuming you are using scala, i've found something you might like. stackoverflow.com/questions/24916852/…

– Manrique
Nov 12 '18 at 17:52














400K rows is nothing though.

– thebluephantom
Nov 12 '18 at 18:13





400K rows is nothing though.

– thebluephantom
Nov 12 '18 at 18:13













setting executer memory 8g also giving error

– Naresh
Nov 13 '18 at 11:08





setting executer memory 8g also giving error

– Naresh
Nov 13 '18 at 11:08



Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế

⃀⃉⃄⃅⃍,⃂₼₡₰⃉₡₿₢⃉₣⃄₯⃊₮₼₹₱₦₷⃄₪₼₶₳₫⃍₽ ₫₪₦⃆₠₥⃁₸₴₷⃊₹⃅⃈₰⃁₫ ⃎⃍₩₣₷ ₻₮⃊⃀⃄⃉₯,⃏⃊,₦⃅₪,₼⃀₾₧₷₾ ₻ ₸₡ ₾,₭⃈₴⃋,€⃁,₩ ₺⃌⃍⃁₱⃋⃋₨⃊⃁⃃₼,⃎,₱⃍₲₶₡ ⃍⃅₶₨₭,⃉₭₾₡₻⃀ ₼₹⃅₹,₻₭ ⃌