OutOfMemoryError while reading 400 thousand rows in Spark SQL [duplicate]
This question already has an answer here:
How to optimize partitioning when migrating data from JDBC source?
3 answers
I have some data in Postgres and am trying to read it into a Spark DataFrame, but I get the error java.lang.OutOfMemoryError: GC overhead limit exceeded. I am using PySpark on a machine with 8 GB of RAM.
Below is the code:
import findspark
findspark.init()
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
temp_df = sql_context.read.format('jdbc').options(url="jdbc:postgresql://localhost:5432/database",
dbtable="table_name",
user="user",
password="password",
driver="org.postgresql.Driver").load()
I am very new to the world of Spark. I tried the same thing with Python pandas, which worked without any issue, but with Spark I get the error below.
Exception in thread "refresh progress" java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.VectorBuilder.<init>(Vector.scala:713)
at scala.collection.immutable.Vector$.newBuilder(Vector.scala:22)
at scala.collection.immutable.IndexedSeq$.newBuilder(IndexedSeq.scala:46)
at scala.collection.generic.GenericTraversableTemplate$class.genericBuilder(GenericTraversableTemplate.scala:70)
at scala.collection.AbstractTraversable.genericBuilder(Traversable.scala:104)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:89)
at org.apache.spark.ui.ConsoleProgressBar$$anonfun$3.apply(ConsoleProgressBar.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:82)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:56)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:1648)
at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$1.run(BlockManager.scala:1615)
2018-11-12 21:48:16 WARN Executor:87 - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
2018-11-12 21:48:16 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR SparkUncaughtExceptionHandler:91 - Uncaught exception in thread Thread[Executor task launch worker for task 0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.OutOfMemoryError: GC overhead limit exceeded
2018-11-12 21:48:16 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job
My end goal is to do some processing on large database tables using Spark. Any help would be great.
python apache-spark pyspark
asked Nov 12 '18 at 17:24 by Naresh, edited Nov 13 '18 at 8:12
marked as duplicate by user6910411, eliasah
Nov 13 '18 at 9:48
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Post your code, it will help us understand exactly what you are trying to do.
– Amar Gajbhiye
Nov 13 '18 at 5:57
@AmarGajbhiye I added the code, please have a look
– Naresh
Nov 13 '18 at 8:12
2 Answers
I didn't see your code, but just increase the executor memory, e.g. spark.python.worker.memory.
– LiJianing, answered Nov 13 '18 at 3:21
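A minimal sketch of how that setting might be applied when creating the context; the 6g value echoes LiJianing's follow-up comment below and is an assumption, not a tuned figure:
import findspark
findspark.init()
from pyspark import SparkConf, SparkContext
# spark.python.worker.memory caps the memory each Python worker uses
# during aggregation before spilling to disk.
conf = SparkConf().set("spark.python.worker.memory", "6g")
sc = SparkContext(conf=conf)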
I added the code, please have a look
– Naresh
Nov 13 '18 at 8:13
I am not very familiar with PySpark, but I think you'd better try setting spark.python.worker.memory=6g
– LiJianing
Nov 13 '18 at 11:32
I'm sorry, but it seems your RAM isn't enough. Also, Spark is intended to work on distributed systems with large amounts of data (clusters), so maybe it isn't the best option for what you are doing.
Kind regards
EDIT: As @LiJianing suggested, you can increase the Spark executor memory.
from pyspark import SparkConf, SparkContext
conf = (SparkConf().set("spark.executor.memory", "8g"))
sc = SparkContext(conf=conf)
– Manrique, answered Nov 12 '18 at 17:30, edited Nov 13 '18 at 9:26
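Since the question was closed as a duplicate of a post on partitioning JDBC reads, a sketch of that approach may also help; the partition column "id" and the bound values are assumptions and should come from a numeric, roughly uniform column in the actual table:
from pyspark import SparkContext, SQLContext
sc = SparkContext()
sql_context = SQLContext(sc)
# Splitting the read into several partitions keeps each task's slice of
# the table small; fetchsize bounds the rows buffered per round trip.
temp_df = sql_context.read.format('jdbc').options(
    url="jdbc:postgresql://localhost:5432/database",
    dbtable="table_name",
    user="user",
    password="password",
    driver="org.postgresql.Driver",
    partitionColumn="id",   # assumed numeric key column
    lowerBound="1",         # assumed min(id)
    upperBound="400000",    # assumed max(id)
    numPartitions="8",
    fetchsize="10000").load()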
Can you guide me on how I can move data from Postgres to the Spark environment? Any tutorial?
– Naresh
Nov 12 '18 at 17:34
You can move the data from Postgres to Hive using Sqoop and process it using Spark, right? You can, community.hortonworks.com/articles/14802/…
– karma4917
Nov 12 '18 at 17:50
Assuming you are using Scala, I've found something you might like: stackoverflow.com/questions/24916852/…
– Manrique
Nov 12 '18 at 17:52
400K rows is nothing though.
– thebluephantom
Nov 12 '18 at 18:13
Setting executor memory to 8g also gives the error
– Naresh
Nov 13 '18 at 11:08