How to connect to Redshift data using Spark on an Amazon EMR cluster
I have an Amazon EMR cluster running. If I do
ls -l /usr/share/aws/redshift/jdbc/
it gives me
RedshiftJDBC41-1.2.7.1003.jar
RedshiftJDBC42-1.2.7.1003.jar
Now I want to use this jar to connect to my Redshift database in my spark-shell. Here is what I do:
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
val df : DataFrame = sqlContext.read
.option("url","jdbc:redshift://host:PORT/DB-name?user=user&password=password")
.option("dbtable","tablename")
.load()
and I get this error -
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
I am not sure if I am specifying the correct format while reading the data. I have also read that a spark-redshift driver is available, but I do not want to run spark-submit with extra JARs.
How do I connect to Redshift data from spark-shell? Is that the correct JAR to configure the connection in Spark?
1 Answer
The error is generated because you are missing .format("jdbc") in your read. It should be:
val df : DataFrame = sqlContext.read
.format("jdbc")
.option("url","jdbc:redshift://host:PORT/DB-name?user=user&password=password")
.option("dbtable","tablename")
.load()
Without an explicit format, Spark falls back to its default data source (set by spark.sql.sources.default), which is Parquet. That is why the error mentions Parquet.
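One way to keep the explicit jdbc format and the connection details together is to assemble the reader options as a map and pass them with .options(...). This is a minimal sketch, not part of Spark or the Redshift driver: the object RedshiftJdbcOptions and its build method are hypothetical, and the driver class name com.amazon.redshift.jdbc42.Driver is an assumption for the RedshiftJDBC42 jar listed above. It also moves the credentials out of the URL query string into Spark's separate user and password options.

```scala
// Hypothetical helper (not part of Spark or the Redshift driver): collects the
// options Spark's built-in "jdbc" source needs to read one Redshift table.
object RedshiftJdbcOptions {
  // Assumed class name for the JDBC 4.2 driver in RedshiftJDBC42-*.jar;
  // verify it against the jar you actually load.
  val DefaultDriver: String = "com.amazon.redshift.jdbc42.Driver"

  def build(
      host: String,
      port: Int,
      database: String,
      table: String,
      user: String,
      password: String,
      driver: String = DefaultDriver
  ): Map[String, String] = {
    require(host.nonEmpty, "host must not be empty")
    require(database.nonEmpty, "database must not be empty")
    require(table.nonEmpty, "table must not be empty")
    Map(
      // Credentials are passed as separate options instead of in the URL.
      "url"      -> s"jdbc:redshift://$host:$port/$database",
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password,
      // Tells Spark which driver class to load for this URL.
      "driver"   -> driver
    )
  }
}
```

In spark-shell this would be used as sqlContext.read.format("jdbc").options(RedshiftJdbcOptions.build("host", 5439, "DB-name", "tablename", "user", "password")).load(), where 5439 is Redshift's default port. The format("jdbc") call is still what fixes the Parquet error; the map only keeps the settings in one place.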
You may still run into problems with the classpath or with Spark finding the driver, but this change should give you more useful error output. I assume the folder you listed is on Spark's classpath on EMR, and those driver versions look fairly current, so they should work.
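To tell a missing driver apart from other connection errors, a quick check can be run in spark-shell before reading. This is a sketch with hypothetical names (DriverCheck, classAvailable, driverFor); it uses only Class.forName and java.sql.DriverManager.getDriver. It only inspects the JVM it runs in (the Spark driver), so a passing check does not guarantee that the executors, which run the actual JDBC queries, can see the jar.

```scala
import java.sql.{DriverManager, SQLException}
import scala.util.Try

// Hypothetical checks for spark-shell: is the driver class loadable, and
// does any registered JDBC driver accept a given URL?
object DriverCheck {
  // True if the class can be loaded by the current class loader.
  def classAvailable(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  // Class name of the registered JDBC driver that accepts `url`, if any.
  // DriverManager.getDriver throws SQLException when no driver matches.
  def driverFor(url: String): Option[String] =
    try Some(DriverManager.getDriver(url).getClass.getName)
    catch { case _: SQLException => None }
}
```

For example, DriverCheck.classAvailable("com.amazon.redshift.jdbc42.Driver") (class name assumed, as above) returning false suggests the jar is not on the classpath. In that case spark-shell can be started with --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC42-1.2.7.1003.jar, which adds the jar to both driver and executor classpaths.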
Note that this approach is really only suited to reading from Redshift. If you need to write to Redshift, your best bet is the Databricks Redshift data source for Spark: https://github.com/databricks/spark-redshift.