How to connect to redshift data using Spark on Amazon EMR cluster

I have an Amazon EMR cluster running. If I do

ls -l /usr/share/aws/redshift/jdbc/

it gives me

RedshiftJDBC41-1.2.7.1003.jar RedshiftJDBC42-1.2.7.1003.jar

Now, I want to use this jar to connect to my Redshift database in my spark-shell . Here is what I do -

jar

Redshift database

spark-shell

import org.apache.spark.sql._ val sqlContext = new SQLContext(sc) val df : DataFrame = sqlContext.read .option("url","jdbc:redshift://host:PORT/DB-name?user=user&password=password") .option("dbtable","tablename") .load()

and I get this error -

org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;

I am not sure if I am specifying the correct format while reading the data. I have also read that spark-redshift driver is available but I do not want to run spark-submit with extra JARS.

format

spark-redshift driver

spark-submit

extra JARS

How do I connect to redshift data from Spark-shell ? Is that the correct JAR to configure the connection in Spark ?

1 Answer
1

The error being generated is because you are missing the .format("jdbc") in your read. It should be:

.format("jdbc")

val df : DataFrame = sqlContext.read .format("jdbc") .option("url","jdbc:redshift://host:PORT/DB-name?user=user&password=password") .option("dbtable","tablename") .load()

By default, Spark assumes sources to be Parquet files, hence the mention of Parquet in the error.

You may still run into issues with classpath/finding the drivers, but this change should give you more useful error output. I assume that folder location you listed is in the classpath for Spark on EMR and those driver versions look to be fairly current. Those drivers should work.

Note, this will only work for reading from Redshift. If you need to write to Redshift your best bet is using the Databricks Redshift data source for Spark - https://github.com/databricks/spark-redshift.

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Dfyjkt