Parse Json Object with an Array and Map to Multiple Pairs with Apache Spark in Java




I've been googling all day and couldn't find a straight answer, so I ended up posting a question here.



I have a file containing line-delimited json objects:


"device_id": "103b", "timestamp": 1436941050, "rooms": ["Office", "Foyer"]
"device_id": "103b", "timestamp": 1435677490, "rooms": ["Office", "Lab"]
"device_id": "103b", "timestamp": 1436673850, "rooms": ["Office", "Foyer"]



My goal is to parse this file with Apache Spark in Java. I referenced How to Parsing CSV or JSON File with Apache Spark, and so far I've been able to parse each line of JSON into a JavaRDD<JsonObject> using Gson.


JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> data = sc.textFile("fileName");
JavaRDD<JsonObject> records = data.map(new Function<String, JsonObject>() {
    public JsonObject call(String line) throws Exception {
        Gson gson = new Gson();
        JsonObject json = gson.fromJson(line, JsonObject.class);
        return json;
    }
});



Where I'm really stuck is deserializing the "rooms" array so that it fits my Event class:


public class Event implements Serializable {
    public static final long serialVersionUID = 42L;
    private String deviceId;
    private int timestamp;
    private String room;
    // constructor, getters and setters
}


In other words, from this line:


"device_id": "103b", "timestamp": 1436941050, "rooms": ["Office", "Foyer"]



I want to create two Event objects in Spark:


obj1: deviceId = "103b", timestamp = 1436941050, room = "Office"
obj2: deviceId = "103b", timestamp = 1436941050, room = "Foyer"



I did a little searching and tried flatMapValue, but no luck; it threw an error:


JavaRDD<Event> events = records.flatMapValue(new Function<JsonObject, Iterable<Event>>() {
    public Iterable<Event> call(JsonObject json) throws Exception {
        JsonArray rooms = json.get("rooms").getAsJsonArray();
        List<Event> data = new LinkedList<Event>();
        for (JsonElement room : rooms) {
            data.add(new Event(json.get("device_id").getAsString(), json.get("timestamp").getAsInt(), room.toString()));
        }
        return data;
    }
});



I'm very new to Spark and MapReduce. I would be grateful if you could help me out. Thanks in advance!





Please post your error. Edit your post and add the stack trace.
– Vladimir Vagaytsev
Jul 13 '16 at 7:53




2 Answers



If you load the JSON data into a DataFrame:




DataFrame df = sqlContext.read().json("/path/to/json");



You could then easily do this with explode:




df.select(
df.col("device_id"),
df.col("timestamp"),
org.apache.spark.sql.functions.explode(df.col("rooms")).as("room")
);



For input:


"device_id": "1", "timestamp": 1436941050, "rooms": ["Office", "Foyer"]
"device_id": "2", "timestamp": 1435677490, "rooms": ["Office", "Lab"]
"device_id": "3", "timestamp": 1436673850, "rooms": ["Office", "Foyer"]



You will get:


+---------+------+----------+
|device_id| room| timestamp|
+---------+------+----------+
| 1|Office|1436941050|
| 1| Foyer|1436941050|
| 2|Office|1435677490|
| 2| Lab|1435677490|
| 3|Office|1436673850|
| 3| Foyer|1436673850|
+---------+------+----------+
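If you then need the result as a JavaRDD<Event>, here is a rough, untested sketch of how the exploded rows could be mapped back to the Event class from the question (assuming Spark 1.x, as in the answer above, and the Event constructor shown in the question; note that Spark infers the JSON integers as longs, hence the cast):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// Same select as above, kept in a variable so the rows can be mapped afterwards
DataFrame exploded = df.select(
        df.col("device_id"),
        df.col("timestamp"),
        org.apache.spark.sql.functions.explode(df.col("rooms")).as("room"));

JavaRDD<Event> events = exploded.javaRDD().map(new Function<Row, Event>() {
    public Event call(Row row) throws Exception {
        // Columns follow the order of the select: device_id, timestamp, room.
        // Spark reads the JSON timestamp as a long, so cast it down to int for Event.
        return new Event(row.getString(0), (int) row.getLong(1), row.getString(2));
    }
});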





Thanks for letting me know about this useful feature. I didn't know Spark supports Hive-like UDFs. It's very helpful!
– gyoho
Jul 13 '16 at 16:57





Spark is fully compatible with Hive (*≧▽≦)
– Yuan JI
Jul 13 '16 at 17:00


val formatrecord = records.map(fromJson[mapClass](_))



mapClass should be a case class for mapping the objects inside the records JSON.
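In Java, the same idea would be to define a small class that mirrors one JSON line and then flatMap each line into several Event objects. Here is a rough, untested sketch, reusing the JavaRDD<String> data and Gson from the question and assuming the Spark 1.x FlatMapFunction (which returns an Iterable); the Record class name is made up for the example:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import com.google.gson.Gson;

// Hypothetical POJO whose field names mirror the JSON keys
public class Record implements Serializable {
    String device_id;
    long timestamp;
    List<String> rooms;
}

JavaRDD<Event> events = data.flatMap(new FlatMapFunction<String, Event>() {
    public Iterable<Event> call(String line) throws Exception {
        Record record = new Gson().fromJson(line, Record.class);
        List<Event> out = new ArrayList<Event>();
        for (String room : record.rooms) {
            // One Event per room, duplicating device_id and timestamp
            out.add(new Event(record.device_id, (int) record.timestamp, room));
        }
        return out;
    }
});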



