Combine JSON objects contained in a single file into one JSON file using Pyspark
I have a huge file of around 20 GB which contains JSON objects as shown below:
{"-LGsfNZANyy3sWBuNn5s":{"callsign":"6aay","deviceId":"97436FB5-B4DE-8D8E-0000-000000000000"}}
{"-LIIe7e7tz1BzSzhG1AK":{"callsign":"3fox","deviceId":"9A554634-2373-DFDF-0000-000000000000","dow":"Tuesday"}}
I need to create a single JSON object which contains all these objects inside it as follows:
{
"-LGsfNZANyy3sWBuNn5s":{"callsign":"6aay","deviceId":"97436FB5-B4DE-8D8E-0000-000000000000"},
"-LIIe7e7tz1BzSzhG1AK":{"callsign":"3fox","deviceId":"9A554634-2373-DFDF-0000-000000000000","dow":"Tuesday"}
}
Can this be done using Pyspark? Any help/suggestion is greatly appreciated.
I have nested JSON objects inside a single JSON object as well, which is why I'm looking for a concrete solution to merge the JSON object on each line into a single JSON file.
– Sains
Sep 17 '18 at 2:41
That solution will work with nested JSON objects, as long as there is one full JSON object (in the format you showed) per line.
– njha
Sep 17 '18 at 2:44
Ok. I'll try this suggestion. Thanks
– Sains
Sep 17 '18 at 4:43
Why not just delete the closing } of every line except the last, and the opening { of every line except the first, and then append ,s on every line except the last? If your data is formatted exactly like that, it should work :P This obviously isn't good practice, but it's probably the easiest solution.
– njha
Sep 17 '18 at 2:25
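The brace-stripping idea from the comment above can be sketched in plain Python rather than PySpark. This is only a rough illustration, assuming every non-empty line holds exactly one complete JSON object; the function name `merge_json_lines` and the file names in the usage note are made up for the example. It streams line by line, so the 20 GB file never has to fit in memory, and it never parses the JSON, so nested objects pass through untouched.

```python
import io
import json


def merge_json_lines(src, dst):
    """Copy one-object-per-line JSON from src into a single JSON object in dst.

    Hypothetical helper: src and dst are text file objects. Each non-empty
    line must start with '{' and end with '}'.
    """
    first = True
    dst.write("{")
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not (line.startswith("{") and line.endswith("}")):
            raise ValueError("line is not a single JSON object: " + line[:40])
        body = line[1:-1].strip()  # drop only the outermost braces
        if not body:
            continue  # an empty object {} contributes nothing
        if not first:
            dst.write(",")  # comma between entries, none after the last
        dst.write("\n" + body)
        first = False
    dst.write("\n}\n")


if __name__ == "__main__":
    # Small in-memory demo; the second line contains a nested object.
    sample = io.StringIO(
        '{"-LGsfNZANyy3sWBuNn5s":{"callsign":"6aay"}}\n'
        '{"-LIIe7e7tz1BzSzhG1AK":{"callsign":"3fox","extra":{"dow":"Tuesday"}}}\n'
    )
    merged = io.StringIO()
    merge_json_lines(sample, merged)
    print(sorted(json.loads(merged.getvalue())))  # list the merged top-level keys
```

For a real file you would pass two handles instead, e.g. `open("input.json")` and `open("merged.json", "w")` (placeholder names). One caveat: if the same top-level key appears on more than one line, the output will contain duplicate keys, and most parsers will silently keep only the last one.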