Let’s understand what we mean when we use the term ‘Semi-structured data’. Semi-structured data is data with nested data structures and no fixed schema; it contains semantic tags or other types of markup that identify individual, distinct entities within the data. JSON is one type of semi-structured data, and it can be generated by many sources, such as smart devices, REST API calls, or responses to an event or request.

In this blog, we will understand the mechanism used to process JSON and how its data can be used for data analysis by executing analytical SQL queries with Cloud Services/Frameworks such as Apache Spark.

Before going into the processing mechanism, let’s take a shallow dive into JSON.

JSON stands for JavaScript Object Notation. A JSON (.json) file can contain multiple JSON objects, each surrounded by curly braces. A JSON object consists of two primary elements: keys and values.

A simple JSON example is an array of three employee JSON objects, whose keys are employee_id, employee_name, email and car_model.

A JSON can be called complex if it contains nested elements (e.g. arrays or objects inside an object). For example, the dealer JSON contains multiple ‘cars dealer’ JSON objects; each dealer object contains a nested array of “cars”, and the cars array contains another nested array of “models”. The car_servicing_details JSON is another example of complex JSON.

In this blog, all three JSONs (dealer, employee and car_servicing_details) will be referred to as “Raw JSONs”.

In Spark, JSON can be processed from different data storage layers, such as the local file system, HDFS, S3, RDBMS or NoSQL. In this blog, I will cover processing JSON from HDFS only.

Spark makes processing JSON easy via the SparkSQL API: using an SQLContext object (.SQLContext), it converts the JSON into a Spark DataFrame on which SQL analytical queries can be executed.

Before processing JSON, the following required steps must be executed to create an SQLContext object.

Once the SQLContext object (sqlcontext) is ready, we can start reading the JSON.

Based on business needs, Spark DataFrame (sparkjsondf) features/functions can then be used to perform operations on the JSON data, such as inspecting its schema/structure, displaying its data, extracting the data of specific key(s) or section(s), renaming keys, or exploding arrays to convert the JSON into a structured table.

Now, let’s move on to understanding this functionality step by step.

Reading a Specific Key (the car name, “name”) of JSON… (cars -> name)
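To make the “cars -> name” step and the idea of exploding nested arrays concrete without a Spark cluster, here is a minimal plain-Python sketch using only the standard json module. It is an analogue of the DataFrame operations described in this blog, not Spark code and not the author’s implementation. The dealer data is hypothetical: only the nesting (dealer -> cars -> models) and the keys cars, name and models come from the text; dealer_name and every value are made up for illustration.

```python
import json

# Hypothetical "dealer" raw JSON: an array of dealer objects.
# Assumption: only the nesting (dealer -> cars -> models) and the keys
# "cars", "name" and "models" come from the blog; "dealer_name" and all
# values are invented for this illustration.
RAW_DEALER_JSON = """
[
  {"dealer_name": "Dealer A",
   "cars": [
     {"name": "Sedan", "models": ["S1", "S2"]},
     {"name": "Hatchback", "models": ["H1"]}
   ]},
  {"dealer_name": "Dealer B",
   "cars": [
     {"name": "SUV", "models": ["U1", "U2", "U3"]}
   ]}
]
"""


def car_names(dealers):
    """Read the specific key cars -> name: one value per car, across all dealers.

    Dealers without a "cars" array contribute nothing.
    """
    return [car["name"] for dealer in dealers for car in dealer.get("cars", [])]


def explode_models(dealers):
    """Flatten dealer -> cars -> models into one (dealer, car, model) row per model.

    This mimics exploding both nested arrays into a structured table;
    missing or empty arrays produce no rows.
    """
    rows = []
    for dealer in dealers:
        for car in dealer.get("cars", []):
            for model in car.get("models", []):
                rows.append((dealer["dealer_name"], car["name"], model))
    return rows


if __name__ == "__main__":
    dealers = json.loads(RAW_DEALER_JSON)
    print(car_names(dealers))
    for row in explode_models(dealers):
        print(row)
```

Run as a script, it prints ['Sedan', 'Hatchback', 'SUV'] followed by six (dealer, car, model) tuples. In Spark, the corresponding steps would typically be expressed by reading the file through the DataFrame reader (e.g. sqlcontext.read.json(...)) and applying explode from pyspark.sql.functions to the cars and models arrays; like this sketch, explode emits no rows for a null or empty array, whereas explode_outer keeps such records with nulls.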