JSON support is the most interesting new Postgres feature of the last few years. It relaxes the primary constraint of SQL databases — the rigid schema structure — by letting you store semi-structured data in your tables alongside other data. It also decisively counters the NoSQL trend by giving users document store-like semantics in a proven, mature database technology.
But while the getters and setters are straightforward, the functions for analysis queries are a bit counterintuitive. In this post, we’ll cover a few common patterns and how to build them out.
Joining on array values
Often, when adding many-to-one data to a table, you’ll start with a simple array. Yet inevitably an unanticipated analysis need will come up, and you’ll need to write joins using the data in the arrays.
For example, if you stored each user’s device UUIDs in a JSON array, but now you want to join it to our newly-created devices table. The users table might look like this:
Start by breaking the table out into one row per users.id per device_uuid. Do that with the handy json_array_elements function:
json_array_elements is a set-defining function, meaning it returns more than one row for each row it’s called on. In this case you’re going to get one row per user per device_uuid. Here are the results:
Now there is a device_uuid column. You can finally join users to devices. For example, if you wanted number of users per device platform, do this:
Just subselect your previous query and write a quick group-and-count.
Aggregating over arrays
Similarly, you might quickly store individual values in an array, but then need to aggregate those values later. For example, store per-user website load times in an array in the users table like so:
By storing them this way, you’ve made getting per-user average load times more annoying than if you had used a separate pageload_logs table. But, as before, you can create that table on the fly with the json_array_elements function:
If you’re logging events from mobile devices and other clients, you may be using a JSON column for event data, with a key for each event type and a value for the count. For example, each timestamp might have an object like this:
Now sum the total number of events you got on each day. Use the set-defining function json_each_text to split out the keys and values. Start by just selecting the raw keys and values:
select * from events, json_each_text(events.event_data)
from events cross join lateral json_each_text(events.event_data)
The second half of the join references the table in the first half of the join, restricting the keys and values to the right event.
This gives the results you expect:
Given that you have the keys and values right there in the join, it is possible to rewrite the query with a simple group-and-count:
select date(event_time), sum(value::float)
from events, json_each_text(events.event_data)
group by 1
Note the cast of value to float. Choose json_each_text instead of json_each because it returns text, which can be cast directly to float.
Now you’ve got the total events per day.
Favorite trick: split the JSON data out into rows and then aggregate. The best functions for doing this are json_array_elements for arrays and json_each_text for objects. Paired with a little creative join logic, you’ll be analyzing JSON data in no time.