# Querying Datasets
# Flo.w Query Definitions
Flo.w dataset queries are defined using a JSON format based on SQL. The query can be very simple, for example, setting a limit on the number of results returned, or very complex, such as performing grouping and calculating aggregated statistics.
An example query is shown below with the equivalent SQL:
{
"where": { "label": { "$like": "BT%" } },
"limit": 10
}
SELECT *
FROM
"airquality"
WHERE
"label" LIKE 'BT%'
LIMIT 10;
// Make sure to add code blocks to your code group
The query definition may contain the following optional query terms:
{
"attributes": [], // Select attributes to return
"distinct": [], // Specify 'SELECT DISTINCT' or 'SELECT DISTINCT ON (...)'
"where": {}, // Specify filter clauses
"group": [], // Perform grouping of results
"order": [], // Sort results
"limit": 0, // Limit results
"offset": 0, // Offset results
"sample": {} // Specify sampling settings
}
# attributes
Specify attributes to return in the result set.
If ommitted, all attributes will be returned (the equivalent of select *
in SQL. Attributes can be aliased by using a tuple ([attr, alias]
) in place of the attribute name.
{
"attributes": [
"name",
[
"no2",
"no2_alias"
]
]
}
SELECT
"name", "no2" as "no2_alias"
FROM
"airquality";
// Make sure to add code blocks to your code group
# Attribute operators
The following operators can be used in attribute definitions:
Operator | PostgreSQL Equivalent | Examples | Notes |
---|---|---|---|
$col | "colName" | {$col: "colName"} | Escapes supplied argument for use as a column name |
$fn | funcName(arg1, arg2, ...) | {$fn: ["funcName", {$col: "a"}, 2]} | Call an arbitrary database function (see section below) |
$cast | cast(val as type) | {$cast: [{$col: "a"}, "integer"]} | Cast data types |
$mode | mode() within group (ORDER BY "colName"}) | {$mode: {$col: "colName"}} | Aggregate function to return 'mode' of a column (most frequent value) |
# Function Attributes
Calculated attributes that call arbitrary functions can also be defined using the $fn
operator. Function calls are specified using the following formats in place of an attribute name:
Simple format (array-style):
{"$fn": ["<funcName>", <arg1>, <arg2>, ...]}
Extended format (object-style):
{"$fn": {
name: "funcName",
args: [<arg1>, <arg2>, ...],
order?: <order clause>,
orderWithinGroup?: <order clause>,
filter?: <where clause>
}}
Calculated attributes will normally be combined with the aliasing feature to give your calculated attribute a name. If a literal column name is required anywhere in your query then you can use the format {"$col": <attr>}
. This operator will properly quote and escape the name you supply for inclusion in the generated SQL.
The extended format allows you to specify optional features used for Postgres SQL aggregate functions, namely:
ORDER BY
FILTER
ORDER WITHIN GROUP
Note that use of $fn
is an advanced feature and the API key that you use to access Flo.w must have this permission set.
{
"attributes": [
[
{ "$fn": [ "max", { "$col": "no2" } ] },
"maxVal"
]
]
}
SELECT
"max"("no2") as "maxVal"
FROM
"airquality";
// Make sure to add code blocks to your code group
// Calculate vehicle tracks by converting location points from a time-series
// to linestrings grouping by some vehicle ID.
{
"attributes": [
"id",
[
{ "$fn": {
"name": "st_makeline",
"args":[{ "$col": "geometry" } ],
"order": "timestamp"
}},
"linestring"
]
],
"group": "id"
}
SELECT
"id", "st_makeline"("geometry" ORDER BY "timestamp") AS "linestring"
FROM
"table"
GROUP BY "id";
// Make sure to add code blocks to your code group
# distinct
Specify a distinct
clause to remove duplicate rows from the result set.
Use an empty array for a simple SELECT DISTINCT
query:
{
"distinct": [],
...
}
SELECT DISTINCT
*
FROM
"airquality";
// Make sure to add code blocks to your code group
Specify a single attribute or an array of attributes for a SELECT DISTINCT ON (...)
query:
{
"distinct": 'name', // or ['name', ...],
...
}
SELECT DISTINCT ON ("name")
*
FROM
"airquality";
// Make sure to add code blocks to your code group
# where
Specify filtering clauses.
For simple 'equality' filtering, specify attribute/value pairs within the where
clause of the query. Multiple conditions are combined using AND logic:
{
"where": {
"name": "BG1",
"no2": 100
}
}
SELECT *
FROM
"airquality"
WHERE
"name" = 'BG1' AND "no2" = 100;
// Make sure to add code blocks to your code group
# Operators
In addition to simple equality filtering, the following operators are also available:
Operator | PostgreSQL Equivalent | Examples | Notes |
---|---|---|---|
$or | (clause1 OR clause2 ...) | {$or: [clause1, clause2, ...]} | Combines sub-clauses |
$and | (clause1 AND clause2 ...) | {$and: [clause1, clause2, ...]} | Combines sub-clauses |
$gt | > | {$gt: 1} | |
$gte | >= | {$gte: 1} | |
$lt | < | {$lt: 1} | |
$lte | <= | {$lte: 1} | |
$ne | != | {$ne: 'a'} | |
$eq | = | {$eq: 'a'} | Equivalent to a simple attribute/value pair |
$is | IS | {$is: null} | |
$not | IS NOT | {$not: null} | |
$like | LIKE | {$like: 'abc%} | |
$notLike | NOT LIKE | {$notLike: 'abc%} | |
$iLike | ILIKE | {$iLike: 'abc%} | Case-insensitive |
$notILike | NOT ILIKE | {$notILike: 'abc%} | Case-insensitive |
$regexp | ~ | {$regexp: '^a.*} | |
$notRegexp | !~ | {$notRegexp: '^a.*} | |
$iRegexp | ~* | {$iRegexp: '^a.*} | Case-insensitive |
$notIRegexp | !~* | {$notIRegexp: '^a.*} | Case-insensitive |
$between | BETWEEN a AND b | {$between: [a, b]} | |
$notBetween | NOT BETWEEN a AND b | {$notBetween: [a, b]} | |
$in | IN (a, b) | {$in: [a, b]} |
{
"where": {
"no2": {
"$gt": 10
},
"label": {
"$like": "B%"
}
}
}
SELECT *
FROM
"airquality"
WHERE
"no2" > 10 AND "label" LIKE 'B%';
// Make sure to add code blocks to your code group
# Spatial Operators
The following spatial operators are available:
Operator | Description | Examples |
---|---|---|
$point | Construct a point. Supply an array of 2 or 3 elements [x, y, <srid>] . SRID defaults to 4326 (WGS84) if not specified. | {$point: [-1.0, 51.0, 4326]} |
$rect | Construct a bounding rectangle. Supply an array of 4 or 5 elements [xMin, yMin, xMax, yMax, <srid>] . SRID defaults to 4326 (WGS84) if not specified. | {$rect: [-1.0, 51.0, 1,0, 52, 4326]} |
$box | Construct a 3D bounding box. Supply an array of 6 or 7 elements [xMin, yMin, zMin, yMax, xMax, yMax, zMax, <srid>] . SRID defaults to 4326 (WGS84) if not specified. | {$box: [-1.0, 51.0, 0, 1.0, 52, 1000, 4326]} |
$intersects | True if target intersects the supplied geometry | {geom: {$intersects: {$rect: {...}}}} |
$overlaps | True if target bounding box overlaps the bounding box of the supplied geometry | {geom: {$overlaps: {$rect: {...}}}} |
{
"where": {
"geometry": {
"$overlaps": {"$rect": [-1, 51, 1, 52]}
}
}
}
SELECT *
FROM
"airquality"
WHERE
"geometry" && ST_MakeEnvelope(-1, 51, 1, 52, 4326);
// Make sure to add code blocks to your code group
# Range Operators
The following range operators are available:
Operator | Description | Examples |
---|---|---|
$overlaps | True if target range overlaps the supplied literal range | {r: {$overlaps: "[1, 10]"}} |
$contains | True if target range contains the supplied value | {r: {$contains: 5}} |
Note that range literals follow the convention used by PostgreSQL. Use '[' and ']' for inclusive range bounds. Use '(' and ')' for exclusive range bounds.
{
"where": {
"r": {
"$overlaps": "[1, 10)"
}
}
}
SELECT *
FROM
"test_table"
WHERE
"r" && '[1, 10)';
// Make sure to add code blocks to your code group
{
"where": {
"r": {
"$contains": 5
}
}
}
SELECT *
FROM
"test_table"
WHERE
"r" &> 5;
// Make sure to add code blocks to your code group
# Using boolean functions as where clauses
Any PostgreSQL function (built-in or user-defined) that returns a boolean value can be used as a where clause. Use an arbitrary (but descriptive) key to introduce the function:
{
"where": {
"a": {$gt: 5},
"custom_spatial_clause": {
"$fn": ["st_intersects", {$col: "geometry"}, {$rect: [-1, 51, 1, 52]}]
}
}
}
SELECT *
FROM
"test_table"
WHERE
"a" > 5 AND "st_intersects"("geometry", ST_MakeEnvelope(-1, 51, 1, 52, 4326)};
// Make sure to add code blocks to your code group
# Combining clauses
For complex logic, sub-clauses can be combined using $and and $or. Sub-clauses can be nested to arbitrary depth:
{
"where": {
"$and": [
{
"$or": [ { "label": "AB1" }, { "label": { "$like": "B%" } } ]
},
{ "no2": { "$gt": 10 } }
]
}
}
SELECT *
FROM
"airquality"
WHERE
(("label" = 'AB1' OR "label" LIKE 'B%') AND "no2" > 10);
// Make sure to add code blocks to your code group
# group
Specify result grouping (equivalent to 'group by' in SQL).
As with SQL, attributes returned from a 'group' query must be mentioned in the group property or be aggregate functions.
The group property takes an array of attribute names to group by. As with PostgreSQL, you may also use positional attributes to avoid repeating a complex calculated attribute.
The query below groups all results from the 'airquality' dataset based on the initial letter of the name field. The letter groups and record counts are returned ordered alphabetically:
{
"attributes": [
[ { "$fn": [ "left", { "$col": "name" }, 1 ] }, "First Letter" ],
[ { "$fn": [ "count", "*" ] }, "Count" ]
],
"group": [ 1 ]
}
SELECT "left"("name", 1) as "First Letter", "count"('*') as "Count"
FROM
"airquality"
GROUP BY 1;
// Make sure to add code blocks to your code group
# order
Specify sorting options for the returned results.
order
may be a single attribute, or an array of attributes to sort by multiple fields. You can also specify the direction by using a tuple ([attr, 'asc|desc']
) in place of the attribute name.
{
"order": "no2"
}
SELECT *
FROM
"airquality
ORDER BY "no2";
// Make sure to add code blocks to your code group
{
"order": [ [ "no2", "desc" ], "name" ]
}
SELECT *
FROM
"airquality"
ORDER BY "no2" DESC, "name";
// Make sure to add code blocks to your code group
# limit
Specify limit on the number of results returned.
If you do not explicitly specify a limit then a default limit will be applied to your query. This default limit is configured on a per-application basis. A hard maximum limit, which cannot be exceeded, is also configured per-application.
{
"limit": 10
}
SELECT *
FROM
"airquality"
LIMIT 10;
// Make sure to add code blocks to your code group
# offset
Specify an offset for the returned results.
offset
is typically combined with limit
to retrieve pages of results.
{
"limit": 10,
"offset": 10
}
SELECT *
FROM
"airquality"
LIMIT 10 OFFSET 10;
// Make sure to add code blocks to your code group
# sample
Specify sampling settings to return a random subset of results. See PostgreSQL TABLESAMPLE (opens new window) for details.
Property | Description |
---|---|
percentage * | The percentage of results to return. |
strategy | The strategy to use: system (default) or bernoulli . |
seed | A random seed value. |
* Required
{
...,
"sample": {"percentage": 10}
}
SELECT * FROM "table" TABLESAMPLE SYSTEM(10);
// Make sure to add code blocks to your code group
# Issuing a Dataset Query
Dataset queries can be issued using one of the methods depending on the context of the request:
- A direct REST API request to the Dataset Query endpoint.
- Using the Flo.w CLI tool.
- Using the Flo.w engine JavaScript client library.
- A Flo.w RDF reactive dataset query.
Note that all methods ultimately result in a REST API request to Flo.w Engine. A request requires the ID of the dataset to query and a query definition as described in the previous section.
In the example below, the curl
command is used to issue a query via the REST API:
curl --request POST \
--url https://flow.emu-analytics.net/api/v1/datasets/airquality/query \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'x-flow-api-key: API_KEY' \
--data '{"limit":10}'
The same query can be issued using the Flo.w CLI tool:
flow datasets query airquality --limit 10
# Query Timeouts
Flo.w dataset queries will time out if they overrun a configured maximum query time. Configuration of the time-out value (in milliseconds) can be performed at the application, API key, or dataset level by setting the datasetAccessOptions.statementTimeout
metadata key on the appropriate resource. The timeout value can also be specified for an individual query by specifying a timeout
query parameter with the REST request. The effective value for the query time-out is determined in increasing order of precedence as follows:
- Flo.w application
datasetAccessOptions.statementTimeout
metadata setting. - API key
datasetAccessOptions.statementTimeout
metadata setting. - Dataset
datasetAccessOptions.statementTimeout
metadata setting. - REST request
timeout
query parameter.
Because query results are streamed to the requesting client as a chunked HTTP response, a specific time-out error cannot be reported but the connection will be dropped. Clients should interpret a dropped network connection as a query time-out for the purpose of error reporting.
# Query Result Formats
Query results can be returned in the following formats:
json
: an array of data objects (default).geojson
: a GeoJSON FeatureCollection (opens new window).csv
: CSV-formatted text.
To specify the output format, supply the format
query parameter or use Flo.w CLI --format
option:
# JSON
flow datasets query airquality --limit 1 --format json
[
{
"no2": 60,
"timestamp": "2006-12-31T22:59:59.000Z",
"name": "Kingston - Kingston Bypass A3",
"label": "A30",
"geometry": {
"type": "MultiPoint",
"coordinates": [
[
-0.292,
51.3736
]
]
}
}
]
# GeoJSON
flow datasets query airquality --limit 1 --format geojson
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "MultiPoint",
"coordinates": [
[
-0.292,
51.3736
]
]
},
"properties": {
"no2": 60,
"timestamp": "2006-12-31T22:59:59.000Z",
"name": "Kingston - Kingston Bypass A3",
"label": "A30"
}
}
]
}
Export as GeoJSON
Use shell redirection to export a Dataset to a GeoJSON file:
flow datasets query airquality --format geojson > output.json
# CSV
flow datasets query airquality --limit 1 --format csv
no2,timestamp,name,label
60,1167605999000,Kingston - Kingston Bypass A3,A30
Geometry
Note that geometry-type attributes are not included in CSV output.
# Simplified Query Syntax
Simplified dataset queries can be performed by issuing a GET request to the Simple Dataset Query REST endpoint.
Query options are passed as part of the request query string:
Query Option | Description | Example | Notes |
---|---|---|---|
attributes | Attributes to return | ?attributes=id&attributes=name | Prefix attribute name with ! to exclude |
filter | Apply filter conditions | ?filter[name]=~^Ba&filter[no2]=>40 | See Operator table below |
limit | Limit results | ?limit=10 | |
offset | Offset results | ?offset=10 | |
format | Result format | ?format=geojson | See Query Result Formats |
explain | Explain query | ?explain=true | Return query explanation |
Filter clauses may use the following operators:
Filter Operator | Description | Example |
---|---|---|
Equal | ?filter[attribute]=val | |
! | Not equal | ?filter[attribute]=!val |
> | Greater than | ?filter[attribute]=>10 |
>= | Greater than or equal | ?filter[attribute]=>=10 |
< | Less than | ?filter[attribute]=<10 |
<= | Less than or equal | ?filter[attribute]=<=10 |
~ | Regular Expression | ?filter[attribute]=~^Ba |
# Querying Parameterized Datasets
Parameterized Flo.w Datasets are defined using a SQL select
statement including named parameter placeholders that are expanded when a query is performed. Parameterized Datasets move complex query
expressions into SQL and simplify issuing queries against the Dataset.
To supply parameters when querying a parameterized Flo.w Dataset:
- Add query string name/value pairs when issuing a direct REST API query.
- Use the
--params
option when using the Flo.w CLI tool. - Add parameters when using the Flo.w Client Library.
- Add parameters to a Flo.w RDF reactive query.
The example parameterized Dataset airquality_with_params
has the following SQL source definition:
SELECT * from airquality WHERE no2 > {{no2_filter}}::double precision order by no2 desc
The no2_filter
parameter is supplied when a query is issued:
curl -X GET \
'https://flow.emu-analytics.net/api/v1/datasets/airquality_with_params/query?no2_filter=50&limit=1' \
-H 'Content-Type: application/json' \
-H 'x-api-key: b6b77cc4-904a-46d1-aa0f-3bf3848ce4c7'
TIP
Parameterized Datasets can be combined with additional query options such as attributes
, limit
and where
to further refine the query.
Both the advanced 'POST' and simplified 'GET' query methods can be used.
SQL Injection
Parameters are always passed as strings and are escaped to prevent SQL injection.
If required, use casting within the SQL expression to cast the parameter to the desired type.
# Explaining Dataset Queries
To view the full generated SQL query and query plan when issuing a Dataset query add explain=true
as a
query parameter to the query request. Query explanation can also be enabled by specifying the --explain
option when using the Flo.w CLI tool.
The response is an array of strings with the first element containing the generated SQL. The remaining elements contain the query plan as lines of text. For example:
flow ds query airquality --limit 10 --explain -f json
Results:
[
"explain SELECT * FROM \"airquality\" LIMIT 10;",
"Limit (cost=0.00..0.30 rows=10 width=87)",
" -> Seq Scan on airquality (cost=0.00..6.05 rows=205 width=87)"
]