# Querying Datasets

Flo.w Query Definitions
Issuing a Dataset Query
Query Result Formats
Simplified Query Syntax
Querying Parameterized Datasets
Explaining Dataset Queries

# Flo.w Query Definitions

Flo.w dataset queries are defined using a JSON format based on SQL. The query can be very simple, for example, setting a limit on the number of results returned, or very complex, such as performing grouping and calculating aggregated statistics.

An example query is shown below with the equivalent SQL:

{
  "where": { "label": { "$like": "BT%" } },
  "limit": 10
}

SELECT *
FROM
 "airquality"
WHERE
 "label" LIKE 'BT%'
LIMIT 10;

// Make sure to add code blocks to your code group

The query definition may contain the following optional query terms:

{
  "attributes": [], // Select attributes to return
  "distinct": [],   // Specify 'SELECT DISTINCT' or 'SELECT DISTINCT ON (...)'
  "where": {},      // Specify filter clauses
  "group": [],      // Perform grouping of results
  "order": [],      // Sort results
  "limit": 0,       // Limit results
  "offset": 0,      // Offset results
  "sample": {}      // Specify sampling settings
}

# `attributes`

Specify attributes to return in the result set.

If ommitted, all attributes will be returned (the equivalent of select * in SQL. Attributes can be aliased by using a tuple ([attr, alias]) in place of the attribute name.

{
  "attributes": [
    "name",
    [
      "no2",
      "no2_alias"
    ]
  ]
}

SELECT
  "name", "no2" as "no2_alias"
FROM
  "airquality";

// Make sure to add code blocks to your code group

# Attribute operators

The following operators can be used in attribute definitions:

Operator	PostgreSQL Equivalent	Examples	Notes
$col	"colName"	`{$col: "colName"}`	Escapes supplied argument for use as a column name
$fn	funcName(arg1, arg2, ...)	`{$fn: ["funcName", {$col: "a"}, 2]}`	Call an arbitrary database function (see section below)
$cast	cast(val as type)	`{$cast: [{$col: "a"}, "integer"]}`	Cast data types
$mode	mode() within group (ORDER BY "colName"})	`{$mode: {$col: "colName"}}`	Aggregate function to return 'mode' of a column (most frequent value)

# Function Attributes

Calculated attributes that call arbitrary functions can also be defined using the $fn operator. Function calls are specified using the following formats in place of an attribute name:

Simple format (array-style):

{"$fn": ["<funcName>", <arg1>, <arg2>, ...]}

Extended format (object-style):

{"$fn": {
  name: "funcName",
  args: [<arg1>, <arg2>, ...],
  order?: <order clause>,
  orderWithinGroup?: <order clause>,
  filter?: <where clause>
}}

Calculated attributes will normally be combined with the aliasing feature to give your calculated attribute a name. If a literal column name is required anywhere in your query then you can use the format {"$col": <attr>}. This operator will properly quote and escape the name you supply for inclusion in the generated SQL.

The extended format allows you to specify optional features used for Postgres SQL aggregate functions, namely:

ORDER BY
FILTER
ORDER WITHIN GROUP

Note that use of $fn is an advanced feature and the API key that you use to access Flo.w must have this permission set.

{
  "attributes": [
    [
      { "$fn": [ "max", { "$col": "no2" } ] },
      "maxVal"
    ]
  ]
}

SELECT
  "max"("no2") as "maxVal"
FROM
  "airquality";

// Make sure to add code blocks to your code group

// Calculate vehicle tracks by converting location points from a time-series
// to linestrings grouping by some vehicle ID.
{
  "attributes": [
    "id",
    [
      { "$fn": {
         "name": "st_makeline",
         "args":[{ "$col": "geometry" } ],
         "order": "timestamp"
      }},
      "linestring"
    ]
  ],
  "group": "id"
}

SELECT
  "id", "st_makeline"("geometry" ORDER BY "timestamp") AS "linestring"
FROM
  "table"
GROUP BY "id";

// Make sure to add code blocks to your code group

# `distinct`

Specify a distinct clause to remove duplicate rows from the result set.

Use an empty array for a simple SELECT DISTINCT query:

{
  "distinct": [],
  ...
}

SELECT DISTINCT
  *
FROM
  "airquality";

// Make sure to add code blocks to your code group

Specify a single attribute or an array of attributes for a SELECT DISTINCT ON (...) query:

{
  "distinct": 'name', // or ['name', ...],
  ...
}

SELECT DISTINCT ON ("name")
  *
FROM
  "airquality";

// Make sure to add code blocks to your code group

# `where`

Specify filtering clauses.

For simple 'equality' filtering, specify attribute/value pairs within the where clause of the query. Multiple conditions are combined using AND logic:

{
  "where": {
    "name": "BG1",
    "no2": 100
  }
}

SELECT *
FROM
  "airquality"
WHERE
  "name" = 'BG1' AND "no2" = 100;

// Make sure to add code blocks to your code group

# Operators

In addition to simple equality filtering, the following operators are also available:

Operator	PostgreSQL Equivalent	Examples	Notes
$or	(clause1 OR clause2 ...)	`{$or: [clause1, clause2, ...]}`	Combines sub-clauses
$and	(clause1 AND clause2 ...)	`{$and: [clause1, clause2, ...]}`	Combines sub-clauses
$gt	>	`{$gt: 1}`
$gte	>=	`{$gte: 1}`
$lt	<	`{$lt: 1}`
$lte	<=	`{$lte: 1}`
$ne	!=	`{$ne: 'a'}`
$eq	=	`{$eq: 'a'}`	Equivalent to a simple attribute/value pair
$is	IS	`{$is: null}`
$not	IS NOT	`{$not: null}`
$like	LIKE	`{$like: 'abc%}`
$notLike	NOT LIKE	`{$notLike: 'abc%}`
$iLike	ILIKE	`{$iLike: 'abc%}`	Case-insensitive
$notILike	NOT ILIKE	`{$notILike: 'abc%}`	Case-insensitive
$regexp	~	`{$regexp: '^a.*}`
$notRegexp	!~	`{$notRegexp: '^a.*}`
$iRegexp	~*	`{$iRegexp: '^a.*}`	Case-insensitive
$notIRegexp	!~*	`{$notIRegexp: '^a.*}`	Case-insensitive
$between	BETWEEN a AND b	`{$between: [a, b]}`
$notBetween	NOT BETWEEN a AND b	`{$notBetween: [a, b]}`
$in	IN (a, b)	`{$in: [a, b]}`

{
  "where": {
    "no2": {
      "$gt": 10
    },
    "label": {
      "$like": "B%"
    }
  }
}

SELECT *
FROM
  "airquality"
WHERE
  "no2" > 10 AND "label" LIKE 'B%';

// Make sure to add code blocks to your code group

# Spatial Operators

The following spatial operators are available:

Operator	Description	Examples
$point	Construct a point. Supply an array of 2 or 3 elements `[x, y, <srid>]`. SRID defaults to 4326 (WGS84) if not specified.	`{$point: [-1.0, 51.0, 4326]}`
$rect	Construct a bounding rectangle. Supply an array of 4 or 5 elements `[xMin, yMin, xMax, yMax, <srid>]`. SRID defaults to 4326 (WGS84) if not specified.	`{$rect: [-1.0, 51.0, 1,0, 52, 4326]}`
$box	Construct a 3D bounding box. Supply an array of 6 or 7 elements `[xMin, yMin, zMin, yMax, xMax, yMax, zMax, <srid>]`. SRID defaults to 4326 (WGS84) if not specified.	`{$box: [-1.0, 51.0, 0, 1.0, 52, 1000, 4326]}`
$intersects	True if target intersects the supplied geometry	`{geom: {$intersects: {$rect: {...}}}}`
$overlaps	True if target bounding box overlaps the bounding box of the supplied geometry	`{geom: {$overlaps: {$rect: {...}}}}`

{
  "where": {
    "geometry": {
      "$overlaps": {"$rect": [-1, 51, 1, 52]}
    }
  }
}

SELECT *
FROM
  "airquality"
WHERE
  "geometry" && ST_MakeEnvelope(-1, 51, 1, 52, 4326);

// Make sure to add code blocks to your code group

# Range Operators

The following range operators are available:

Operator	Description	Examples
$overlaps	True if target range overlaps the supplied literal range	`{r: {$overlaps: "[1, 10]"}}`
$contains	True if target range contains the supplied value	`{r: {$contains: 5}}`

Note that range literals follow the convention used by PostgreSQL. Use '[' and ']' for inclusive range bounds. Use '(' and ')' for exclusive range bounds.

{
  "where": {
    "r": {
      "$overlaps": "[1, 10)"
    }
  }
}

SELECT *
FROM
  "test_table"
WHERE
  "r" && '[1, 10)';

// Make sure to add code blocks to your code group

{
  "where": {
    "r": {
      "$contains": 5
    }
  }
}

SELECT *
FROM
  "test_table"
WHERE
  "r" &> 5;

// Make sure to add code blocks to your code group

# Using boolean functions as where clauses

Any PostgreSQL function (built-in or user-defined) that returns a boolean value can be used as a where clause. Use an arbitrary (but descriptive) key to introduce the function:

{
  "where": {
    "a": {$gt: 5},
    "custom_spatial_clause": {
      "$fn": ["st_intersects", {$col: "geometry"}, {$rect: [-1, 51, 1, 52]}]
    }
  }
}

SELECT *
FROM
  "test_table"
WHERE
  "a" > 5 AND "st_intersects"("geometry", ST_MakeEnvelope(-1, 51, 1, 52, 4326)};

// Make sure to add code blocks to your code group

# Combining clauses

For complex logic, sub-clauses can be combined using $and and $or. Sub-clauses can be nested to arbitrary depth:

{
  "where": {
    "$and": [
      {
        "$or": [ { "label": "AB1" }, { "label": { "$like": "B%" } } ]
      },
      { "no2": { "$gt": 10 } }
    ]
  }
}

SELECT *
FROM
 "airquality"
WHERE
 (("label" = 'AB1' OR "label" LIKE 'B%') AND "no2" > 10);

// Make sure to add code blocks to your code group

# `group`

Specify result grouping (equivalent to 'group by' in SQL).

As with SQL, attributes returned from a 'group' query must be mentioned in the group property or be aggregate functions.

The group property takes an array of attribute names to group by. As with PostgreSQL, you may also use positional attributes to avoid repeating a complex calculated attribute.

The query below groups all results from the 'airquality' dataset based on the initial letter of the name field. The letter groups and record counts are returned ordered alphabetically:

{
  "attributes": [
    [ { "$fn": [ "left", { "$col": "name" }, 1 ] }, "First Letter" ],
    [ { "$fn": [ "count", "*" ] }, "Count" ]
  ],
  "group": [ 1 ]
}

SELECT "left"("name", 1) as "First Letter", "count"('*') as "Count"
FROM
"airquality"
GROUP BY 1;

// Make sure to add code blocks to your code group

# `order`

Specify sorting options for the returned results.

order may be a single attribute, or an array of attributes to sort by multiple fields. You can also specify the direction by using a tuple ([attr, 'asc|desc']) in place of the attribute name.

{
  "order": "no2"
}

SELECT *
FROM
 "airquality
ORDER BY "no2";

// Make sure to add code blocks to your code group

{
  "order": [ [ "no2", "desc" ], "name" ]
}

SELECT *
FROM
 "airquality"
ORDER BY "no2" DESC, "name";

// Make sure to add code blocks to your code group

# `limit`

Specify limit on the number of results returned.

If you do not explicitly specify a limit then a default limit will be applied to your query. This default limit is configured on a per-application basis. A hard maximum limit, which cannot be exceeded, is also configured per-application.

{
  "limit": 10
}

SELECT *
FROM
 "airquality"
LIMIT 10;

// Make sure to add code blocks to your code group

# `offset`

Specify an offset for the returned results.

offset is typically combined with limit to retrieve pages of results.

{
  "limit": 10,
  "offset": 10
}

SELECT *
FROM
 "airquality"
LIMIT 10 OFFSET 10;

// Make sure to add code blocks to your code group

# `sample`

Specify sampling settings to return a random subset of results. See PostgreSQL TABLESAMPLE (opens new window) for details.

Property	Description
`percentage` *	The percentage of results to return.
`strategy`	The strategy to use: `system` (default) or `bernoulli`.
`seed`	A random seed value.

* Required

{
  ...,
  "sample": {"percentage": 10}
}

SELECT * FROM "table" TABLESAMPLE SYSTEM(10);

// Make sure to add code blocks to your code group

# Issuing a Dataset Query

Dataset queries can be issued using one of the methods depending on the context of the request:

A direct REST API request to the Dataset Query endpoint.
Using the Flo.w CLI tool.
Using the Flo.w engine JavaScript client library.
A Flo.w RDF reactive dataset query.

Note that all methods ultimately result in a REST API request to Flo.w Engine. A request requires the ID of the dataset to query and a query definition as described in the previous section.

In the example below, the curl command is used to issue a query via the REST API:

curl --request POST \
  --url https://flow.emu-analytics.net/api/v1/datasets/airquality/query \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'x-flow-api-key: API_KEY' \
  --data '{"limit":10}'

The same query can be issued using the Flo.w CLI tool:

flow datasets query airquality --limit 10

# Query Timeouts

Flo.w dataset queries will time out if they overrun a configured maximum query time. Configuration of the time-out value (in milliseconds) can be performed at the application, API key, or dataset level by setting the datasetAccessOptions.statementTimeout metadata key on the appropriate resource. The timeout value can also be specified for an individual query by specifying a timeout query parameter with the REST request. The effective value for the query time-out is determined in increasing order of precedence as follows:

Flo.w application datasetAccessOptions.statementTimeout metadata setting.
API key datasetAccessOptions.statementTimeout metadata setting.
Dataset datasetAccessOptions.statementTimeout metadata setting.
REST request timeout query parameter.

Because query results are streamed to the requesting client as a chunked HTTP response, a specific time-out error cannot be reported but the connection will be dropped. Clients should interpret a dropped network connection as a query time-out for the purpose of error reporting.

# Query Result Formats

Query results can be returned in the following formats:

json: an array of data objects (default).
geojson: a GeoJSON FeatureCollection (opens new window).
csv: CSV-formatted text.

To specify the output format, supply the format query parameter or use Flo.w CLI --format option:

# JSON

flow datasets query airquality --limit 1 --format json

[
  {
    "no2": 60,
    "timestamp": "2006-12-31T22:59:59.000Z",
    "name": "Kingston - Kingston Bypass A3",
    "label": "A30",
    "geometry": {
      "type": "MultiPoint",
      "coordinates": [
        [
          -0.292,
          51.3736
        ]
      ]
    }
  }
]

# GeoJSON

flow datasets query airquality --limit 1 --format geojson

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "MultiPoint",
        "coordinates": [
          [
            -0.292,
            51.3736
          ]
        ]
      },
      "properties": {
        "no2": 60,
        "timestamp": "2006-12-31T22:59:59.000Z",
        "name": "Kingston - Kingston Bypass A3",
        "label": "A30"
      }
    }
  ]
}

Export as GeoJSON

Use shell redirection to export a Dataset to a GeoJSON file:

flow datasets query airquality --format geojson > output.json

# CSV

flow datasets query airquality --limit 1 --format csv

no2,timestamp,name,label
60,1167605999000,Kingston - Kingston Bypass A3,A30

Geometry

Note that geometry-type attributes are not included in CSV output.

# Simplified Query Syntax

Simplified dataset queries can be performed by issuing a GET request to the Simple Dataset Query REST endpoint.

Query options are passed as part of the request query string:

Query Option	Description	Example	Notes
`attributes`	Attributes to return	`?attributes=id&attributes=name`	Prefix attribute name with `!` to exclude
`filter`	Apply filter conditions	`?filter[name]=~^Ba&filter[no2]=>40`	See Operator table below
`limit`	Limit results	`?limit=10`
`offset`	Offset results	`?offset=10`
`format`	Result format	`?format=geojson`	See Query Result Formats
`explain`	Explain query	`?explain=true`	Return query explanation

Filter clauses may use the following operators:

Filter Operator	Description	Example
	Equal	?filter[attribute]=val
!	Not equal	?filter[attribute]=!val
>	Greater than	?filter[attribute]=>10
>=	Greater than or equal	?filter[attribute]=>=10
<	Less than	?filter[attribute]=<10
<=	Less than or equal	?filter[attribute]=<=10
~	Regular Expression	?filter[attribute]=~^Ba

# Querying Parameterized Datasets

Parameterized Flo.w Datasets are defined using a SQL select statement including named parameter placeholders that are expanded when a query is performed. Parameterized Datasets move complex query expressions into SQL and simplify issuing queries against the Dataset.

To supply parameters when querying a parameterized Flo.w Dataset:

Add query string name/value pairs when issuing a direct REST API query.
Use the --params option when using the Flo.w CLI tool.
Add parameters when using the Flo.w Client Library.
Add parameters to a Flo.w RDF reactive query.

The example parameterized Dataset airquality_with_params has the following SQL source definition:

SELECT * from airquality WHERE no2 > {{no2_filter}}::double precision order by no2 desc

The no2_filter parameter is supplied when a query is issued:

curl -X GET \
  'https://flow.emu-analytics.net/api/v1/datasets/airquality_with_params/query?no2_filter=50&limit=1' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: b6b77cc4-904a-46d1-aa0f-3bf3848ce4c7'

TIP

Parameterized Datasets can be combined with additional query options such as attributes, limit and where to further refine the query.

Both the advanced 'POST' and simplified 'GET' query methods can be used.

SQL Injection

Parameters are always passed as strings and are escaped to prevent SQL injection.

If required, use casting within the SQL expression to cast the parameter to the desired type.

# Explaining Dataset Queries

To view the full generated SQL query and query plan when issuing a Dataset query add explain=true as a query parameter to the query request. Query explanation can also be enabled by specifying the --explain option when using the Flo.w CLI tool.

The response is an array of strings with the first element containing the generated SQL. The remaining elements contain the query plan as lines of text. For example:

flow ds query airquality --limit 10 --explain -f json

Results:

[
  "explain SELECT * FROM \"airquality\" LIMIT 10;",
  "Limit  (cost=0.00..0.30 rows=10 width=87)",
  "  ->  Seq Scan on airquality  (cost=0.00..6.05 rows=205 width=87)"
]

← Uploading Data Flo.w CLI →