# Querying Datasets

# Flo.w Query Definitions

Flo.w dataset queries are defined using a JSON format based on SQL. The query can be very simple, for example, setting a limit on the number of results returned, or very complex, such as performing grouping and calculating aggregated statistics.

An example query is shown below with the equivalent SQL:

    {
      "where": { "label": { "$like": "BT%" } },
      "limit": 10
    }
    
    SELECT *
    FROM
     "airquality"
    WHERE
     "label" LIKE 'BT%'
    LIMIT 10;
    
    // Make sure to add code blocks to your code group

    The query definition may contain the following optional query terms:

    {
      "attributes": [], // Select attributes to return
      "distinct": [],   // Specify 'SELECT DISTINCT' or 'SELECT DISTINCT ON (...)'
      "where": {},      // Specify filter clauses
      "group": [],      // Perform grouping of results
      "order": [],      // Sort results
      "limit": 0,       // Limit results
      "offset": 0,      // Offset results
      "sample": {}      // Specify sampling settings
    }
    

    # attributes

    Specify attributes to return in the result set.

    If ommitted, all attributes will be returned (the equivalent of select * in SQL. Attributes can be aliased by using a tuple ([attr, alias]) in place of the attribute name.

      {
        "attributes": [
          "name",
          [
            "no2",
            "no2_alias"
          ]
        ]
      }
      
      SELECT
        "name", "no2" as "no2_alias"
      FROM
        "airquality";
      
      // Make sure to add code blocks to your code group

      # Attribute operators

      The following operators can be used in attribute definitions:

      Operator PostgreSQL Equivalent Examples Notes
      $col "colName" {$col: "colName"} Escapes supplied argument for use as a column name
      $fn funcName(arg1, arg2, ...) {$fn: ["funcName", {$col: "a"}, 2]} Call an arbitrary database function (see section below)
      $cast cast(val as type) {$cast: [{$col: "a"}, "integer"]} Cast data types
      $mode mode() within group (ORDER BY "colName"}) {$mode: {$col: "colName"}} Aggregate function to return 'mode' of a column (most frequent value)

      # Function Attributes

      Calculated attributes that call arbitrary functions can also be defined using the $fn operator. Function calls are specified using the following formats in place of an attribute name:

      Simple format (array-style):

      {"$fn": ["<funcName>", <arg1>, <arg2>, ...]}
      

      Extended format (object-style):

      {"$fn": {
        name: "funcName",
        args: [<arg1>, <arg2>, ...],
        order?: <order clause>,
        orderWithinGroup?: <order clause>,
        filter?: <where clause>
      }}
      

      Calculated attributes will normally be combined with the aliasing feature to give your calculated attribute a name. If a literal column name is required anywhere in your query then you can use the format {"$col": <attr>}. This operator will properly quote and escape the name you supply for inclusion in the generated SQL.

      The extended format allows you to specify optional features used for Postgres SQL aggregate functions, namely:

      • ORDER BY
      • FILTER
      • ORDER WITHIN GROUP

      Note that use of $fn is an advanced feature and the API key that you use to access Flo.w must have this permission set.

        {
          "attributes": [
            [
              { "$fn": [ "max", { "$col": "no2" } ] },
              "maxVal"
            ]
          ]
        }
        
        SELECT
          "max"("no2") as "maxVal"
        FROM
          "airquality";
        
        // Make sure to add code blocks to your code group
          // Calculate vehicle tracks by converting location points from a time-series
          // to linestrings grouping by some vehicle ID.
          {
            "attributes": [
              "id",
              [
                { "$fn": {
                   "name": "st_makeline",
                   "args":[{ "$col": "geometry" } ],
                   "order": "timestamp"
                }},
                "linestring"
              ]
            ],
            "group": "id"
          }
          
          SELECT
            "id", "st_makeline"("geometry" ORDER BY "timestamp") AS "linestring"
          FROM
            "table"
          GROUP BY "id";
          
          // Make sure to add code blocks to your code group

          # distinct

          Specify a distinct clause to remove duplicate rows from the result set.

          Use an empty array for a simple SELECT DISTINCT query:

            {
              "distinct": [],
              ...
            }
            
            SELECT DISTINCT
              *
            FROM
              "airquality";
            
            // Make sure to add code blocks to your code group

            Specify a single attribute or an array of attributes for a SELECT DISTINCT ON (...) query:

              {
                "distinct": 'name', // or ['name', ...],
                ...
              }
              
              SELECT DISTINCT ON ("name")
                *
              FROM
                "airquality";
              
              // Make sure to add code blocks to your code group

              # where

              Specify filtering clauses.

              For simple 'equality' filtering, specify attribute/value pairs within the where clause of the query. Multiple conditions are combined using AND logic:

                {
                  "where": {
                    "name": "BG1",
                    "no2": 100
                  }
                }
                
                SELECT *
                FROM
                  "airquality"
                WHERE
                  "name" = 'BG1' AND "no2" = 100;
                
                // Make sure to add code blocks to your code group

                # Operators

                In addition to simple equality filtering, the following operators are also available:

                Operator PostgreSQL Equivalent Examples Notes
                $or (clause1 OR clause2 ...) {$or: [clause1, clause2, ...]} Combines sub-clauses
                $and (clause1 AND clause2 ...) {$and: [clause1, clause2, ...]} Combines sub-clauses
                $gt > {$gt: 1}
                $gte >= {$gte: 1}
                $lt < {$lt: 1}
                $lte <= {$lte: 1}
                $ne != {$ne: 'a'}
                $eq = {$eq: 'a'} Equivalent to a simple attribute/value pair
                $is IS {$is: null}
                $not IS NOT {$not: null}
                $like LIKE {$like: 'abc%}
                $notLike NOT LIKE {$notLike: 'abc%}
                $iLike ILIKE {$iLike: 'abc%} Case-insensitive
                $notILike NOT ILIKE {$notILike: 'abc%} Case-insensitive
                $regexp ~ {$regexp: '^a.*}
                $notRegexp !~ {$notRegexp: '^a.*}
                $iRegexp ~* {$iRegexp: '^a.*} Case-insensitive
                $notIRegexp !~* {$notIRegexp: '^a.*} Case-insensitive
                $between BETWEEN a AND b {$between: [a, b]}
                $notBetween NOT BETWEEN a AND b {$notBetween: [a, b]}
                $in IN (a, b) {$in: [a, b]}
                  {
                    "where": {
                      "no2": {
                        "$gt": 10
                      },
                      "label": {
                        "$like": "B%"
                      }
                    }
                  }
                  
                  SELECT *
                  FROM
                    "airquality"
                  WHERE
                    "no2" > 10 AND "label" LIKE 'B%';
                  
                  // Make sure to add code blocks to your code group

                  # Spatial Operators

                  The following spatial operators are available:

                  Operator Description Examples
                  $point Construct a point. Supply an array of 2 or 3 elements [x, y, <srid>]. SRID defaults to 4326 (WGS84) if not specified. {$point: [-1.0, 51.0, 4326]}
                  $rect Construct a bounding rectangle. Supply an array of 4 or 5 elements [xMin, yMin, xMax, yMax, <srid>]. SRID defaults to 4326 (WGS84) if not specified. {$rect: [-1.0, 51.0, 1,0, 52, 4326]}
                  $box Construct a 3D bounding box. Supply an array of 6 or 7 elements [xMin, yMin, zMin, yMax, xMax, yMax, zMax, <srid>]. SRID defaults to 4326 (WGS84) if not specified. {$box: [-1.0, 51.0, 0, 1.0, 52, 1000, 4326]}
                  $intersects True if target intersects the supplied geometry {geom: {$intersects: {$rect: {...}}}}
                  $overlaps True if target bounding box overlaps the bounding box of the supplied geometry {geom: {$overlaps: {$rect: {...}}}}
                    {
                      "where": {
                        "geometry": {
                          "$overlaps": {"$rect": [-1, 51, 1, 52]}
                        }
                      }
                    }
                    
                    SELECT *
                    FROM
                      "airquality"
                    WHERE
                      "geometry" && ST_MakeEnvelope(-1, 51, 1, 52, 4326);
                    
                    // Make sure to add code blocks to your code group

                    # Range Operators

                    The following range operators are available:

                    Operator Description Examples
                    $overlaps True if target range overlaps the supplied literal range {r: {$overlaps: "[1, 10]"}}
                    $contains True if target range contains the supplied value {r: {$contains: 5}}

                    Note that range literals follow the convention used by PostgreSQL. Use '[' and ']' for inclusive range bounds. Use '(' and ')' for exclusive range bounds.

                      {
                        "where": {
                          "r": {
                            "$overlaps": "[1, 10)"
                          }
                        }
                      }
                      
                      SELECT *
                      FROM
                        "test_table"
                      WHERE
                        "r" && '[1, 10)';
                      
                      // Make sure to add code blocks to your code group
                        {
                          "where": {
                            "r": {
                              "$contains": 5
                            }
                          }
                        }
                        
                        SELECT *
                        FROM
                          "test_table"
                        WHERE
                          "r" &> 5;
                        
                        // Make sure to add code blocks to your code group

                        # Using boolean functions as where clauses

                        Any PostgreSQL function (built-in or user-defined) that returns a boolean value can be used as a where clause. Use an arbitrary (but descriptive) key to introduce the function:

                          {
                            "where": {
                              "a": {$gt: 5},
                              "custom_spatial_clause": {
                                "$fn": ["st_intersects", {$col: "geometry"}, {$rect: [-1, 51, 1, 52]}]
                              }
                            }
                          }
                          
                          SELECT *
                          FROM
                            "test_table"
                          WHERE
                            "a" > 5 AND "st_intersects"("geometry", ST_MakeEnvelope(-1, 51, 1, 52, 4326)};
                          
                          // Make sure to add code blocks to your code group

                          # Combining clauses

                          For complex logic, sub-clauses can be combined using $and and $or. Sub-clauses can be nested to arbitrary depth:

                            {
                              "where": {
                                "$and": [
                                  {
                                    "$or": [ { "label": "AB1" }, { "label": { "$like": "B%" } } ]
                                  },
                                  { "no2": { "$gt": 10 } }
                                ]
                              }
                            }
                            
                            SELECT *
                            FROM
                             "airquality"
                            WHERE
                             (("label" = 'AB1' OR "label" LIKE 'B%') AND "no2" > 10);
                            
                            // Make sure to add code blocks to your code group

                            # group

                            Specify result grouping (equivalent to 'group by' in SQL).

                            As with SQL, attributes returned from a 'group' query must be mentioned in the group property or be aggregate functions.

                            The group property takes an array of attribute names to group by. As with PostgreSQL, you may also use positional attributes to avoid repeating a complex calculated attribute.

                            The query below groups all results from the 'airquality' dataset based on the initial letter of the name field. The letter groups and record counts are returned ordered alphabetically:

                              {
                                "attributes": [
                                  [ { "$fn": [ "left", { "$col": "name" }, 1 ] }, "First Letter" ],
                                  [ { "$fn": [ "count", "*" ] }, "Count" ]
                                ],
                                "group": [ 1 ]
                              }
                              
                              SELECT "left"("name", 1) as "First Letter", "count"('*') as "Count"
                              FROM
                              "airquality"
                              GROUP BY 1;
                              
                              // Make sure to add code blocks to your code group

                              # order

                              Specify sorting options for the returned results.

                              order may be a single attribute, or an array of attributes to sort by multiple fields. You can also specify the direction by using a tuple ([attr, 'asc|desc']) in place of the attribute name.

                                {
                                  "order": "no2"
                                }
                                
                                SELECT *
                                FROM
                                 "airquality
                                ORDER BY "no2";
                                
                                // Make sure to add code blocks to your code group
                                  {
                                    "order": [ [ "no2", "desc" ], "name" ]
                                  }
                                  
                                  SELECT *
                                  FROM
                                   "airquality"
                                  ORDER BY "no2" DESC, "name";
                                  
                                  // Make sure to add code blocks to your code group

                                  # limit

                                  Specify limit on the number of results returned.

                                  If you do not explicitly specify a limit then a default limit will be applied to your query. This default limit is configured on a per-application basis. A hard maximum limit, which cannot be exceeded, is also configured per-application.

                                    {
                                      "limit": 10
                                    }
                                    
                                    SELECT *
                                    FROM
                                     "airquality"
                                    LIMIT 10;
                                    
                                    // Make sure to add code blocks to your code group

                                    # offset

                                    Specify an offset for the returned results.

                                    offset is typically combined with limit to retrieve pages of results.

                                      {
                                        "limit": 10,
                                        "offset": 10
                                      }
                                      
                                      SELECT *
                                      FROM
                                       "airquality"
                                      LIMIT 10 OFFSET 10;
                                      
                                      // Make sure to add code blocks to your code group

                                      # sample

                                      Specify sampling settings to return a random subset of results. See PostgreSQL TABLESAMPLE (opens new window) for details.

                                      Property Description
                                      percentage * The percentage of results to return.
                                      strategy The strategy to use: system (default) or bernoulli.
                                      seed A random seed value.

                                      * Required

                                        {
                                          ...,
                                          "sample": {"percentage": 10}
                                        }
                                        
                                        SELECT * FROM "table" TABLESAMPLE SYSTEM(10);
                                        
                                        // Make sure to add code blocks to your code group

                                        # Issuing a Dataset Query

                                        Dataset queries can be issued using one of the methods depending on the context of the request:

                                        • A direct REST API request to the Dataset Query endpoint.
                                        • Using the Flo.w CLI tool.
                                        • Using the Flo.w engine JavaScript client library.
                                        • A Flo.w RDF reactive dataset query.

                                        Note that all methods ultimately result in a REST API request to Flo.w Engine. A request requires the ID of the dataset to query and a query definition as described in the previous section.

                                        In the example below, the curl command is used to issue a query via the REST API:

                                        curl --request POST \
                                          --url https://flow.emu-analytics.net/api/v1/datasets/airquality/query \
                                          --header 'Accept: application/json' \
                                          --header 'Content-Type: application/json' \
                                          --header 'x-flow-api-key: API_KEY' \
                                          --data '{"limit":10}'
                                        

                                        The same query can be issued using the Flo.w CLI tool:

                                        flow datasets query airquality --limit 10
                                        

                                        # Query Timeouts

                                        Flo.w dataset queries will time out if they overrun a configured maximum query time. Configuration of the time-out value (in milliseconds) can be performed at the application, API key, or dataset level by setting the datasetAccessOptions.statementTimeout metadata key on the appropriate resource. The timeout value can also be specified for an individual query by specifying a timeout query parameter with the REST request. The effective value for the query time-out is determined in increasing order of precedence as follows:

                                        • Flo.w application datasetAccessOptions.statementTimeout metadata setting.
                                        • API key datasetAccessOptions.statementTimeout metadata setting.
                                        • Dataset datasetAccessOptions.statementTimeout metadata setting.
                                        • REST request timeout query parameter.

                                        Because query results are streamed to the requesting client as a chunked HTTP response, a specific time-out error cannot be reported but the connection will be dropped. Clients should interpret a dropped network connection as a query time-out for the purpose of error reporting.

                                        # Query Result Formats

                                        Query results can be returned in the following formats:

                                        To specify the output format, supply the format query parameter or use Flo.w CLI --format option:

                                        # JSON

                                        flow datasets query airquality --limit 1 --format json
                                        
                                        [
                                          {
                                            "no2": 60,
                                            "timestamp": "2006-12-31T22:59:59.000Z",
                                            "name": "Kingston - Kingston Bypass A3",
                                            "label": "A30",
                                            "geometry": {
                                              "type": "MultiPoint",
                                              "coordinates": [
                                                [
                                                  -0.292,
                                                  51.3736
                                                ]
                                              ]
                                            }
                                          }
                                        ]
                                        

                                        # GeoJSON

                                        flow datasets query airquality --limit 1 --format geojson
                                        
                                        {
                                          "type": "FeatureCollection",
                                          "features": [
                                            {
                                              "type": "Feature",
                                              "geometry": {
                                                "type": "MultiPoint",
                                                "coordinates": [
                                                  [
                                                    -0.292,
                                                    51.3736
                                                  ]
                                                ]
                                              },
                                              "properties": {
                                                "no2": 60,
                                                "timestamp": "2006-12-31T22:59:59.000Z",
                                                "name": "Kingston - Kingston Bypass A3",
                                                "label": "A30"
                                              }
                                            }
                                          ]
                                        }
                                        

                                        Export as GeoJSON

                                        Use shell redirection to export a Dataset to a GeoJSON file:

                                        flow datasets query airquality --format geojson > output.json
                                        

                                        # CSV

                                        flow datasets query airquality --limit 1 --format csv
                                        
                                        no2,timestamp,name,label
                                        60,1167605999000,Kingston - Kingston Bypass A3,A30
                                        

                                        Geometry

                                        Note that geometry-type attributes are not included in CSV output.

                                        # Simplified Query Syntax

                                        Simplified dataset queries can be performed by issuing a GET request to the Simple Dataset Query REST endpoint.

                                        Query options are passed as part of the request query string:

                                        Query Option Description Example Notes
                                        attributes Attributes to return ?attributes=id&attributes=name Prefix attribute name with ! to exclude
                                        filter Apply filter conditions ?filter[name]=~^Ba&filter[no2]=>40 See Operator table below
                                        limit Limit results ?limit=10
                                        offset Offset results ?offset=10
                                        format Result format ?format=geojson See Query Result Formats
                                        explain Explain query ?explain=true Return query explanation

                                        Filter clauses may use the following operators:

                                        Filter Operator Description Example
                                        Equal ?filter[attribute]=val
                                        ! Not equal ?filter[attribute]=!val
                                        > Greater than ?filter[attribute]=>10
                                        >= Greater than or equal ?filter[attribute]=>=10
                                        < Less than ?filter[attribute]=<10
                                        <= Less than or equal ?filter[attribute]=<=10
                                        ~ Regular Expression ?filter[attribute]=~^Ba

                                        # Querying Parameterized Datasets

                                        Parameterized Flo.w Datasets are defined using a SQL select statement including named parameter placeholders that are expanded when a query is performed. Parameterized Datasets move complex query expressions into SQL and simplify issuing queries against the Dataset.

                                        To supply parameters when querying a parameterized Flo.w Dataset:

                                        • Add query string name/value pairs when issuing a direct REST API query.
                                        • Use the --params option when using the Flo.w CLI tool.
                                        • Add parameters when using the Flo.w Client Library.
                                        • Add parameters to a Flo.w RDF reactive query.

                                        The example parameterized Dataset airquality_with_params has the following SQL source definition:

                                        SELECT * from airquality WHERE no2 > {{no2_filter}}::double precision order by no2 desc
                                        

                                        The no2_filter parameter is supplied when a query is issued:

                                        curl -X GET \
                                          'https://flow.emu-analytics.net/api/v1/datasets/airquality_with_params/query?no2_filter=50&limit=1' \
                                          -H 'Content-Type: application/json' \
                                          -H 'x-api-key: b6b77cc4-904a-46d1-aa0f-3bf3848ce4c7'
                                        

                                        TIP

                                        Parameterized Datasets can be combined with additional query options such as attributes, limit and where to further refine the query.

                                        Both the advanced 'POST' and simplified 'GET' query methods can be used.

                                        SQL Injection

                                        Parameters are always passed as strings and are escaped to prevent SQL injection.

                                        If required, use casting within the SQL expression to cast the parameter to the desired type.

                                        # Explaining Dataset Queries

                                        To view the full generated SQL query and query plan when issuing a Dataset query add explain=true as a query parameter to the query request. Query explanation can also be enabled by specifying the --explain option when using the Flo.w CLI tool.

                                        The response is an array of strings with the first element containing the generated SQL. The remaining elements contain the query plan as lines of text. For example:

                                        flow ds query airquality --limit 10 --explain -f json
                                        

                                        Results:

                                        [
                                          "explain SELECT * FROM \"airquality\" LIMIT 10;",
                                          "Limit  (cost=0.00..0.30 rows=10 width=87)",
                                          "  ->  Seq Scan on airquality  (cost=0.00..6.05 rows=205 width=87)"
                                        ]