GET /api/v1/datasets
¶
This API returns an application/json
document describing a filtered
collection of datasets accessible to the client. (An unauthenticated client
can only list datasets with access public
.)
The collection of datasets may be filtered using any combination of a number
of query parameters, including owner
, access
, name
substring, date range,
and arbitrary metadata filter expressions. The selected datasets may be sorted
by any metadata key value in either ascending or descending order. Multiple
sort parameters will be processed in order.
Large collections can be paginated for efficiency using the limit
and offset
query parameters.
The keysummary
and daterange
query parameters (if true
) select “summary”
modes where aggregate metadata is returned without a list of datasets. These two
may be used together, but cannot be used along with the normal collection list
mode as they aren’t subject to pagination.
Query parameters¶
access
string
Select whether only private
or only public
access datasets will be included
in the list. By default, all datasets readable by the authenticated user are
included. For example, without constraints /datasets/list
for an authenticated
user will include all public
datasets plus all datasets owned by the
authenticated user; specifying private
will show only the authenticated user’s
private datasets, while specifying public
will show only public
datasets
(regardless of ownership).
daterange
boolean
Instead of returning a filtered set of datasets, return only the upload
timestamps of the oldest and most recent datasets in the filtered set. This
may be useful for initializing a date picker. If no datasets are selected by
the specified filters, the from
and to
keys (see
results) will not be returned.
end
date/time
Select only datasets created on or before the specified time. Time should be
specified in ISO standard format, as YYYY-MM-DDThh:mm:ss.ffffff[+|-]HH:MM
.
If the timezone offset is omitted it will be assumed to be UTC (+00:00
); if
the time is omitted it will be assumed as midnight (00:00:00
) on the
specified date.
filter
metadata filtering
Select datasets matching the metadata expressions specified via filter
query parameters. Each expression has the format [chain]key:[op]value[:type]
:
chain
Prefix an expression with^
(circumflex) to allow combining a set of expressions withOR
rather than the defaultAND
.key
The name of a metadata key (for example,dataset.name
)op
An operator to specify how to compare the key value:=
(Default) Compare for equality~
Compare against a substring>
Greater than<
Less than>=
Greater than or equal to<=
Less than or equal to!=
Not equal
value
The value to compare against. This will be interpreted based on the specified type.type
The string value will be cast to this type. Any value can be cast to typestr
. General metadata keys (server
,global
,user
, anddataset.metalog
namespaces) that have values incompatible with the specified type will be ignored. If you specify an incompatible type for a primarydataset
key, an error will be returned as these types are defined by the Pbench schema so no match would be possible. (For example,dataset.name:2:int
ordataset.access:2023-05-01:date
.)str
(Default) Compare as a stringbool
Compare as a booleanint
Compare as an integerdate
Compare as a date-time string. ISO-8601 recommended, and UTC is assumed if no timezone is specified.
For example, dataset.name:foo
looks for datasets with the name “foo” exactly,
whereas dataset.name:~foo
looks for datasets with a name containing the
substring “foo”.
Multiple expressions may be combined across multiple filter
query parameters
or as comma-separated lists in a single query parameter. Multiple filter
expressions are combined as an AND
expression, matching only when all
expressions match. However any consecutive set of expressions starting with ^
are collected into an “OR
list” that will be AND
-ed with the surrounding
terms.
For example,
filter=dataset.name:a,server.origin:EC2
returns datasets with a name of “a” and an origin of “EC2”.filter=dataset.name:~andy,^server.origin:EC2,^server.origin:RIYA, dataset.access:public
returns only “public” datasets with a name containing the string “andy” which also have an origin of either “EC2” or “RIYA”. As a SQL query, we might write it asdataset.name like "%andy%" and (server.origin = 'EC2' or server.origin = 'RIYA') and dataset.access = 'public'
.
NOTE: filter
expression term values, like the true
in
GET /api/v1/datasets?filter=server.archiveonly:true
, are by default
interpreted as strings, so be careful about the string representation of the
value. In this case, server.archiveonly
is a boolean, which will be matched
as a string value “true” or “false”. You can instead specify the expression
term as server.archiveonly:t:bool
which will treat the specified match value
as a boolean (t[rue]
or y[es]
for true, f[alse]
or n[o]
for false) and
match against the boolean metadata value.
keysummary
boolean
Instead of displaying a list of selected datasets and metadata, use the set of
specified filters to accumulate a nested report on the metadata key namespace
for the set of datasets. See metadata for deails on the
Pbench Server metadata namespaces. Because the global
and user
namespaces
are completely dynamic, and the dataset.metalog
sub-namespace varies greatly
across Pbench Agent benchmark scripts, this mode provides a mechanism for a
metadata visualizer to understand what’s available for a set of datasets. If no
datasets are selected by the specified filters, the keys
key (see
results) will be set to an empty object.
limit
integer
“Paginate” the selected datasets by returning at most limit
datasets. This
can be used in conjunction with offset
to progress through the full list in
smaller chunks either to assist with a paginated display or to limit data
transfer requirements.
metadata
list
Request the named metadata to be returned along with the
resource name and ID. This can be a single metadata key path or comma-separated
list of such strings. The metadata
query parameter can be repeated. For
example, the following are all equivalent:
?metadata=dataset.created,server.deletion,user
?metadata=dataset.created&metadata=dataset.deletion,user
?metadata=dataset.created&metadata=dataset.deletion&metadata=user
mine
boolean
Allows filtering for datasets owned by the authenticated client (if the value
is omitted, e.g., ?mine
or ?mine=true
) or owned by other users (e.g.,
?mine=false
).
name
string
Select only datasets with a specified substring in their name. The filter
?name=fio
is semantically equivalent to ?filter=dataset.name:~fio
.
offset
integer
“Paginate” the selected datasets by skipping the first offset
datasets that
would have been selected by the other query terms. This can be used with
limit
to progress through the full list in smaller chunks either to assist
with a paginated display or to limit data transfer requirements.
owner
string
Select only datasets owned by the specified username. Unless the username
matches the authenticated user, only “public” datasets can be selected.
sort
sort expression
Sort the returned datasets by one or more sort expressions. You can separate
multiple expressions using comma lists, or across separate sort
query
parameters, which will be processed in order. Any Metadata namespace key can
be specified.
Specify a sort order using the keywords asc
(ascending) or desc
(descending), separated from the key name with a colon (:
). For example,
dataset.name:asc
or dataset.metalog.pbench.script:desc
. The default is
“ascending” if no order is specified. If no sort expressions are specified,
datasets are returned sorted by dataset.resource_id
.
For example, GET /api/v1/datasets?sort=global.dashboard.seen:desc,dataset.name
will return selected datasets sorted first in descending order based on whether
the dataset has been marked “seen” by the dashboard, and secondly sorted by the
dataset name. The Pbench Dashboard stores global.dashboard.seen
as a boolean
value, so in this case true
values will appear before false
values.
start
date/time
Select only datasets created on or after the specified time. Time should be
specified in ISO standard format, as YYYY-MM-DDThh:mm:ss.ffffff[+|-]HH:MM
.
If the timezone offset is omitted it will be assumed to be UTC (+00:00
); if
the time is omitted it will be assumed as midnight (00:00:00
) on the
specified date.
Request headers¶
authorization: bearer
token [optional]
Bearer schema authorization is required to access any non-public dataset.
E.g., authorization: bearer <token>
. If omitted, the client is unauthenticated
and only public
access datasets will be selected.
Response headers¶
content-type: application/json
The return is a serialized JSON object with information about the selected
datasets.
Resource access¶
Only
<dataset>
resources selected by the filter to which the authenticated user has READ access will be returned.
See Access model
Response status¶
200
OK
Successful request.
401
UNAUTHORIZED
The client did not provide an authentication token but asked to filter datasets
by owner
, access=private
, mine
, or asked for user
namespace metadata.
403
FORBIDDEN
The client asked to filter access=private
datasets for an owner
for which
the client does not have READ access.
503
SERVICE UNAVAILABLE
The server has been disabled using the server-state
server configuration
setting in the server configuration API. The response
body is an application/json
document describing the current server state,
a message, and optional JSON data provided by the system administrator.
Response body¶
Dataset date range¶
The application/json
response body is a JSON object describing the earliest
and most recent dataset upload time for the selected list of datasets. If the
collection filters exclude all datasets (the result set is empty), the return
value will be empty, omitting both the from
and to
keywords.
{
"from": "2023-03-17T03:14:02.013184+00:00",
"to": "2023-04-05T11:29:02.585772+00:00"
}
Dataset list¶
The application/json
response body contains a list of objects which describe
the datasets selected by the specified query criteria, along with the total
number of matching datasets and a next_url
to support pagination.
next_url¶
When pagination is used, this gives the full URI to acquire the next page using
the same metadata
and limit
values. The client can simply GET
this URI for
the next page. When the entire collection has been returned, next_url
will be
null.
total¶
The total number of datasets matching the filter criteria regardless of the pagination settings.
results¶
The paginated dataset collection.
Each of these objects contains the following fields:
resource_id
: The internal unique ID of the dataset within the Pbench Server. This value will be used to reference the dataset in most other APIs.name
: The resource name given to the dataset. While this has an initial default value related to the benchmark script, date, and user configuration parameters, this value can be changed by the owner of the dataset and is for display purposes and must not be assumed to be unique or definitive.metadata
: If additional metadata was requested, it will appear as a nested JSON object in this field.
For example, the query
GET https://host/api/v1/datasets/list?metadata=user.dashboard.favorite&limit=3
might return:
{
"next_url": "https://pbench.example.com/api/v1/datasets?limit=3&metadata=user.dashboard.favorite&offset=3",
"results": [
{
"metadata": {
"user.dashboard.favorite": null
},
"name": "pbench-user-benchmark__2023.03.23T20.26.03",
"resource_id": "001ab7f04079f620f6f624b6eea913df"
},
{
"metadata": {
"user.dashboard.favorite": null
},
"name": "pbench-user-benchmark__2023.03.18T19.07.42",
"resource_id": "006fab853eb42907c6c202af1d6b750b"
},
{
"metadata": {
"user.dashboard.favorite": null
},
"name": "fio__2023.03.28T03.58.19",
"resource_id": "009ad5f818d9a32af6128dd2b0255161"
}
],
"total": 722
}
Key namespace summary¶
When the keysummary
query parameter is true
(e.g., either ?keysummary
or
?keysummary=true
), instead of reporting a list of datasets and metadata for
each dataset, the application/json
response body contains a hierarchical
representation of the aggregate metadata namespace across all selected datasets.
This returns much less data and is not subject to pagination.
“Leaf” nodes in the metadata tree are represented by null
values while any
key with children will be represented as a nested JSON object showing those
child keys. From the example output below a client can identify many key paths
including dataset.access
and dataset.metalog.controller.hostname
.
Any of the partial or complete key paths represented in the output document are
valid targets for metadata queries: for example dataset.metalog.pbench.script
is a “leaf” node, but GET /api/v1/datasets?metadata=dataset.metalog.pbench
will return a JSON document with the keys config
, date
, hostname_f
,
hostname_ip
, hostname_s
, iterations
, name
, rpm-version
, script
, and
tar-ball-creation-timestamp
.
{
"keys": {
"dataset": {
"access": null,
"id": null,
"metalog": {
"controller": {
"hostname": null,
"hostname-alias": null,
"hostname-all-fqdns": null,
"hostname-all-ip-addresses": null,
"hostname-domain": null,
"hostname-fqdn": null,
"hostname-ip-address": null,
"hostname-nis": null,
"hostname-short": null,
"ssh_opts": null
},
"iterations/1-default": {
"iteration_name": null,
"iteration_number": null,
"user_script": null
},
"pbench": {
"config": null,
"date": null,
"hostname_f": null,
"hostname_ip": null,
"hostname_s": null,
"iterations": null,
"name": null,
"rpm-version": null,
"script": null,
"tar-ball-creation-timestamp": null
},
"run": {
"controller": null,
"end_run": null,
"raw_size": null,
"start_run": null
},
"tools": {
"group": null,
"hosts": null,
"trigger": null
},
"tools/dbutenho.bos.csb": {
"hostname-alias": null,
"hostname-all-fqdns": null,
"hostname-all-ip-addresses": null,
"hostname-domain": null,
"hostname-fqdn": null,
"hostname-ip-address": null,
"hostname-nis": null,
"hostname-short": null,
"label": null,
"rpm-version": null,
"tools": null,
"vmstat": null
},
"tools/dbutenho.bos.csb/vmstat": {
"install_check_output": null,
"install_check_status_code": null,
"options": null
}
},
"name": null,
"owner_id": null,
"resource_id": null,
"uploaded": null
},
"server": {
"deletion": null,
"origin": null,
"tarball-path": null
}
}
}
Combining key namespace summary and date range¶
When both the keysummary
and daterange
query parameters are true
, the
application/json
response body contains the from
, to
, and keys
key
values. If the selected collection filters produce no results, as with
daterange
alone, the from
and to
keys will be omitted and the value of
keys
will be an empty object.