Open Data Maker exposes a read-only HTTP API for querying available datasets. This document explains:
- How to define and execute queries as URLs
- Refining query results using option parameters
- Extracting query results in JSON and CSV format
- Detecting query errors
Please note: A valid API key is required for those looking to access the API for the College Scorecard Data, which is proxied through an API gateway. Developers may register a key at https://api.data.gov/signup for API access.
Each query is expressed as a URL, containing:
-
The URL Path to the Open Data Maker service, e.g.
https://collegescorecard.ed.gov/
(for Open Data Maker running its own server), orhttps://api.data.gov/ed/collegescorecard/v1/
(for Open Data Maker proxied through an API gateway) -
The API Version String. Currently the only supported version string is:
v1
-
The Endpoint representing a particular dataset, e.g.
schools
. Endpoint names are usually plural. -
The Query String containing a set of named key-value pairs that represent the query, which incude
- Field Parameters, specifying a value (or set of values) to match against a particular field, and
Here's an example query URL:
https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&fields=id,school.name,2013.student.size
In this query URL:
https://api.data.gov/ed/collegescorecard/v1/
is the URL Path.v1
is the API Version String, followed by/
, which separates it from the Endpoint.schools
is the Endpoint. Note the plural..json
is the Format. Note the dot between the Endpoint and Format. Also note that, since JSON is the default output format, we didn't have to specify it.- In keeping with standard URI Query String syntax, the
?
and&
characters are used to begin and separate the list of query parameters. school.degrees_awarded.predominant=2,3
is a Field Parameter. In this case, it's searching for records which have aschool.degrees_awarded.predominant
value of either2
or3
._fields=id,school.name,2013.student.size
is an Option Parameter, as denoted by the initial underscore character._fields
is used to limit the output fields to those in the given list. We strongly recommend using the_fields
parameter to reduce the amount of data returned by the API, thus increasing performance.
Here's an example of the JSON document that Open Data Maker would provide in response to the above query:
{
"metadata": {
"total": 3667,
"page": 0,
"per_page": 20
},
"results": [
{
"id": 190752,
"school.name": "Yeshiva of Far Rockaway Derech Ayson Rabbinical Seminary",
"2013.student.size": 57
},
{
"id": 407009,
"school.name": "Arizona State University-West",
"2013.student.size": 3243
},
{
"id": 420574,
"school.name":"Arizona State University-Polytechnic",
"2013.student.size": 3305
}
// ... (further results removed) ...
]
}
A successful query will return a JSON with two top-level elements:
metadata
: A JSON Object containing information about the results returned. The metadata fields are:total
: The total number of records matching the querypage
: The page number for this result setper_page
: The number of records returned in a single result set. (For more information about thepage
andper_page
fields, see the section on Pagination)
results
: A JSON Array of record objects. Due to the use of the_fields
option in this query, there are only three fields in each record - the three fields specified in the_fields
parameter. When the_fields
parameter is omitted, the full record is provided.
Let's change the query so as to generate an error when it's executed:
https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=frog&_fields=id,school.name,wombat
This is the JSON document returned:
{
"errors": [
{
"error": "field_not_found",
"input": "wombat",
"message": "The input field 'wombat' (in the fields parameter) is not a field in this dataset."
},
{
"error": "parameter_type_error",
"parameter": "school.degrees_awarded.predominant",
"input": "frog",
"expected_type": "integer",
"input_type": "string",
"message": "The parameter 'school.degrees_awarded.predominant' expects a value of type integer, but received 'frog' which is a value of type string."
}
]
}
When failing to execute a query, Open Data Maker will attempt to return a JSON error document that explains the problem. Points to note:
- The HTTP response status will be in the 400 or 500 range, indicating a problem.
- The JSON document contains a single top-level element,
errors
, containing a JSON Array of error objects. Instead of simply returning the first error encountered, Open Data Maker attempts to list all the problems that it detected. - Error objects always contain these two elements:
error
: a symbolic error code for the specific error that occurred.message
: an English description of the error.
- Error objects may also contain these fields:
input
: the provided input which triggered the error.parameter
: the parameter in which theinput
was supplied.input_type
& expected_type`: In the case of a type mismatch, these fields list the data types that were provided and expected.
Parameter names are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.
For example: Use the parameter school.region_id=6
to only fetch records with a school.region_id
value of 6
.
Certain text fields in the dataset - those with the autocomplete
data type - allow querying with a list of words. To search for a given word or string of words in those fields, just provide a list of space-separated words. This will return all records where the given field contains the given words as part of a string. Note that all given words have to be at least three characters long.
For example: To search for school names containing the words New York
, use this parameter: school.name=New%20York
(%20
is a URL-encoded space) This will match all of these names:
New York College of Health Professions
American Academy of Dramatic Arts-New York
School of Professional Horticulture at the New York Botanical Garden
The New College of York
(because the parameter words don't have to be found together)Royal College of New Yorkminster
(because parameter words are matched as parts of other words)
but not this name:
New England School of Arts
(becauseYork
is missing)
To filter by a set of strings or integers you can provide a comma-separated list of values. This will query the field for an exact match with any of the supplied values.
For example: school.degrees_awarded.predominant=2,3,4
will match records with a school.degrees_awarded.predominant
value of 2
, 3
or 4
.
Note: Value lists with wildcards or floating-point numbers are not currently supported.
To exclude a set of records from results, use a negative match (also known as an inverted match). Append the characters __not
to the parameter name to specify a negative match.
For example: school.region_id__not=5
matches on records where the school.region_id
does not equal 5
.
To match on field values in a particular numeric range, use a range match. Append the characters __range
to the parameter name to specify a range match, and provide two numbers separated by two periods (..
).
For example: 2013.student.size__range=100..500
matches on schools which had between 100 and 500 students in 2013.
Open-ended ranges can be performed by omitting one side of the range. For example: 2013.student.size__range=1000..
matches on schools which had over 1000 students.
- Both integer and floating-point ranges are supported.
- The left-hand number in a range must be lower than the right-hand number.
- Ranges are inclusive. For example, the range
3..7
matches both3
and7
, as well as the numbers in between.
You can perform extra refinement and organisation of search results using option parameters. These special parameters are listed below.
By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the fields
option parameter. This parameter takes a comma-separated list of field names. For example: fields=id,school.name,school.state
will return result records that only contain those three fields.
Requesting specific fields in the response will significantly improve performance and reduce JSON traffic, and is recommended.
By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the page
option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use page=1
. Remember that the total number of records available for a given query is given in the total
field of the top-level metadata
object.
You can also change the number of records returned per page using the per_page
option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
To sort results by a given field, use the sort
option parameter. For example, sort=2015.student.size
will return records sorted by 2015 student size, in ascending order.
By default, using the sort
option returns records sorted into ascending order, but you can specify ascending or descending order by appending :asc
or :desc
to the field name. For example: sort=2015.student.size:desc
Note: Sorting is only available on fields with the data type integer
, float
, autocomplete
or name
.
Note: Make sure the sort parameter is a field in the data set. For more information, please take a look at data dictionary
When the dataset includes a location
at the root level (location.lat
and
location.lon
) then the documents will be indexed geographically. You can use the zip
and distance
options to narrow query results down to those within a geographic area. For example, zip=12345&distance=10mi
will return only those results within 10 miles of the center of the given zip code.
Additionally, you can request location.lat
and location.lon
in a search that includes a fields
filter and it will return the record(s) with respective lat and/or lon coordinates.
- By default, any number passed in the
distance
parameter is treated as a number of miles, but you can specify miles or kilometers by appendingmi
orkm
respectively. - Distances are calculated from the center of the given zip code, not the boundary.
- Only U.S. zip codes are supported.
With the inclusion of the Department of Education's Field of Study data, there are a number of new improvements that have been incorporated into Open Data Maker.
- The field of study data is included as an array of objects nested under a specified key. These objects may be queried just like any other data. However, there is an additional parameters to add to your API call to manage what is returned. By default, if specifying a search parameter, only objects of the array that match that parameter will be returned. You can pass
&all_programs_nested=true
to return all the items in the array instead of just those that match. - When specifying specific fields to be returned from the API, the default response is to have a dotted string of the path to the field returned. As of version 1.7, you can pass the parameter
keys_nested=true
get back a true json object instead of the dotted string. - Lastly, wildcard fields are now possible with version 1.7. If you want to get back data for just the latest available data, it is now possible to specify a query such as
fields=id,school,latest
which will return the ID field, the School object and the Latest object and all the nested objects contained within each.