library(dplyr)
library(jsonlite)
library(httr)
9 APIs
(MA7419 / MA3419)
9.1 Overview
This week we’ll be looking at APIs.
9.2 Definitions
API
An application programming interface (API) is an interface or communication protocol between different parts of a computer program intended to simplify the implementation and maintenance of software. (Wikipedia)
REST
Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. (Wikipedia)
Specifically, one of the RESTful rules is that you should get data (called a resource) returned when you link to a specific URL.
The URL is called a request and what is sent back is called a response.
You can use RESTful APIs to send as well as receive data, but we will only look at how to get data.
The API request can be included in a program - so you don’t need a user to click on a download link.
Another piece of jargon is endpoint. This is the base URL for the API. This is followed by a path that points to the exact resource.
Finally we can have query parameters. These always begin with a ? and look like:
?query1=param1&query2=param2
where the & separates two query/parameter pairs.
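The pieces named above (endpoint, path, query parameters) can be assembled and taken apart again with httr. The following is a minimal sketch rather than part of the original notes; the path and the parameters sort and per_page are used purely for illustration.

```r
library(httr)

# Build a request URL from its parts: endpoint + path + query parameters.
# The path and parameter values are illustrative only.
url <- modify_url(
  "https://api.github.com",
  path  = "/users/vivait/repos",
  query = list(sort = "updated", per_page = 100)
)
url

# Split the URL back into its components to see each piece separately.
parts <- parse_url(url)
parts$hostname  # the host part of the endpoint
parts$path      # the path to the resource
parts$query     # a named list of query parameters
```

No request is sent here; modify_url and parse_url only manipulate the text of the URL.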
Let’s have an example.
9.3 Example
The endpoint for GitHub is: https://api.github.com
The path to a specific user’s repos is /users/<username>/repos.
Try copying https://api.github.com/users/vivait/repos into your browser; you should see information returned in JSON.
But we want to access the data in a program, not via a browser.
The package httr provides tools for HTTP, including the verb GET:
# Send a GET request to a path on the GitHub API and return the response
github_api <- function(path) {
  url <- modify_url("https://api.github.com", path = path)
  GET(url)
}
resp <- github_api("/users/actuarial-science/repos")
We can use jsonlite to parse the content of the response into a useful R object.
repos <- fromJSON(content(resp, "text"))
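To see what fromJSON does without making a request, the sketch below parses a small hand-written JSON string. The two records and their field names are made up for illustration and only loosely imitate what the GitHub API returns.

```r
library(jsonlite)

# A made-up JSON array of two objects (not real API output).
json_text <- '[
  {"name": "repo-a", "fork": false, "stargazers_count": 3},
  {"name": "repo-b", "fork": true,  "stargazers_count": 10}
]'

# fromJSON simplifies an array of similar objects into a data frame,
# one row per object and one column per field.
repos_demo <- fromJSON(json_text)

str(repos_demo)
repos_demo$name
```

Because the result is an ordinary data frame, the usual dplyr verbs can be applied to it directly.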
We can add some parameters to our query:
resp <- github_api("/users/vivait/repos?sort=updated&per_page=100")
repos <- fromJSON(content(resp, "text"))
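Query parameters can also be passed separately from the path, and a program should usually check that the request succeeded before parsing the result. The following is a hedged sketch of one way to do that with httr; the helper name github_get is hypothetical and is not part of httr or of the code above.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: request a GitHub API path with optional query
# parameters, fail loudly on an HTTP error, then parse the JSON body.
github_get <- function(path, query = NULL) {
  url  <- modify_url("https://api.github.com", path = path)
  resp <- GET(url, query = query)

  # Turn HTTP errors (for example 404, or 403 when rate limited) into R errors.
  stop_for_status(resp)

  # Refuse to parse anything that is not JSON.
  if (http_type(resp) != "application/json") {
    stop("API did not return JSON", call. = FALSE)
  }

  fromJSON(content(resp, "text", encoding = "UTF-8"))
}

# Usage (needs an internet connection, so it is left commented out):
# repos <- github_get("/users/vivait/repos",
#                     query = list(sort = "updated", per_page = 100))
```

Passing the parameters as a named list leaves the encoding of names and values to httr instead of relying on a hand-written query string.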
In fact, if we know the request will return JSON, we can parse it directly with jsonlite. (Not advised in a program.)
For example, the GitHub documentation says: “You can issue a GET request to the root endpoint to get all the endpoint categories that the REST API v3 supports”:
head(fromJSON("https://api.github.com"), 10)
$current_user_url
[1] "https://api.github.com/user"
$current_user_authorizations_html_url
[1] "https://github.com/settings/connections/applications{/client_id}"
$authorizations_url
[1] "https://api.github.com/authorizations"
$code_search_url
[1] "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}"
$commit_search_url
[1] "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}"
$emails_url
[1] "https://api.github.com/user/emails"
$emojis_url
[1] "https://api.github.com/emojis"
$events_url
[1] "https://api.github.com/events"
$feeds_url
[1] "https://api.github.com/feeds"
$followers_url
[1] "https://api.github.com/user/followers"
9.4 Twitter example
NOTE: the Twitter (X) API examples below no longer work (thanks Elon). They will be replaced soon.
This code demonstrates how to use the rtweet package.
For more detail, see https://cran.r-project.org/web/packages/rtweet/vignettes/intro.html.
First you’ll need to set up a developer account with Twitter and get the access keys you need by creating a new app.
Follow the instructions at: https://cran.r-project.org/web/packages/rtweet/vignettes/auth.html.
# library(rtweet)
# ## authenticate - insert your app name and keys below
# token <- create_token(
# app = "R camlad",
# consumer_key = api_key,
# consumer_secret = api_secret_key,
# access_token = access_token,
# access_secret = access_token_secret)
Following a hashtag
We can search for tweets including a particular hashtag.
## search for tweets using the Cardano hashtag
# rt <- search_tweets("#Cardano", n = 100, include_rts = FALSE)
#
# ## preview tweets data
# rt |> select(id, text)
Trending in Leicester
# trnds <- get_trends("Leicester")
# trnds |>
# select(trend, tweet_volume) |>
# arrange(desc(tweet_volume))
Get a particular user’s timeline
library(stringr)
# tmls <- get_timeline("leicspolice", n = 100)
#
# tmls |>
# select(created_at, text) |>
# filter(str_detect(text, 'Traffic'))
9.5 Accessing UK census (and other) data
Our final example demonstrates the NOMIS API, which can be accessed through the nomisr (Odell 2018) package.
A quick demonstration of using nomisr to extract data from the Nomis API. This example is based on the nomisr introduction vignette.
library(nomisr)
First, we can download information on what data is available.
data_info <- nomis_data_info()
#head(data_info)
glimpse(data_info)
Rows: 1,605
Columns: 14
$ agencyid <chr> "NOMIS", "NOMIS", "NOMIS", "NOMIS…
$ id <chr> "NM_1_1", "NM_2_1", "NM_4_1", "NM…
$ uri <chr> "Nm-1d1", "Nm-2d1", "Nm-4d1", "Nm…
$ version <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ annotations.annotation <list> [<data.frame[10 x 2]>], [<data.f…
$ components.attribute <list> [<data.frame[7 x 4]>], [<data.fr…
$ components.dimension <list> [<data.frame[5 x 3]>], [<data.fr…
$ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE", "OBS_VA…
$ components.timedimension.codelist <chr> "CL_1_1_TIME", "CL_2_1_TIME", "CL…
$ components.timedimension.conceptref <chr> "TIME", "TIME", "TIME", "TIME", "…
$ description.value <chr> "Records the number of people cla…
$ description.lang <chr> "en", "en", "en", "en", "en", "en…
$ name.value <chr> "Jobseeker's Allowance with rates…
$ name.lang <chr> "en", "en", "en", "en", "en", "en…
There’s a lot here (data_info has 1605 rows). To dig deeper we can search the column description.value or name.value for key words.
# Keep datasets whose name mentions "population" (case-insensitive)
pop_data_info <-
  data_info |>
  filter(str_detect(name.value, "(?i)population")) |>
  select(id, name.value)
#pop_data_info |> head()
glimpse(pop_data_info)
Rows: 110
Columns: 2
$ id <chr> "NM_17_1", "NM_17_5", "NM_31_1", "NM_100_1", "NM_136_1", "N…
$ name.value <chr> "annual population survey", "annual population survey (vari…
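The search above looks only at name.value. As the text notes, description.value can be searched as well. The sketch below shows the idea on a small made-up table that stands in for data_info (the ids and text are invented, not real Nomis entries), so it runs without calling the API.

```r
library(dplyr)
library(stringr)

# Invented stand-in for data_info: same column names, fake contents.
toy_info <- tibble(
  id                = c("X_1", "X_2", "X_3"),
  name.value        = c("Population estimates", "Claimant count", "Workforce jobs"),
  description.value = c("Mid-year estimates", "Share of resident population", NA)
)

# Case-insensitive pattern, equivalent to the "(?i)population" form used above.
pattern <- regex("population", ignore_case = TRUE)

# Keep rows where either column mentions the key word.
# coalesce() treats a missing description as an empty string.
toy_hits <- toy_info |>
  filter(
    str_detect(name.value, pattern) |
      str_detect(coalesce(description.value, ""), pattern)
  ) |>
  select(id, name.value)

toy_hits
```

The same filter() call should work on the real data_info, since it relies only on the two column names.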
Suppose we wanted population data for Leicester. It looks like “NM_31_1” might be worth investigating, so we can dig down deeper.
The data is categorised first by “concept”. (Read the docs at Nomis if you want more details.)
id <- "NM_31_1"
nomis_get_metadata(id)
# A tibble: 6 × 3
codelist conceptref isfrequencydimension
<chr> <chr> <chr>
1 CL_31_1_GEOGRAPHY GEOGRAPHY false
2 CL_31_1_SEX SEX false
3 CL_31_1_AGE AGE false
4 CL_31_1_MEASURES MEASURES false
5 CL_31_1_FREQ FREQ true
6 CL_31_1_TIME TIME false
GEOGRAPHY looks relevant, so we explore what “types” are available.
nomis_get_metadata(id, "GEOGRAPHY", type = "type")
# A tibble: 26 × 3
id label.en description.en
<chr> <chr> <chr>
1 TYPE83 jobcentre plus group as of April 2019 jobcentre plu…
2 TYPE84 jobcentre plus district as of April 2019 jobcentre plu…
3 TYPE342 english index of multiple deprivation 2010 - deciles english index…
4 TYPE347 scottish index of multiple deprivation 2009 - deciles scottish inde…
5 TYPE349 welsh index of multiple deprivation 2008 - deciles welsh index o…
6 TYPE431 local authorities: county / unitary (as of April 2021) local authori…
7 TYPE432 local authorities: district / unitary (as of April 20… local authori…
8 TYPE433 local authorities: county / unitary (as of April 2019) local authori…
9 TYPE434 local authorities: district / unitary (as of April 20… local authori…
10 TYPE442 combined authorities combined auth…
# ℹ 16 more rows
Finally, we can choose a particular type and investigate it.
id |>
  nomis_get_metadata("GEOGRAPHY", type = "TYPE446") |>
  filter(str_detect(label.en, "Leicester"))
# A tibble: 2 × 4
id parentCode label.en description.en
<chr> <chr> <chr> <chr>
1 1870659636 2013265924 Leicester Leicester
2 1870659640 2013265924 Leicestershire Leicestershire
Looks like we’ve found what we want!
# Download the latest data for Leicester and Leicestershire
leics_pop <-
  nomis_get_data(id = id, time = "latest",
                 geography = c("1870659636", "1870659640"))
leics_pop |>
  select(DATE, GEOGRAPHY_NAME, SEX_NAME, AGE_NAME, MEASURES_NAME, OBS_VALUE) |>
  head(10)
# A tibble: 10 × 6
DATE GEOGRAPHY_NAME SEX_NAME AGE_NAME MEASURES_NAME OBS_VALUE
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Leicester Male All ages Value NA
2 2021 Leicester Male All ages Percent NA
3 2021 Leicester Male Aged under 1 year Value NA
4 2021 Leicester Male Aged under 1 year Percent NA
5 2021 Leicester Male Aged 1 - 4 years Value NA
6 2021 Leicester Male Aged 1 - 4 years Percent NA
7 2021 Leicester Male Aged 5 - 9 years Value NA
8 2021 Leicester Male Aged 5 - 9 years Percent NA
9 2021 Leicester Male Aged 10 - 14 years Value NA
10 2021 Leicester Male Aged 10 - 14 years Percent NA
9.6 Homework
Install the package randNames and, using the instructions in the package documentation, register for a free API key at randomapi.com.
Write a program to download random data for 400 imaginary users. What is the distribution of genders and countries of origin in this data?
Optional Christmas Bonus question
Register an account at Advent of Code. For the 2020 competition solve Question 2. (The key to solving this elegantly is reading the data in and wrangling it into the best format to solve the problem.)