Brisbane Library Checkout Data
Description
This has been copied from the README.md file.
bris-lib-checkout
This provides tidied-up data from the Brisbane library checkouts.
Retrieving and cleaning the data
The script for retrieving and cleaning the data is available in scrape-library.R.
The data
- The data/ folder contains the tidy data
- The data-raw/ folder contains the raw data
data/
This folder contains four tidied-up data frames, each stored as a CSV file:
- tidy-brisbane-library-checkout.csv
- metadata_branch.csv
- metadata_heading.csv
- metadata_item_type.csv
tidy-brisbane-library-checkout.csv contains the columns listed below. The metadata file metadata_heading.csv describes each of them.
knitr::kable(readr::read_csv("data/metadata_heading.csv"))
#> Parsed with column specification:
#> cols(
#> heading = col_character(),
#> heading_explanation = col_character()
#> )
| heading | heading_explanation |
|---|---|
| Title | Title of Item |
| Author | Author of Item |
| Call Number | Call Number of Item |
| Item id | Unique Item Identifier |
| Item Type | Type of Item (see next column) |
| Status | Current Status of Item |
| Language | Published language of item (if not English) |
| Age | Suggested audience |
| Checkout Library | Checkout branch |
| Date | Checkout date |
We also added year, month, and day columns.
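As a rough illustration of how such columns can be derived, the sketch below splits a datetime into year, month, and day parts with lubridate. This is an assumption about the approach, not the code in scrape-library.R: the two timestamps are made up, and treating `day` as the weekday name is a guess based on `day` being read as a character column.

```r
library(dplyr)
library(lubridate)

# Hypothetical checkout timestamps, for illustration only
checkouts <- tibble::tibble(
  datetime = as.POSIXct(
    c("2018-01-01 10:30:00", "2018-03-15 14:05:00"),
    tz = "UTC"
  )
)

checkouts_with_parts <- checkouts %>%
  mutate(
    year  = year(datetime),   # calendar year as a number
    month = month(datetime),  # month number, 1 to 12
    # Assumption: `day` holds the weekday name (locale dependent)
    day   = as.character(wday(datetime, label = TRUE, abbr = FALSE))
  )

checkouts_with_parts
```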
The remaining files are metadata that describe the branch codes and item type codes used in the checkout data:
library(tidyverse)
#> ── Attaching packages ────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
#> ✔ tibble 1.4.99.9006 ✔ dplyr 0.7.8
#> ✔ tidyr 0.8.2 ✔ stringr 1.3.1
#> ✔ readr 1.3.0 ✔ forcats 0.3.0
#> ── Conflicts ───────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
knitr::kable(readr::read_csv("data/metadata_branch.csv"))
#> Parsed with column specification:
#> cols(
#> branch_code = col_character(),
#> branch_heading = col_character()
#> )
| branch_code | branch_heading |
|---|---|
| ANN | Annerley |
| ASH | Ashgrove |
| BNO | Banyo |
| BRR | BrackenRidge |
| BSQ | Brisbane Square Library |
| BUL | Bulimba |
| CDA | Corinda |
| CDE | Chermside |
| CNL | Carindale |
| CPL | Coopers Plains |
| CRA | Carina |
| EPK | Everton Park |
| FAI | Fairfield |
| GCY | Garden City |
| GNG | Grange |
| HAM | Hamilton |
| HPK | Holland Park |
| INA | Inala |
| IPY | Indooroopilly |
| MBG | Mt. Coot-tha |
| MIT | Mitchelton |
| MTG | Mt. Gravatt |
| MTO | Mt. Ommaney |
| NDH | Nundah |
| NFM | New Farm |
| SBK | Sunnybank Hills |
| SCR | Stones Corner |
| SGT | Sandgate |
| VAN | Mobile Library |
| TWG | Toowong |
| WND | West End |
| WYN | Wynnum |
| ZIL | Zillmere |
knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
#> Parsed with column specification:
#> cols(
#> item_type_code = col_character(),
#> item_type_explanation = col_character()
#> )
| item_type_code | item_type_explanation |
|---|---|
| AD-FICTION | Adult Fiction |
| AD-MAGS | Adult Magazines |
| AD-PBK | Adult Paperback |
| BIOGRAPHY | Biography |
| BSQCDMUSIC | Brisbane Square CD Music |
| BSQCD-ROM | Brisbane Square CD Rom |
| BSQ-DVD | Brisbane Square DVD |
| CD-BOOK | Compact Disc Book |
| CD-MUSIC | Compact Disc Music |
| CD-ROM | CD Rom |
| DVD | DVD |
| DVD_R18+ | DVD Restricted - 18+ |
| FASTBACK | Fastback |
| GAYLESBIAN | Gay and Lesbian Collection |
| GRAPHICNOV | Graphic Novel |
| ILL | InterLibrary Loan |
| JU-FICTION | Junior Fiction |
| JU-MAGS | Junior Magazines |
| JU-PBK | Junior Paperback |
| KITS | Kits |
| LARGEPRINT | Large Print |
| LGPRINTMAG | Large Print Magazine |
| LITERACY | Literacy |
| LITERACYAV | Literacy Audio Visual |
| LOCSTUDIES | Local Studies |
| LOTE-BIO | Languages Other than English Biography |
| LOTE-BOOK | Languages Other than English Book |
| LOTE-CDMUS | Languages Other than English CD Music |
| LOTE-DVD | Languages Other than English DVD |
| LOTE-MAG | Languages Other than English Magazine |
| LOTE-TB | Languages Other than English Taped Book |
| MBG-DVD | Mt Coot-tha Botanical Gardens DVD |
| MBG-MAG | Mt Coot-tha Botanical Gardens Magazine |
| MBG-NF | Mt Coot-tha Botanical Gardens Non Fiction |
| MP3-BOOK | MP3 Audio Book |
| NONFIC-SET | Non Fiction Set |
| NONFICTION | Non Fiction |
| PICTURE-BK | Picture Book |
| PICTURE-NF | Picture Book Non Fiction |
| PLD-BOOK | Public Libraries Division Book |
| YA-FICTION | Young Adult Fiction |
| YA-MAGS | Young Adult Magazine |
| YA-PBK | Young Adult Paperback |
Example usage
Let’s explore the data
bris_libs <- readr::read_csv("data/bris-lib-checkout.csv")
#> Parsed with column specification:
#> cols(
#> title = col_character(),
#> author = col_character(),
#> call_number = col_character(),
#> item_id = col_double(),
#> item_type = col_character(),
#> status = col_character(),
#> language = col_character(),
#> age = col_character(),
#> library = col_character(),
#> date = col_double(),
#> datetime = col_datetime(format = ""),
#> year = col_double(),
#> month = col_double(),
#> day = col_character()
#> )
#> Warning: 20 parsing failures.
#> row col expected actual file
#> 587795 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590579 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590597 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 595774 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 597567 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> ...... ....... ........ ....... ............................
#> See problems(...) for more details.
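The parsing failures come from `item_id` being guessed as a double while some rows hold the text value `REFRESH`. One possible way to avoid them, sketched here under assumptions, is to read `item_id` as a character column. The example writes a tiny stand-in CSV to a temporary file so that it runs on its own; the numeric id is made up, and only the `REFRESH` value is taken from the warning.

```r
library(readr)

# Small stand-in for the checkout file (hypothetical rows)
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    "title,item_id",
    "Hello,34000012345",
    "Country style,REFRESH"
  ),
  csv_path
)

# Force item_id to character; let readr guess the other columns
checkouts <- read_csv(
  csv_path,
  col_types = cols(
    item_id  = col_character(),
    .default = col_guess()
  )
)

# No rows should be reported as parsing problems now
problems(checkouts)
```

Reading identifiers as text also guards against precision loss on long ids, at the cost of having to convert explicitly if a numeric id is ever needed.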
We can count the number of checkouts by title, item type, suggested age, and library:
library(dplyr)
count(bris_libs, title, sort = TRUE)
#> # A tibble: 121,046 x 2
#> title n
#> <chr> <int>
#> 1 Australian house and garden 1469
#> 2 New scientist (Australasian ed.) 1380
#> 3 Australian home beautiful 1331
#> 4 Country style 1229
#> 5 The New idea 1186
#> 6 Hello 1133
#> 7 Woman's day 1096
#> 8 Country life 1056
#> 9 Better homes and gardens. (AU) 1041
#> 10 Yi Zhou Kan 884
#> # … with 121,036 more rows
count(bris_libs, item_type, sort = TRUE)
#> # A tibble: 69 x 2
#> item_type n
#> <chr> <int>
#> 1 PICTURE-BK 121126
#> 2 DVD 98283
#> 3 AD-PBK 91671
#> 4 JU-PBK 88402
#> 5 NONFICTION 76168
#> 6 AD-MAGS 60516
#> 7 AD-FICTION 53090
#> 8 LARGEPRINT 19113
#> 9 JU-FICTION 17261
#> 10 LOTE-BOOK 12303
#> # … with 59 more rows
count(bris_libs, age, sort = TRUE)
#> # A tibble: 5 x 2
#> age n
#> <chr> <int>
#> 1 ADULT 420287
#> 2 JUVENILE 283902
#> 3 YA 13715
#> 4 <NA> 147
#> 5 UNKNOWN 36
count(bris_libs, library, sort = TRUE)
#> # A tibble: 38 x 2
#> library n
#> <chr> <int>
#> 1 SBK 49154
#> 2 BSQ 45968
#> 3 CNL 45642
#> 4 IPY 44569
#> 5 GCY 43090
#> 6 CDE 42775
#> 7 ASH 42086
#> 8 WYN 35124
#> 9 KEN 33947
#> 10 MTO 31201
#> # … with 28 more rows
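The library codes in these counts can be turned into branch names by joining against the branch metadata. The sketch below uses a few rows copied from the output and from the metadata_branch table in this README, rather than the full files, so it is self-contained. The object names are hypothetical. It also shows how `anti_join()` can surface codes with no metadata entry: `KEN` appears in the counts but is not listed in the branch table shown in this README.

```r
library(dplyr)

# A few rows copied from the count(bris_libs, library) output
library_counts <- tibble::tribble(
  ~library, ~n,
  "SBK",    49154,
  "BSQ",    45968,
  "KEN",    33947
)

# Matching rows from the metadata_branch table (no KEN entry listed)
branch_lookup <- tibble::tribble(
  ~branch_code, ~branch_heading,
  "SBK",        "Sunnybank Hills",
  "BSQ",        "Brisbane Square Library"
)

# Attach branch names; unmatched codes get NA in branch_heading
labelled_counts <- left_join(
  library_counts, branch_lookup,
  by = c("library" = "branch_code")
)

# Codes present in the checkouts but missing from the metadata
unmatched_codes <- anti_join(
  library_counts, branch_lookup,
  by = c("library" = "branch_code")
)

labelled_counts
unmatched_codes
```

With the full files, the same joins would apply to `bris_libs` and the result of reading data/metadata_branch.csv, assuming the column names shown in the parsed column specifications.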
License
This data is provided under a CC BY 4.0 license.
It has been downloaded from Brisbane library checkouts, and tidied up using the code in data-raw.