Nystrom, Eric C.; Tanenhaus, David S.
This is data to accompany an article by Eric C. Nystrom and David S. Tanenhaus, "'Our Most Sacred Legal Commitments:' A Digital Exploration of the U.S. Supreme Court Defining Who We Are and How They Should Opine," University of Cincinnati Law Review 89, no. 4 (May 2021).
This data was generated using the "cap-tools" suite of programs (written by Eric C. Nystrom and available at https://github.com/ericnystrom/captools). The current data version (05202020, 07152020) included with this repository was generated by running "cap-tools" against:
"CURRENT-our-kwic-cap-scdb" is a TSV of keyword-in-context (KWIC) results for the term "our," with a six word window on each side of the term. Basic results were then filtered to exclude any results that were not found in the SCDB-CAP map, and the SCDB ID was added. The file contains 79693 records plus a first-line header.
1: "cap-id" -- ID of the case in CAP system.
2: "casename" -- short form of the case name
3: "cite" -- reporter citation (typically US Reports)
4: "date" -- year the case was decided
5: "courtname" -- Court name, in CAP. This should be "U.S." for the US Supreme Court, but some records were misfiled within the CAP data and carry a different value. These records were manually checked to confirm that they are in fact US Supreme Court records.
6: "courtslug" -- CAP "slug" representing this court. Typically "us" but there are a handful of variations.
7: "numopins" -- Number of opinions in CAP in this case, with counting beginning at 1. CAP's detection routines are usually correct, but in some cases the actual opinion count, as measured by a human observer, would differ.
8: "opintype" -- The type of opinion, as determined by CAP. Generally correct, though subject to the same kind of CAP detection errors noted for "numopins".
9: "opinnum" -- The number of the particular opinion in this case from which this match was drawn. 1-based counting.
10: "casematch" -- The sequential number of this match for the case as a whole, numbered from 1.
11: "opinmatch" -- The sequential number of this match for this opinion only, numbered from 1.
12: "before" -- the string of words prior to the matching word; in this data, six words. (lowercase)
13: "term" -- the term itself; in this data, it is always "our"
14: "after" -- the string of six words following the term (lowercase)
15: "scdb-id" -- the SCDB identification number of this case, matched using the CAP-SCDB match described above.
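
For readers who want to work with the KWIC file programmatically, the following is a minimal Python sketch of reading it by field name and counting matches per year. It assumes that the header line uses the field names exactly as listed above, that the file is UTF-8, and that quotation marks in the text are literal rather than CSV quoting; none of this is confirmed by the data documentation. The function names and the sample records are invented for illustration.

```python
import csv
import io
from collections import Counter

# Field names as listed above; the file's own header is assumed to match.
FIELDS = [
    "cap-id", "casename", "cite", "date", "courtname", "courtslug",
    "numopins", "opintype", "opinnum", "casematch", "opinmatch",
    "before", "term", "after", "scdb-id",
]


def read_kwic(handle):
    """Yield one dict per KWIC record, keyed by the file's header names."""
    # QUOTE_NONE is an assumption: it keeps quotation marks inside the
    # "before"/"after" text from being interpreted as CSV quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


def matches_per_year(rows):
    """Count KWIC matches (not cases) per decision year."""
    return Counter(row["date"] for row in rows)


# Two invented placeholder records, used only to exercise the functions.
_SAMPLE_ROWS = [
    ["0", "Example v. Example", "0 U.S. 0", "1900", "U.S.", "us",
     "1", "majority", "1", "1", "1",
     "it is the settled doctrine of", "our",
     "courts that such a claim is", "PLACEHOLDER-ID"],
    ["0", "Example v. Example", "0 U.S. 0", "1900", "U.S.", "us",
     "1", "majority", "1", "2", "2",
     "as was said by one of", "our",
     "predecessors in a case like this", "PLACEHOLDER-ID"],
]
SAMPLE = "\n".join("\t".join(r) for r in [FIELDS] + _SAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    rows = list(read_kwic(io.StringIO(SAMPLE)))
    print(matches_per_year(rows))
```

To run this against the real file, replace the in-memory sample with something like open(path, newline="", encoding="utf-8"), where path points to the downloaded TSV.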
"CURRENT-our-pos-cap-scdb" -- a TSV file very similar to the KWIC results file described above, with the same header and field structure, and the same results from a case perspective. The difference is that the text in fields 12, 13, and 14 was tagged with parts of speech (POS) using the Perl Lingua::EN::Tagger library, v0.28, by Aaron Coburn. The window was lengthened to seven words on each side of "our" and then tags were applied, but since the tagger also tags punctuation separately in many cases, sometimes more than seven term/TAG "words" exist in fields 12 and 14. A complete list of the tags supported by the tagger and their grammatical meanings can be found at: https://metacpan.org/source/ACOBURN/Lingua-EN-Tagger-0.30/README
"RESULTS-our-kwic-followers-opinauth-chief_071520.tsv" further extends the results contained in the files above, by isolating the noun phrase following "our" using the grammatical tags above. These noun phrases were individually categorized by our legal historian as constitutive of "culture" or "process" (or falling into an ambiguous category). (See Tanenhaus and Nystrom, listed above.) The data was further augmented by applying the opinion author's name and SCDB author ID number from the corrected opinion authorship information, available separately as Nystrom and Tanenhaus (cite above). The Chief Justice information was also added, from SCDB.
"our-casecount-by-year_normalized.tsv" -- a TSV file containing 4 columns and no header. Column 1 is the year, column 2 is the number of individual cases (not opinions) decided in that year that contained the word "our," column 3 is the total number of cases decided in that year, and column 4 is the percentage of column 3 represented by column 2 (i.e. percent of cases in a year containing "our"). Note that number of cases per year is determined from SCDB, so any minor actions such as denial of cert not included in SCDB would not be included here either.