Text Simplification of College Admissions Instructions: A Professionally Simplified and Verified Corpus

Zachary Taylor; Junyi Jessy Li; Maximus Chu

This dataset contains three sets of texts. The first set (ORIGINAL) consists of original college admissions instructions from the websites of colleges and universities in the United States. The second set (SIMPLIFIED) consists of simplified versions of college admissions instructions from the websites of colleges and universities in the United States. The third set (ORIGINAL TO SIMPLIFIED ALIGNMENTS) consists of documents that pair the original and simplified texts line by line, showing which information in the original does or does not appear in the simplified version and how the text was altered by the simplification process.
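As a rough illustration of the line-by-line pairing described above, the following Python sketch pairs two texts and lists lines that have no counterpart. It assumes each text is a plain string with one instruction per line; the actual layout of the alignment documents may differ. The function names `pair_lines` and `unmatched` and the example sentences are hypothetical and are not taken from the dataset.

```python
from itertools import zip_longest


def pair_lines(original: str, simplified: str):
    """Pair two texts line by line; a missing counterpart becomes None."""
    orig_lines = original.splitlines()
    simp_lines = simplified.splitlines()
    return list(zip_longest(orig_lines, simp_lines, fillvalue=None))


def unmatched(pairs):
    """Return the pairs in which one side has no counterpart."""
    return [(o, s) for o, s in pairs if o is None or s is None]


# Hypothetical example texts (assumptions, not dataset content).
original = (
    "Applicants must submit an official transcript.\n"
    "The application fee is non-refundable."
)
simplified = "You must submit an official transcript."

pairs = pair_lines(original, simplified)
for orig_line, simp_line in pairs:
    print(f"{orig_line!r} -> {simp_line!r}")
```

Pairing strictly by position is the simplest possible choice; it only works when both texts keep the same line order, which is an assumption of this sketch.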

The texts, written in English (from US institutions), were manually simplified by an author of this paper who is a native English speaker. The author has a doctoral degree in education and has worked professionally in US postsecondary education for over a decade, including work in undergraduate admissions. The author therefore drew on this professional insight to simplify each text without losing information critical to understanding it.

To determine whether the simplifications of admissions application instructions were acceptable (that is, that they did not lose critical information or accuracy between the original and simplified versions), we engaged ten subject-matter experts (SMEs), who volunteered their time. Each simplified text was verified independently by two SMEs.

All ten of the SMEs had professional backgrounds in U.S. postsecondary admissions, having worked at least five years full-time in college admissions offices in the United States. These SMEs were identified through professional networks and snowball methods, as several of our SMEs knew colleagues from different institutions or educational entities who would serve as high-quality, knowledgeable SMEs.

Moreover, we engaged with a diverse group of SMEs from different institution types (i.e., community colleges, public four-year institutions, private liberal arts colleges) and with various lengths of experience to capture the potential variability of admissions and financial aid parlance, jargon, and communication style. As the first study of its kind, identifying SMEs from diverse backgrounds provided more generalizability and reliability of findings, thus informing future research and practice regarding the communication of admissions application instructions to students and their support networks. Four subject-matter experts worked at public, four-year universities, four worked at private, four-year universities, and two worked at public, two-year community colleges.

To perform the acceptability judgement, the SME was presented with both pre- and post-simplification texts in real time over a Zoom video conference meeting. We then asked the SME to read the pre-simplification (original) text, followed by the post-simplification (simplified) text, and to determine whether the simplified text was acceptable. For example, changing the verb “submit” to “complete” is not acceptable because “submit” implies the documentation or information is being submitted by a submitter to a submittee, while “complete” only implies the documentation or information is completed and not directed to any educational stakeholder. If a simplification was deemed unacceptable by one or more SMEs, we asked the SME what simplification would be acceptable, working iteratively and in real time across all texts in this study. Once the SME provided their feedback and we integrated it into the simplified text, the same SME was again asked to read both the pre- and post-simplification texts in real time and render their acceptability judgement. Whenever a lexical item (e.g., single word, acronym, initialism, compound adjective), sentence, or paragraph could not be simplified, the pre-simplification section of that text was used.

