Conference paper Open Access
Rokem, Ariel; Aragon, Cecilia; Arendt, Anthony; Fiore-Gartland, Brittany; Hazelton, Bryna; Hellerstein, Joseph; Herman, Bernease; Howe, Bill; Lazowska, Ed; Parker, Micaela; Staneva, Valentina; Stone, Sarah; Tanweer, Anissa; Vanderplas, Jacob
During the summer of 2015, the University of Washington eScience Institute ran an interdisciplinary internship program focused on urban informatics, civic engagement, and data-intensive social science. Borrowing elements from the successful Data Science for Social Good (DSSG) programs at the University of Chicago and Georgia Tech, and building on our own previous consulting and "incubation" programs for data-intensive projects in the physical, life, and social sciences, we brought together teams of students (graduate, undergraduate, and high school), data scientists, project leads, and stakeholders from the University of Washington and local NGOs to design, develop, and deploy new solutions to high-impact problems in the Seattle Metro Area.
In this paper, we describe the inaugural offering of the eScience DSSG and reflect on the process of organizing and structuring the program. The DSSG attracted 144 graduate and undergraduate student applicants from over 10 different fields of study. The final DSSG fellow cohort included 16 students accepted from this pool of applicants. In addition, we included six high school students, who joined us from a separate program designed to expose young people to research activities, and an undergraduate student who had already started working on one of the projects through another summer research program. We solicited project proposals from research professionals across academic, non-profit, and government institutions. Ultimately, four projects were chosen out of 11 submitted proposals: two addressing transportation access for people with limited mobility, one identifying factors affecting whether homeless families find permanent housing, and one deriving new metrics of community well-being from social media data and other relevant data sources.
With the exception of social media data, all datasets were sourced from Seattle businesses, foundations, and agencies. The teams worked in a shared studio space designed in part for this purpose and participated in tutorials on relevant tools and technologies, such as GitHub, Python, R, Amazon Web Services, and SQL, as well as topical presentations and discussions related to social good and multi-stakeholder collaborations.
We found that striking a balance between training and software "flow time" is essential, and that determining the right balance between structured and unstructured activities is delicate. The diversity in software and disciplinary experience among participants initially made it challenging to organize tutorials and scope projects. Because the tutorials mixed advanced and introductory material, at any given time some participants were either lost or bored. In the end, however, this diversity helped to improve the scope of the projects. For example, GIS experts added mapping components, software engineering experts designed APIs, and domain experts sanity-checked findings. Overall, the enormous interest implied by the number and diversity of the applicants to our program suggests that similar programs could be operated in many other cities. We intend this paper to facilitate reuse and optimization of the key components of our program.