D3.1 GEP Prevalence Monitoring Indicator framework v1
Description
No reliable method currently exists for systematically monitoring Gender Equality Plans (GEPs) at a supra-institutional level. This report therefore summarises the development of a methodology for monitoring GEPs across Europe using two different approaches, in order to determine which is more appropriate for future use or how the two could be combined. The first approach is non-reactive (web scraping and automated text analysis); the second is reactive (a Europe-wide online survey). At the end of the report, we compare the advantages and disadvantages of both approaches for Europe-wide GEP monitoring.
To develop the methodologies, we conducted a pilot study to test the methods on a smaller scale. The pilot sample comprises 83 organisations selected from Germany, Greece, Ireland, and Estonia.
First, the report outlines the theoretical basis of the INSPIRE indicators and the process by which they were developed for monitoring purposes. The INSPIRE indicators were created based on the T.2.1 Data Monitoring Report and on feedback received from four focus groups with a total of 28 participants from across Europe. The indicators cover four areas: the prevalence, characteristics, implementation, and impact of GEPs. While developing the INSPIRE indicators, we also considered intersectional and inclusive perspectives.
Second, the report explains the non-reactive methods and the survey methodology used for data collection. On the one hand, we combined various non-reactive methods. We used the web scraping tool SerpAPI to query Google’s crawled database and built a dedicated INSPIRE scraper on top of it. Because the INSPIRE web scraper retrieves more than the targeted GEPs and also downloads unrelated PDFs, an intermediate classification step using Large Language Models (LLMs) is required to clean the data corpus. On the other hand, a more conventional online survey was sent to Research Performing Organisations (RPOs) via the online survey platform UNIPARK.
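The non-reactive pipeline described above (search, download, then filter) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the INSPIRE scraper itself: the search terms, the function names, the `Candidate` type, and the `min_hits` threshold are all hypothetical, and the keyword count is only a simple stand-in for the LLM-based classification step. The query uses Google's `site:` and `filetype:` search operators; the actual search call and the PDF download are left out.

```python
from dataclasses import dataclass

# Hypothetical search terms; the report's actual term list is not given here.
GEP_TERMS = ("gender equality plan", "gleichstellungsplan")


def build_query(domain: str, term: str) -> str:
    """Compose a Google-style query restricted to one site and to PDF files."""
    return f'site:{domain} filetype:pdf "{term}"'


@dataclass
class Candidate:
    url: str
    text: str  # text extracted from the downloaded PDF


def looks_like_gep(doc: Candidate, terms=GEP_TERMS, min_hits: int = 2) -> bool:
    """Keyword stand-in for the classification step: count term mentions."""
    lowered = doc.text.lower()
    hits = sum(lowered.count(term) for term in terms)
    return hits >= min_hits


def clean_corpus(docs):
    """Keep only the documents that pass the (assumed) GEP filter."""
    return [doc for doc in docs if looks_like_gep(doc)]
```

In such a setup, one query would be built per organisation and search term, and `clean_corpus` would be applied to the downloaded PDFs before any indicator is computed. A keyword filter of this kind is language-dependent, which is one reason a multilingual corpus may call for a more capable classifier.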
Each approach has advantages over the other. Online surveys yield high-quality, targeted data, allow many observations, and make data collection efficient. However, they suffer from low response rates and from the difficulty of obtaining participants' email addresses. Indeed, the pilot study results show that the INSPIRE pilot survey has a low completion rate and is hampered by issues such as reaching private RPOs. Nevertheless, it provides high-quality, targeted information. To improve the survey methodology for monitoring GEPs, more knowledge about contact persons, or even a database of them, would be worthwhile. In contrast, web scraping has great potential for collecting large amounts of data without requiring many contact details. However, it faces challenges such as selecting appropriate scraping tools, developing the algorithm that collects the data, and sorting and selecting the collected data afterwards.
Initial pilot study results show that the INSPIRE scraper is more successful than the online survey at capturing the prevalence of GEPs. However, capturing information on the characteristics, implementation, and impact of GEPs requires more sophisticated approaches, which the online survey can deliver. The biggest challenges for non-reactive methods are translation issues in data collection and analysis, inconsistencies in metadata, the need for high computational capacity, and the absence of standardised terminology for file descriptions.
Files
INSPIRE_D3.1 GEP Prevalence Monitoring Indicator Framework v1.pdf (3.0 MB)
md5:d6462cad9157ae94df376e4c27923d8a