Exploring Gender Inequalities in the Digital World through Computational Sociolinguistics – EGICS
According to an internet adage, “there are no girls on the internet”. Provocative as it may seem, there is more than a grain of truth in it. Far from being the “great equalizer” it was hoped to be in the early 1990s (Wojahn, 1994), i.e. a space where there would be less sexism and racism than in the real world, the internet reproduces and often magnifies gender inequalities, and we live in a global context of “digital segregation” (Friesen et al., 2021). The EGICS (Exploring Gender Inequalities in the Digital World through Computational Sociolinguistics) project aims to uncover gender participation patterns on various internet platforms (Reddit, YouTube, social media, online forums, comment sections of media sites, etc.). It uses computational sociolinguistic methods, which combine natural language processing with questions about the relationship between language and society. Because language is so central to online interactions, we argue that this approach has the potential to provide a better understanding of gender discrimination online. Based on large datasets, EGICS proposes to develop a fine-grained approach to online identities that takes into account not only gender (or other) identities, but also how those identities are linguistically performed in context.
EGICS is structured around three phases: PARTICIPATION, DE-CONSTRUCTION, and ACTION. PARTICIPATION will develop a multi-factorial approach to measuring the online participation of individuals of different social categories, taking into account a wider range of linguistically encoded information about gender (self-identification, grammatical gender marking, pronouns, first names, etc.) than is usually collected. It will also explore how gender interacts with class, age, sexuality and (dis)ability. The goal of the second phase, DE-CONSTRUCTION, is to investigate how insights from discursive approaches to social categories can help explore inequalities in online participation. It moves away from social identities to study sociolinguistic performances: how speakers use different linguistic resources to communicate information about themselves in different contexts. Clustering algorithms will be used to group speakers based on their linguistic choices, and the resulting “sociolinguistic identities” will be compared to the social identities of the first phase to see to what extent they align. Finally, the ACTION phase investigates how social and sociolinguistic identities are policed and supported online, through a study of misogynistic, trans/homophobic and racist language and of the anti-discrimination responses to it.
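The clustering step of the DE-CONSTRUCTION phase can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the summary does not name an algorithm, so plain k-means stands in here, and the speaker names, the two feature rates and the phase-1 category labels are invented toy values, not project data.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment


# Hypothetical per-speaker rates of two linguistic features (toy values).
speakers = {
    "user_a": (0.90, 0.10),
    "user_b": (0.80, 0.20),
    "user_c": (0.85, 0.15),
    "user_d": (0.10, 0.90),
    "user_e": (0.20, 0.80),
    "user_f": (0.15, 0.95),
}

# Hypothetical phase-1 social categories for the same speakers.
social_identity = {
    "user_a": "group_1", "user_b": "group_1", "user_c": "group_2",
    "user_d": "group_2", "user_e": "group_2", "user_f": "group_1",
}

names = list(speakers)
labels = dict(zip(names, kmeans([speakers[n] for n in names], k=2)))

# Cross-tabulate clusters against social categories to see how far they align.
crosstab = Counter((social_identity[n], labels[n]) for n in names)
for (identity, cluster), count in sorted(crosstab.items()):
    print(identity, "cluster", cluster, count)
```

In this toy setup the clusters follow linguistic behaviour, not the assigned categories, so the cross-tabulation shows each category split across both clusters. That kind of mismatch between social and sociolinguistic identities is what the comparison is meant to surface.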
EGICS is an ambitious project, because its very detailed linguistic analyses require more time to perform than is common in NLP. However, it promises a great reward: a better understanding of the source of the digital gender gap and directions to help close it. Innovative and interdisciplinary, it incorporates insights from NLP, sociolinguistics, gender studies and digital ethnography, and will make scholarly contributions to all these areas. Great care will be taken to create inclusive corpora by scraping text from diverse platforms and communities. In this way, EGICS aims to reduce disparities in datasets: because of the way they are constructed (relying on websites’ APIs such as Reddit’s), large corpora of online data often marginalize women even further (D’Ignazio & Klein, 2023). I hypothesize that this new approach will yield a much more complete picture of the online participation of people of different gender (and other) identities, across a much wider range of online contexts, than currently exists.
Project coordination
Marie Flesch
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
LLF Marie Flesch
ANR funding: 203,883 euros
Beginning and duration of the scientific project:
- 24 Months