Analyzing Gender Representation in Movie Scripts

By Miranda Speyer-Larsen, Junior Researcher

Movies infiltrate our subconscious and shape the way we see the world. This is partly because we assume that movies are based on reality, at least somewhat. Sure, we might know that fiction and fact are not the same, but how aware are we really of every writer’s own worldview, subconscious stereotypes, and biases?

The way we as a society view different demographics is a never-ending cycle. Parents pass on their views and biases to their children. These children grow up to make movies through these views, rarely questioning where they came from. More children see movies like this, internalize the ideas, and grow up to teach their children the same thing. And so it continues.

This is why accurate representation of different demographics in movies is so crucial. Little girls shouldn’t grow up seeing movie after movie where women play clueless, emotional, one-dimensional supporting roles, subconsciously learning that’s all women can be. Little boys should see strong male characters who are able and allowed to show emotion without compromising their masculinity.

How this project started

A significant part of what motivated me stems from my frustration with stereotypes about women: that we talk too much, are too emotional, and, according to a lot of movies, never know what to do in a crisis situation.

Reese Witherspoon perfectly explained this last point in her 2015 Woman of the Year Award speech (https://speakola.com/ideas/reese-witherspoon-glamour-woman-of-the-year-2015), where she said, “I dread reading scripts that have no women involved in their creation because inevitably I get to that part where the girl turns to the guy, and she says, ‘What do we do now?!’ Do you know any woman in any crisis situation who has absolutely no idea what to do?” She’s absolutely right. We’re perpetuating the idea that women are helpless damsels with zero problem-solving capabilities.

This has always made me incredibly angry. Some might call this an overreaction, because it’s just movies, and they aren’t real, so they can’t hurt me. But these ideas are everywhere, infiltrating every aspect of the lives of every woman on the planet.

Another important part of my motivation, as well as the project itself, is the Bechdel test. If you’re not familiar, the test is passed if a movie has 1) at least two named female characters, 2) who talk to each other, 3) about something besides a man. This test has always stuck with me, because it’s so simple and, on the surface, seems like the bare minimum for a movie that is trying to portray women in a nuanced manner. While not a comprehensive indicator of representation, it highlights whether women play an active role in a film. Of course, not every movie has to pass the test, just like not every movie needs two named men talking about something other than a woman, but it is astonishing how many movies don’t pass (https://bechdeltest.com/).
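To make the three criteria concrete, here is a minimal sketch of how they could be checked in code. It is an illustration only, not the project's implementation. It assumes a script has already been parsed into scenes (each a list of speaker and line pairs), that we already know which named characters are female and male, and that "about a man" can be crudely approximated by a short word list. The names MALE_WORDS, mentions_man, and passes_bechdel are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny list of male-referring words.
MALE_WORDS = {"he", "him", "his", "man", "men", "boyfriend", "husband", "father"}


def mentions_man(line, male_names):
    """Rough proxy for 'about a man': the line contains a male word or name."""
    words = set(re.findall(r"[a-z]+", line.lower()))
    return bool(words & (MALE_WORDS | {name.lower() for name in male_names}))


def passes_bechdel(scenes, female_names, male_names):
    """scenes: list of scenes, each a list of (speaker, line) tuples."""
    for scene in scenes:
        # Treat two consecutive lines by different women as an exchange.
        for (spk_a, line_a), (spk_b, line_b) in zip(scene, scene[1:]):
            two_women = (
                spk_a in female_names and spk_b in female_names and spk_a != spk_b
            )
            if two_women and not (
                mentions_man(line_a, male_names) or mentions_man(line_b, male_names)
            ):
                return True
    return False
```

A real check would need far more than this, since deciding what a conversation is "about" is a hard language problem, but the sketch shows how little the test actually asks for.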

Part of this could be attributed to a phenomenon that Caroline Criado Perez refers to as “The Default Male” (https://carolinecriadoperez.com/book/invisible-women/). Here, she talks about the “male-unless-otherwise-indicated” approach to research, how people view stuffed animals as male until they are “hyper-feminized”, and how the generic masculine remains pervasive in gender-inflected languages. Male is always the default, so of course all characters in movies are male, unless we make them female. This means that female characters and their stories are seen as “niche”, and therefore not as necessary or universal as those of male characters.

All of this has made me wonder deeply about gender dynamics and representation in movies.

Narrowing the scope

The main idea of the project is to use programming to analyze gender in movie scripts. This includes analyzing how much speaking time different genders get, what emotions they are expressing, as well as an automated Bechdel test.
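As a rough idea of what measuring speaking time could look like, the sketch below uses word counts as a stand-in for speaking time. It assumes the dialogue has already been extracted as pairs of speaker and line, and that a mapping from character to gender label exists. Both the input format and the function names are assumptions made for illustration.

```python
from collections import Counter


def words_by_gender(dialogue, gender_of):
    """dialogue: iterable of (speaker, line); gender_of: dict speaker -> label."""
    totals = Counter()
    for speaker, line in dialogue:
        # Characters without a known gender are counted separately.
        totals[gender_of.get(speaker, "unknown")] += len(line.split())
    return totals


def shares(totals):
    """Convert raw word counts into fractions of all spoken words."""
    total = sum(totals.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in totals.items()}
```

Keeping an explicit "unknown" bucket makes it visible how much dialogue could not be attributed, instead of silently dropping it.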

One of the first challenges of this project was defining the scope. Gender representation in movies is such a huge topic, with a lot of questions that need answering at the outset. For example, are we making sure to include and consider all genders, and if so, how do we account for non-binary characters? Which movies are being considered: specific genres, specific decades, or something else?

Through a lot of consideration, messy brainstorms, and sparring with the other Junior Researchers and lab members, I was able to narrow it down. I chose the website Simply Scripts (https://www.simplyscripts.com/), for its very comprehensive list of movie scripts in English sorted in useful ways, such as genre or as a timeline.

Finding out how I could remain gender-inclusive was a challenge. My first thought was categorizing characters into male, female, and other based on their name, which is possible through multiple resources. However, names generally might not reflect gender that well. For example, there are unisex names, and non-binary people can have names that are typically regarded as binary.

After a lot of consideration, I decided the best way to encapsulate the complexity of gender within the scope would be through pronouns. The idea is to analyze which pronouns are used for each character and assume they align with that character’s gender identity. For cases where this is not possible – for example, if the pronouns are never specified – the only way forward is reviewing each case manually.
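One possible way to sketch the pronoun idea in code: count the pronouns that appear in sentences mentioning a character, and fall back to manual review when nothing is found or the result is too close to call. This is a simplified assumption of how it might work, not the final method. The mapping PRONOUN_LABELS, the function infer_pronouns, and the min_margin parameter are hypothetical, and the approach has obvious limits (for example, "they" often refers to a group rather than one person).

```python
import re
from collections import Counter

# Assumed mapping from pronoun to a coarse label.
PRONOUN_LABELS = {
    "she": "she/her", "her": "she/her", "hers": "she/her", "herself": "she/her",
    "he": "he/him", "him": "he/him", "his": "he/him", "himself": "he/him",
    "they": "they/them", "them": "they/them", "their": "they/them",
    "themself": "they/them",
}


def infer_pronouns(name, text, min_margin=2):
    """Return the dominant pronoun label for a character, or None for manual review."""
    counts = Counter()
    # Split into rough sentences and keep those mentioning the character.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if name.lower() not in lowered:
            continue
        for word in re.findall(r"[a-z]+", lowered):
            if word in PRONOUN_LABELS:
                counts[PRONOUN_LABELS[word]] += 1
    ranked = counts.most_common(2)
    if not ranked:
        return None  # no pronouns found: review manually
    if len(ranked) == 2 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # two labels too close to call: review manually
    return ranked[0][0]
```

Returning None instead of guessing keeps the uncertain cases separate, so they can be reviewed by hand as described above.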

Choosing which genres to include in the analysis also posed a challenge, as genres are often seen as very gendered. For example, some people argued that historical movies shouldn’t be considered because “of course those are going to center men.” While this reflects historical realities to some extent, it also overlooks the roles women have played in those settings, often reduced to background characters or sentimental props, like photographs in soldiers’ lockets, rather than active participants in the narrative.

I also faced difficulty defining genres themselves. Data on genre preferences between men and women shows considerable overlap, since both tend to favor the same top genres, but the boundaries between genres are blurry (https://www.statista.com/statistics/254115/favorite-movie-genres-in-the-us/). For example, romcoms are often separated from comedies in statistics, despite being comedies. If women’s preferences lean toward romcoms, would that skew a general comedy statistic? These grey areas make it harder to draw definitive conclusions about gendered genre preferences.

In the end, I chose to focus on what scripts were readily available. This initial pool included a variety of genres, but some – like “documentary” and “suspense” – only had a total of four scripts, making them impractical for meaningful analysis. The genres with enough data to include were action, adventure, comedy, crime, drama, horror, mystery, romance, sci-fi, thriller, and war. Interestingly, this set reflects a good mix of what men and women tend to watch. I decided to include war films, despite their male-centric reputation, because I think it’s important to examine whether women are given a role in these narratives at all. Representation of women’s contributions in these stories matters just as much as in any other genre.
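The selection step itself is simple to express. The sketch below assumes one genre label per collected script and a minimum number of scripts per genre. The function name select_genres and the min_scripts threshold are illustrative assumptions; the actual cutoff used for the project is not stated here.

```python
from collections import Counter


def select_genres(script_genres, min_scripts):
    """script_genres: iterable of genre labels, one per collected script."""
    counts = Counter(script_genres)
    # Keep only genres with enough scripts for a meaningful comparison.
    return sorted(genre for genre, n in counts.items() if n >= min_scripts)
```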

By refining the scope of the data through these considerations, I was able to establish a solid foundation for analyzing gender representation in movies.

Getting the data

Although Simply Scripts is well-organized and makes automatic extraction relatively easy, actually obtaining the scripts in a useful format turned out to be more challenging. The scripts are available in a variety of formats and file types, some of which aren’t suitable for data analysis, like PDFs made up of images of text. However, I was still able to adapt the code to handle many of these different file types, successfully extracting a lot of scripts.
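To give a sense of what handling different file types involves, here is a minimal sketch that uses only Python's standard library. It reads plain-text and HTML scripts and sets everything else aside. This is not the code used in the project: the function names are hypothetical, and PDFs (especially image-only ones, which would need OCR) are deliberately left unhandled.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Strip tags from an HTML string and return the remaining text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def load_script(path):
    """Return the script text, or None if the file needs special handling."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix in {".html", ".htm"}:
        return html_to_text(path.read_text(encoding="utf-8", errors="replace"))
    # PDFs and other formats are set aside here; image-only PDFs in
    # particular would need OCR, which this sketch does not attempt.
    return None
```

Returning None for unsupported formats makes it easy to count how many scripts were skipped and why.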

The next step

Now that the data is collected, the next step is cleaning it and making sure everything is in a usable format. Once this is done, the actual analysis will begin, which will involve a variety of natural language processing techniques and language models that can help extract meaningful insight from the text.
