The Glitz! The Glamour! The Oscars!
Once again, I must grapple with my addiction to organizing data and making pipelines to organize data. This time, my sights are set on The Oscars.
Now generally, I am anti-award show. They never give awards to who I think should get them or to the three things that I’ve seen in whatever year. Very occasionally I’ll have a hot take (like how they definitely should give John Williams another Oscar this year) but honestly, it’s been years since I’ve watched a ceremony.
…However, a couple years ago I had two questions on my mind that required some clean Academy Award data.
- Are highly nominated films with predominantly non-white casts ignored for acting nominations?
- What single project had the most Oscar winners take part?
The first question came largely from the discourse around Parasite, which, despite getting six nominations, got zero acting nominations. I was probably keyed into this from the They Call Us Bruce podcast (an excellent listen on Asian America).
The second question came, oddly enough, from the film Rat Race, about which I had misremembered some old IMDb trivia; it is actually notable for having two African-American Oscar winners in it.
Anyway, answering these questions led to a deep dive on Oscar data. The Official Database provides the names of the films and nominees, but not in an easily parsable way, and it doesn’t differentiate between people with the same name. Did the same Steve McQueen get nominated for acting in 1966 and directing in 2013? (No.) This Kaggle Dataset is more easily parsable, but it was out of date and still didn’t have unique ids.
My Oscars dataset solves these problems and then some.
- It uses IMDb identifiers for movies, people, and even companies and nominations.
- It does fuzzy matching to match the proper ids with the films, which was a real pain.
- Each category is given a general class (e.g. Acting awards) and a canonical category, because winning for MUSIC (Original Score) is roughly equivalent to winning for MUSIC (Music Score of a Dramatic Picture).
- Extraneous text is parsed out so we don’t end up with a ton of nominations for someone named “Written By”.
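To give a flavor of the matching and canonicalizing steps, here is a minimal sketch using Python's standard-library difflib. It is not the actual pipeline: the ids are placeholders rather than real IMDb ids, the "Score" label is a made-up canonical name, and the function names and the 0.8 cutoff are assumptions for illustration.

```python
import difflib

# Placeholder ids, NOT real IMDb identifiers (assumption for illustration)
CANDIDATES = {
    "tt0000001": "Crouching Tiger, Hidden Dragon",
    "tt0000002": "Mad Max: Fury Road",
    "tt0000003": "The Lord of the Rings: The Return of the King",
}

# Hypothetical canonical label shared by equivalent category names
CANONICAL = {
    "MUSIC (Original Score)": "Score",
    "MUSIC (Music Score of a Dramatic Picture)": "Score",
}


def normalize(title):
    """Lowercase and drop punctuation so formatting differences do not matter."""
    kept = [ch for ch in title.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def match_title(raw_title, candidates=CANDIDATES, cutoff=0.8):
    """Return the id of the closest fuzzy title match, or None if nothing is close."""
    by_norm = {normalize(title): imdb_id for imdb_id, title in candidates.items()}
    hits = difflib.get_close_matches(
        normalize(raw_title), list(by_norm), n=1, cutoff=cutoff
    )
    return by_norm[hits[0]] if hits else None


def canonical_category(raw_category):
    """Map a raw category name to its canonical label, falling back to the raw name."""
    return CANONICAL.get(raw_category, raw_category)
```

Returning None below the cutoff is the important design choice: a missing id can be fixed by hand later, while a confidently wrong id quietly corrupts everything downstream.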
Returning to my original questions, the first question is a little tricky to answer without some database of nominee ethnicity, but I did determine all the films with “a lot” of nominations compared to the number of acting nominations they got. The films with no acting nominations but otherwise a lot of nominations were not just films with non-white casts like Parasite, Slumdog Millionaire, Life of Pi, Crouching Tiger Hidden Dragon and Black Panther. Also included were a bunch of “genre” films like Avatar, Return of the King and Mad Max Fury Road. The biggest exception I found when I originally looked into this was Lion, but now, to my great joy, Everything Everywhere All At Once is bucking the trend, with four acting nominations.
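For the curious, the "a lot of nominations but no acting nominations" check can be sketched roughly like this. The row layout, the "Acting" class label, and the threshold of six are all assumptions on my part (I never pinned down a precise definition of "a lot"), and the sample rows are toy data, not real nominations.

```python
from collections import Counter


def films_without_acting_noms(nominations, min_total=6):
    """Return films with at least min_total nominations and zero acting ones.

    nominations: iterable of (film, category_class) pairs.
    """
    total = Counter()
    acting = Counter()
    for film, category_class in nominations:
        total[film] += 1
        if category_class == "Acting":
            acting[film] += 1
    return sorted(
        film for film, count in total.items()
        if count >= min_total and acting[film] == 0
    )


# Toy rows for illustration only
rows = (
    [("Film A", "Technical")] * 6
    + [("Film B", "Technical")] * 5
    + [("Film B", "Acting")]
    + [("Film C", "Technical")] * 2
)
flagged = films_without_acting_noms(rows)
```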
The other question required that I scrape all the other credits for films that had Oscar nominees involved, which led to a completely separate rabbit hole. Based on all the credits, the IMDb listing with the most nominees involved is…The Today Show (with 670). Most winners is Biography with 352. The problem is that IMDb counts every “self” appearance AND archive footage. The other skewing factor is musical credits, because otherwise you get The Simpsons having 169 Oscar winners working on it, by virtue of the fact that artists from Aaron Copland to Trent Reznor have had their songs as part of the show.
Eliminating those, you end up with a lot of long-running TV anthologies at the top of the list. General Electric Theater and The Magical World of Disney tie, with 102 winners involved in each.
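The filtering step above amounts to counting distinct winners per project while skipping the credit types that skew the results. Here is a rough sketch, assuming credits come as (project, person id, credit type) rows; the credit-type labels are my own stand-ins and may not match what IMDb actually calls them.

```python
from collections import defaultdict

# Assumed labels for the skewing credit types (hypothetical names)
EXCLUDED_CREDIT_TYPES = {"self", "archive_footage", "soundtrack"}


def winners_per_project(credits, winner_ids, excluded=EXCLUDED_CREDIT_TYPES):
    """Count distinct Oscar winners credited on each project.

    credits: iterable of (project, person_id, credit_type) rows.
    winner_ids: set of person ids who have won an Oscar.
    """
    people = defaultdict(set)
    for project, person_id, credit_type in credits:
        if credit_type in excluded or person_id not in winner_ids:
            continue
        # A set keeps someone with several credits on one project from counting twice
        people[project].add(person_id)
    return {project: len(ids) for project, ids in people.items()}


# Toy rows for illustration only
toy_credits = [
    ("Show X", "p1", "self"),
    ("Show X", "p2", "actor"),
    ("Film Y", "p1", "visual_effects"),
    ("Film Y", "p1", "producer"),
    ("Film Y", "p3", "soundtrack"),
]
counts = winners_per_project(toy_credits, winner_ids={"p1", "p2", "p3"})
```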
If you limit it to strictly movies, then the single project with the most Oscar winners involved is Return of the Jedi! Turns out a lot of people who won visual effects awards worked on that little film. #2 is Empire and #3 is A New Hope. #4 is Titanic.
If you limit it to only actors, then you get the results I was actually originally looking for:
- Most Winners: 1992’s The Player (12 Oscar winners)
- Most Nominees: Also The Player (24 nominees) but #2 is Avengers Endgame with 19.
The lists of winners/nominees include people who won after the movie came out. The Player also wins for having the most past winners, but more interesting: tied for #4 is 2009’s Nine with 6 past winners. I’m honestly surprised it wasn’t marketed as such. There are actually a lot of films of questionable quality that aren’t Oscar bait that have lots of past Oscar winners in them. Both 2012’s The Dark Knight Rises and 2008’s Four Christmases have 5 past winners in them.
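The "past winner" distinction is just a date comparison. A minimal sketch, assuming you have each person's first winning year and the film's release year (the function name, the strict less-than rule, and the toy values are all mine):

```python
def past_winners(cast, first_win_year, release_year):
    """Return cast members whose first Oscar win came before the film's release year.

    cast: iterable of person ids.
    first_win_year: dict of person id -> year of first win (winners only).
    """
    return sorted(
        person for person in cast
        if person in first_win_year and first_win_year[person] < release_year
    )


# Toy values for illustration only
toy_cast = ["p1", "p2", "p3"]
toy_first_wins = {"p1": 1990, "p2": 2015}
earlier = past_winners(toy_cast, toy_first_wins, release_year=2009)
```

Whether a win in the same year as the release should count is a judgment call; the strict comparison here leaves those out.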
What can we conclude? Messy datasets are a pain, especially when dealing with fuzzy string matching. Also, award shows are fairly meaningless, but can be a useful starting place for interesting conversations. And putting Academy Award Winning Actor in the trailer for a movie is no guarantee that it will be any good.
Now if you’ll excuse me, I’m going to go watch Everything Everywhere All At Once once again.