The methodology used here borrows from the one used by DP07 and Xiayun over at the now-defunct World of KJ Yahoo Review thread. In a nutshell, they theorized and proved that the number of reviews on the Yahoo website on opening day could be used to predict the opening day Box Office number. My work here is similar, but looks at tweets from the days prior to release.
The methodology is as follows:
Record all tweets for a film from the day its release is announced to opening day.
Filter out garbage posts not related to the film or spam.
Obtain ratios of total tweets to Friday/weekend box office.
Simple enough, right? The only problems are the sheer volume of tweets, the number of garbage tweets not related to the film (especially with a very generic movie title like "Fame"), and difficult-to-spell titles that lead to common typos. To get around this I have had to get creative with search filters and do a lot of manual eyeballing of the data to fine-tune them.
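To give a feel for what such a search filter can look like, here is a minimal sketch in Python. It is an illustration only, not the actual filters used for this site: the function name `build_filter`, the typo variants, the context words and the spam markers are all made-up assumptions. The idea is that a tweet must mention the title (or a known misspelling), must not contain a spam marker, and, for generic titles like "Fame", must also contain a movie-related context word.

```python
import re


def _compile(phrases):
    """Compile a case-insensitive whole-word pattern, or None if the list is empty."""
    if not phrases:
        return None
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def build_filter(title_variants, context_words=(), spam_markers=()):
    """Return a keep(tweet) function for one film (hypothetical helper).

    title_variants: the title plus any known misspellings.
    context_words:  if given, at least one must appear (for generic titles).
    spam_markers:   if any of these appear, the tweet is discarded.
    """
    title_re = _compile(list(title_variants))
    context_re = _compile(list(context_words))
    spam_re = _compile(list(spam_markers))
    if title_re is None:
        raise ValueError("at least one title variant is required")

    def keep(tweet):
        if not title_re.search(tweet):
            return False  # does not mention the film at all
        if spam_re is not None and spam_re.search(tweet):
            return False  # looks like spam
        if context_re is None:
            return True  # distinctive title, no extra context needed
        return context_re.search(tweet) is not None

    return keep


# Generic title: require a movie-related word (assumed word list).
keep_fame = build_filter(
    ["fame"],
    context_words=["movie", "film", "see", "watch", "theater"],
    spam_markers=["free download"],
)

# Hard-to-spell title: accept assumed typo variants, no context word needed.
keep_basterds = build_filter(["inglourious basterds", "inglorious bastards"])
```

A filter like this still needs the manual eyeballing described above: the word lists only get good after looking at what they let through and what they throw away.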
Is it foolproof? No. Obviously some people are atrocious spellers and I might not search for every possible typo, and some people reference a movie by the actors involved ("going to see the new Bale flick tonight."). But my hypothesis is that these misses will even out from film to film. Twitter is by no means a representative sample of the population, BUT it is a pretty consistent sample, and by following it over many weeks and months clear patterns should emerge, as long as I keep in mind genre, appeal and the wider environment (i.e. holidays, midnight screenings, etc. that will affect tweet totals).
Since I first came up with the idea and began gathering data back in September of 2009, I have refined my tools. I now have access to positive and negative percentages for tweets and tweet data for every day of the week, and a movie's tweets are tracked from the day it is announced for wide release. As time goes on the formula expands, and I have been able to incorporate more sophisticated methods of analysis, all of which afford me greater accuracy and insight into buzz.
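As a rough illustration of the per-day totals and positive/negative percentages mentioned above, here is a small Python sketch. It assumes each tweet has already been tagged with a date and a sentiment label ("pos", "neg" or "neutral") by some other tool; the function name `daily_summary` and the label names are hypothetical, and how the tagging itself is done is outside the scope of this example.

```python
from collections import Counter
from datetime import date


def daily_summary(tweets):
    """Summarize (day, sentiment) pairs into per-day totals and percentages.

    Sentiment labels are assumed to be "pos", "neg" or "neutral".
    """
    by_day = {}
    for day, sentiment in tweets:
        by_day.setdefault(day, Counter())[sentiment] += 1

    summary = {}
    for day in sorted(by_day):
        counts = by_day[day]
        total = sum(counts.values())
        summary[day] = {
            "total": total,
            "pos_pct": 100 * counts["pos"] / total,
            "neg_pct": 100 * counts["neg"] / total,
        }
    return summary


# Made-up sample data, for illustration only.
monday = date(2009, 9, 21)
tuesday = date(2009, 9, 22)
sample = [
    (monday, "pos"), (monday, "pos"), (monday, "neg"), (monday, "neutral"),
    (tuesday, "neg"),
]
summary = daily_summary(sample)
```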
Two concepts that are core to my work are Ratios and Market Share:
The ratio is the number of tweets per $1 million of Friday Box Office gross. A film with 1,000 tweets and a $10 million Friday would therefore have a ratio of 100. In general, films that appeal to children or to older audiences have lower ratios, since those audiences are not big users of Twitter. By comparison, films that appeal to young adults (18-35) have much higher ratios, since that audience is much more active on Twitter. The tweets used for these ratio calculations are always the tweets from Monday to Thursday of the release week, or Monday to Tuesday for Wednesday openers.
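The arithmetic can be sketched in a few lines of Python. The function names are hypothetical. Two things here go beyond the definition above and are assumptions: the counting window is generalized to "Monday of release week through the day before opening" (which reproduces Monday to Thursday for Friday openers and Monday to Tuesday for Wednesday openers), and the ratio is turned around to estimate a Friday gross from a tweet count and a ratio taken from comparable films.

```python
from datetime import date, timedelta


def ratio_window(opening_day):
    """Days whose tweets count toward the ratio.

    Assumption: Monday of release week through the day before opening.
    """
    monday = opening_day - timedelta(days=opening_day.weekday())
    return [monday + timedelta(days=i) for i in range(opening_day.weekday())]


def tweet_ratio(tweet_count, friday_gross_usd):
    """Tweets per $1 million of Friday gross."""
    if friday_gross_usd <= 0:
        raise ValueError("Friday gross must be positive")
    return tweet_count / (friday_gross_usd / 1_000_000)


def estimate_friday_gross(tweet_count, expected_ratio):
    """Invert the ratio: estimated Friday gross in dollars."""
    if expected_ratio <= 0:
        raise ValueError("ratio must be positive")
    return tweet_count / expected_ratio * 1_000_000


# The example from the text: 1,000 tweets and a $10 million Friday.
example_ratio = tweet_ratio(1_000, 10_000_000)

# A Friday opener (25 September 2009) and a Wednesday opener that week.
friday_window = ratio_window(date(2009, 9, 25))
wednesday_window = ratio_window(date(2009, 9, 23))
```

Which ratio to plug into the estimate is the hard part, and it is exactly what the genre and audience comparisons are for.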
Market share is calculated by looking at all tweets for all new wide releases plus wide release films that have been in release for less than two weeks. Limited release films are sometimes included in the numbers if they are notable or might expand wide at a later date.
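In code, this boils down to dividing each film's tweets by the combined tweets of every tracked film. The Python sketch below assumes the share is expressed as a percentage of that combined total; the function name `market_share` and the sample numbers are made up for illustration.

```python
def market_share(tweets_by_film):
    """Return each film's percentage of the total tweets across tracked films."""
    total = sum(tweets_by_film.values())
    if total == 0:
        # No tweets at all: every film gets a 0% share.
        return {title: 0.0 for title in tweets_by_film}
    return {
        title: 100 * count / total
        for title, count in tweets_by_film.items()
    }


# Made-up counts for three tracked films.
shares = market_share({"Film A": 300, "Film B": 100, "Film C": 100})
```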
The main goal is to come up with a solid tweets-to-box-office ratio by genre and audience for films. Why on earth is this important? Well, because I'm a numbers geek and it interests me to combine my social media and Box Office passions. But it should also help to predict flops from further away and to mine diamonds in the rough that will overperform. I'm looking forward to building a database of numbers for each film.