I got the same question over on our (TouringPlans') forums. Here's what I wrote there:
A few points:
1) Disney doesn't share its hotel occupancy numbers with us. We get some of that anyway, because we have contacts who are sympathetic to what we're trying to do. But it's not comprehensive. (And it probably doesn't matter - see below.)
2) The research team at VisitOrlando.com also puts out hotel occupancy data.
3) As far as we can tell, hotel occupancy is not a great predictor of how long you're going to wait in line.
I'll explain that last thing a bit more.
Every time we collect a wait time (say, "20 minutes at Space Mountain"), we attach a couple hundred other pieces of data to it. The date and time are the two obvious ones:
- The wait time at Space Mountain is 20 minutes at 9:14 a.m. on Monday, January 9, 2017
We also keep track of things like whether it's an EMH morning or afternoon, the EMH and special event schedule across every park, and what the EMH/event schedule was at each park over the last few days and the next few days.
We also know about school schedules, holidays, the weather today, yesterday, and tomorrow, and so on. We even track the state of the U.S., Canadian, British, and Brazilian economies over the last few months.
Like I said, literally hundreds of other pieces of data - anything that you might reasonably think of as affecting wait times, we've probably tried.
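To make that concrete, here is a minimal Python sketch of what one of those records might look like. The class name, field names, and placeholder values are all hypothetical and only cover a handful of the inputs mentioned above; this is an illustration, not our actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class WaitObservation:
    """One collected wait time plus a few of the context fields attached to it.

    Every field name here is a hypothetical stand-in for illustration.
    """
    attraction: str
    observed_at: datetime
    wait_minutes: int
    emh_morning: bool        # Extra Magic Hours morning at this park?
    emh_afternoon: bool      # Extra Magic Hours afternoon at this park?
    school_in_session: bool
    is_holiday: bool
    high_temp_f: float       # today's forecast high

    def features(self) -> dict:
        """Flatten the record into the name -> value mapping a model would see."""
        row = asdict(self)
        ts = row.pop("observed_at")
        row.pop("wait_minutes")  # this is the target, not a predictor
        row["day_of_week"] = ts.strftime("%A")
        row["minute_of_day"] = ts.hour * 60 + ts.minute
        return row


# The Space Mountain example from above; the context values are placeholders.
obs = WaitObservation(
    attraction="Space Mountain",
    observed_at=datetime(2017, 1, 9, 9, 14),
    wait_minutes=20,
    emh_morning=False,
    emh_afternoon=False,
    school_in_session=True,
    is_holiday=False,
    high_temp_f=65.0,
)
print(obs.features())
```

The real records carry a couple hundred fields, so in practice they would more likely live as rows in a table than as hand-written classes.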
We've tried including hotel occupancy in our models. It never comes up as one of the top 20 or so predictors in a good general model for wait times.
Conventions, for one thing, tend to increase resort occupancy without affecting wait times much. For example, Primerica has two 5,000-person WDW conferences coming up, on 1/16-1/19 and 1/23-1/26. But except for two days where they're renting out DHS in the evening, those folks are going to be stuck in meetings all day. For the most part, they're not going to be standing in line at Space Mountain at noon.
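If you're wondering what "ranking predictors" means mechanically, here is a rough Python sketch. It uses a simple one-feature-at-a-time correlation screen, which is much cruder than the model-based rankings we actually look at. The data is a made-up grid built so that occupancy has no effect on the wait, so it shows only how the ranking works and is not evidence for the claim above. All names are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up grid: the wait depends on hour and holiday only, by construction.
rows = [
    {
        "hour": h,
        "is_holiday": hol,
        "hotel_occupancy_pct": occ,
        "wait_minutes": 10 + 2 * (h - 9) + 20 * hol,
    }
    for h, hol, occ in product(range(9, 21), (0, 1), (70, 80, 90))
]


def rank_predictors(rows, target="wait_minutes"):
    """Sort feature names by absolute correlation with the target, strongest first."""
    ys = [r[target] for r in rows]
    names = [k for k in rows[0] if k != target]
    scores = {k: abs(pearson([r[k] for r in rows], ys)) for k in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_predictors(rows)
for name, score in ranking:
    print(f"{name:22s} {score:.3f}")
```

With a couple hundred real inputs, the same idea applies: score every input, sort, and see where occupancy lands.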
Finally, the methods we've tested when creating these models include:
- regression
- support vector machines
- decision trees
- random forests
- extremely randomized trees
- k-nearest neighbors
- Gaussian process classifiers
- naive Bayes classifiers
- XGBoost's boosted trees
- "ensemble" methods that are collections of the above (including low-correlation ensemble methods)
- "stacked" methods where the output of one of the above is the input into another of the above
- Google's TensorFlow deep learning neural network software
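For anyone unfamiliar with "stacked" methods, here is stacking in miniature, in plain Python. Two very simple base models (average wait by hour, average wait by holiday flag) are fit on one slice of the data, and a one-parameter meta-model learns how to weight their outputs on a held-out slice. The data, function names, and choice of base models are all made up for illustration; our real models are far more involved.

```python
from collections import defaultdict
from itertools import product


def fit_group_mean(rows, key):
    """Base model: predict the average training wait for each value of `key`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[key]] += r["wait"]
        counts[r[key]] += 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {k: sums[k] / counts[k] for k in sums}
    return lambda r: means.get(r[key], overall)


def fit_blend_weight(rows, model_a, model_b):
    """Meta-model: least-squares weight w for the blend w*a + (1 - w)*b."""
    num = den = 0.0
    for r in rows:
        a, b = model_a(r), model_b(r)
        num += (r["wait"] - b) * (a - b)
        den += (a - b) ** 2
    return num / den if den else 0.5


def mse(rows, predict):
    """Mean squared error of a predictor over the given rows."""
    return sum((r["wait"] - predict(r)) ** 2 for r in rows) / len(rows)


# Made-up data: hour and holiday effects plus a small deterministic wiggle.
rows = [
    {"hour": h, "holiday": hol, "wait": 10 + 2 * (h - 9) + 20 * hol + (h * day) % 5}
    for h, hol, day in product(range(9, 21), (0, 1), range(4))
]
train, holdout = rows[0::2], rows[1::2]

by_hour = fit_group_mean(train, "hour")
by_holiday = fit_group_mean(train, "holiday")
w = fit_blend_weight(holdout, by_hour, by_holiday)


def stacked(r):
    """Stacked prediction: the base model outputs are the meta-model's inputs."""
    return w * by_hour(r) + (1 - w) * by_holiday(r)


for name, model in [("by hour", by_hour), ("by holiday", by_holiday), ("stacked", stacked)]:
    print(f"{name:12s} holdout MSE = {mse(holdout, model):.2f}")
```

Because the weight is fit on the held-out slice, the stacked error there can't be worse than either base model alone. An honest comparison would score all three on a third slice that none of the fitting steps has seen.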
We could be wrong about 2/23 - there might be something we've overlooked or not updated. But having looked at the crowd cal prediction process in detail for the last few months, I'm confident it's the best method anyone has come up with.