In our group, we have been interested in how the structure of a webpage influences its performance on the web. It is (in my opinion) one of the key questions at the heart of distributed web application delivery. Thanks to amazing resources like HTTP Archive and BigQueri.es, it is pretty straightforward to access and play with large-scale web performance data (measured twice a month across 400,000+ websites and made available for free!).
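Pearson correlation is quite handy when its assumptions hold: it measures only linear association, it is sensitive to extreme outliers, and its usual significance tests assume roughly normally distributed data. Real-world datasets violate these assumptions fairly often, so it is worth checking whether Pearson correlation is a suitable measure for HTTP Archive data.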
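One cheap sanity check is to compare Pearson's r with Spearman's rank correlation, a rank-based alternative that only assumes a monotonic relationship and is far less sensitive to outliers. A big gap between the two coefficients hints that outliers or non-linearity are distorting r. The sketch below is plain Python with no dependencies. The function names and the toy "page weight" and "load time" numbers are hypothetical illustrations, not HTTP Archive measurements.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Synthetic toy data (hypothetical, not HTTP Archive measurements):
    # a perfectly monotonic relationship with one extreme value at the end.
    page_weight = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    load_time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(f"Pearson:  {pearson(page_weight, load_time):.2f}")   # Pearson:  0.59
    print(f"Spearman: {spearman(page_weight, load_time):.2f}")  # Spearman: 1.00
```

On this toy data the single outlier drags Pearson's r down to about 0.59, while Spearman still reports a perfect monotonic relationship. For real analyses, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients along with p-values.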