Your overall point is good. It is easy to over-interpret the accuracy of results from datasets with very large n, when in practice there comes a point where inaccuracies due to model assumptions and data quality swamp the variance due to sampling (big data, take note).
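To put rough numbers on this, here is a minimal sketch, assuming the estimate is a sample mean with per-observation standard deviation `sigma` and a fixed systematic error `bias` that stands in for model misspecification or data-quality problems. Both values and the function name `error_components` are made up for illustration. The sampling standard error shrinks like sigma/sqrt(n), while the bias does not shrink at all, so the total error flattens out at the bias.

```python
import math

def error_components(n, sigma=1.0, bias=0.01):
    """Return (sampling standard error, systematic bias, RMSE) for a sample mean.

    sigma and bias are illustrative assumptions, not estimates from any real data.
    """
    se = sigma / math.sqrt(n)            # sampling error: shrinks as n grows
    rmse = math.sqrt(se**2 + bias**2)    # total error: floors at |bias|
    return se, bias, rmse

for n in (100, 10_000, 1_000_000, 100_000_000):
    se, b, rmse = error_components(n)
    print(f"n={n:>11,}  se={se:.5f}  bias={b:.5f}  rmse={rmse:.5f}")
```

With these assumed values the two error sources are equal at n = 10,000. Beyond that, collecting more data barely changes the total error, even though a confidence interval based on the sampling error alone keeps getting narrower.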