The Big Joke of Big Data?

Will the risk managers of tomorrow need as much experience in understanding massive data (extracted from news?) as they currently have in interpreting the results of quantitative methods?

But how can we worry about big data when we still struggle to get the right small data (market data) to identify the parameters of our models?


In Big Data Quants - Is News Analytics Another Form Of Riding the Price Waves, I saw big data as one factor in a multi-factor strategy and, if at all, only in intelligent combination with quantitative methods.

But further reading has raised a few additional questions. They are not about access to such data or the methods to slice and dice them, but about the possibility of turning this data into something that supports the decisions and actions of a risk manager.

In Beware the Big Errors of Big Data, NN Taleb (in Wired Opinion) points out:
We're more fooled by noise than ever before, and it's because of a nasty phenomenon called "big data". With big data, researchers have brought cherry-picking to an industrial level.
His argument is about data sets with many dimensions and few samples: in such sets, large deviations are more attributable to noise than to information.

We know from machine learning that the best methods for data sets with, say, 100 parameters and 1,000 samples are decision-tree based. But they are better suited to getting a rough idea, maybe reducing the number of parameters, … than to extracting understandable and computational models.

If you just naively apply statistical methods, you will most probably find significant correlations that are spurious.

This is the disadvantage of big data: the more variables, the more spurious dependencies, and the scope for misinterpretation grows nonlinearly with the dimension.
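
To make this concrete, here is a minimal sketch rather than a definitive test. It assumes purely independent Gaussian noise, 100 samples per variable, and the rough large-sample threshold |r| > 1.96 / sqrt(n) for "significant at 5%". The names pearson and count_spurious are made up for this illustration. Since p variables give p(p-1)/2 pairs, the number of chance "findings" should grow roughly quadratically with the dimension.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(columns, n_samples):
    """Count pairs whose |r| exceeds the approximate 5% threshold."""
    threshold = 1.96 / math.sqrt(n_samples)  # large-sample approximation
    return sum(
        1 for a, b in combinations(columns, 2) if abs(pearson(a, b)) > threshold
    )


rng = random.Random(0)
N_SAMPLES = 100
# Assumption: every column is independent noise, so no true dependence exists.
columns = [[rng.gauss(0.0, 1.0) for _ in range(N_SAMPLES)] for _ in range(40)]

for p in (5, 10, 20, 40):
    n_pairs = p * (p - 1) // 2
    hits = count_spurious(columns[:p], N_SAMPLES)
    print(f"{p:2d} variables, {n_pairs:3d} pairs, {hits:2d} 'significant' correlations")
```

Every hit in this output is spurious by construction, because nothing in the data depends on anything else.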

What decision tree methods can do, IMO, is find the variables and regions that have a strong influence in separating the data. This may tell us that something is wrong with an assumption, with a partition, ...
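
A rough sketch of that idea, reduced to its simplest form: a one-level decision tree (a decision stump) that searches every variable and threshold for the split with the lowest Gini impurity. The data is synthetic and the setup is an assumption made for this illustration: five uniform variables, a label driven by variable 2 crossing 0.3, and 10% of the labels flipped. The names gini and best_split are hypothetical, not from any library.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (variable index, threshold, weighted impurity) of the best single split."""
    n = len(rows)
    best = (None, None, gini(labels))
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


rng = random.Random(1)
rows = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(200)]
labels = []
for row in rows:
    y = 1 if row[2] > 0.3 else 0  # assumed rule: only variable 2 matters
    if rng.random() < 0.1:        # 10% label noise
        y = 1 - y
    labels.append(y)

feature, threshold, impurity = best_split(rows, labels)
print(f"best split: variable {feature} at {threshold:.3f}")
print(f"impurity before {gini(labels):.3f}, after {impurity:.3f}")
```

A full tree repeats this search inside each resulting region. Even the single split already points at the variable and the region boundary worth a closer look.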

From machine learning experience we know: you need multi-strategy and multi-method approaches, and you need to cross-validate.
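
Why cross-validation matters can be sketched in a few lines. The setup is an assumption for illustration only: 100 candidate variables and a target that are all independent noise, a "model" that simply picks the variable most correlated with the target on the training part, and 5 folds. The helper names pearson and k_fold_indices are made up here. The best in-sample correlation tends to look respectable, while the held-out correlation of the same variable should scatter around zero.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


rng = random.Random(42)
N, P, K = 300, 100, 5
# Assumption: target and all P variables are independent noise.
features = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]
target = [rng.gauss(0.0, 1.0) for _ in range(N)]

folds = k_fold_indices(N, K, rng)
results = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(N) if i not in held_out]
    y_train = [target[i] for i in train_idx]
    y_test = [target[i] for i in test_idx]
    # "Fit": choose the variable with the strongest in-sample correlation.
    train_r = [pearson([f[i] for i in train_idx], y_train) for f in features]
    best = max(range(P), key=lambda j: abs(train_r[j]))
    # Evaluate the chosen variable on data it was not selected on.
    test_r = pearson([features[best][i] for i in test_idx], y_test)
    results.append((best, train_r[best], test_r))
    print(f"variable {best:3d}: in-sample r = {train_r[best]:+.3f}, held-out r = {test_r:+.3f}")
```

The same loop can wrap any of several methods, which is where the multi-method part comes in: a dependency that survives held-out data under more than one method is less likely to be noise.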

How difficult this may become: Life In The Data Salt Mines ... enjoy.