5 Reasons Why It's Hard Working in the Data Salt Mines of Business Intelligence

It's not hard to make a picture from a model (though it may be expensive). But it's hard to extract a model from a picture. Going from structured to flat data is easy; the inverse is not.

Yet at the moment we get the impression that massive data is the most valuable asset.

Take "Business Intelligence" (or "Trading Intelligence"). IMO, business intelligence is about theories, methodologies and technologies to create meaningful information for better business decisions - like finding the best position on the value / price map of your products.

Approaches can be model-based, rule-based or data-driven.

Life in The Data Salt Mines

But most definitions say something like: a set of methods and tools to extract meaningful information, such as (understandable and, hopefully, computational) "models", from (raw) data.

That means models are not the result of thinking but of data mining techniques such as statistics, fuzzy-logic-based learning (fuzzy decision trees, fuzzy rule-based learning, …), neural networks, kernel methods (like support vector machines), self-organizing maps and what have you. (Why fuzzy? Because it makes the decision trees and rule bases understandable and computational.)

That does not sound like hard work, does it?

Challenges? I select 5.

1. What truth does your data set represent?

A set of data is true if it describes a real behavior without ambiguity. But what if you have perfect records of the price dynamics of one market segment for your product class, and you want to approach a new market segment with an extra service? Then even perfect records may not describe the behavior you care about.

2. What are raw data?

It's like in the kitchen: chefs do not only use raw ingredients but also semi-finished things like stocks, sauces, …
Many data are "cooked" (the result of a process) - costs, expenses, … depend on accounting standards, … In fact, only very few data are raw.

3. Do Big Data present more than noise?

Usually big data represent objects in high-dimensional parameter spaces, and it is practically impossible to capture each possible data point. So you may run into the high-dimension, low-sample-size problem - in such sets, large deviations are more likely attributable to noise - see The Big Joke of Big Data.
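To make the noise point concrete, here is a minimal sketch using only the Python standard library. It assumes a purely random target and purely random features (so there is nothing to find) and reports the strongest correlation that shows up anyway. The function names `pearson` and `max_spurious_correlation` and all sizes are my own illustrative choices, not taken from any particular tool.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a noise target and many noise features.

    Everything is independent Gaussian noise (an assumption of this sketch),
    so any strong correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Few samples, many dimensions: some noise feature looks "predictive".
    print("20 samples, 500 features:  ", max_spurious_correlation(20, 500))
    # Many samples, same dimensions: the apparent pattern shrinks.
    print("1000 samples, 500 features:", max_spurious_correlation(1000, 500))
```

With few samples per dimension, the best-looking feature tends to correlate strongly with the target even though both are noise; with many more samples the effect fades. A pattern-hunting method sees only the first number.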

4. Does machine learning generalize?

If you have data for the partitions you are interested in, and the data present the truth in each partition, you just extract a different model from each data set. No problem. But data collection might be expensive. Therefore you want to extract models that generalize, but machine learning is, by its nature, bad at generalization. See Should quants learn more about deep learning?
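A toy illustration of the generalization problem, again as a hedged sketch in standard-library Python. A least-squares line stands in for "a model extracted from data" (it is not any specific machine learning method), and the quadratic `true_response` is a made-up behavior chosen only so the effect is visible. The model is fitted on one partition and then asked about another.

```python
import math


def true_response(x):
    # Hypothetical "real" behavior, chosen for illustration only.
    return x * x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    errs = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errs) / len(errs))


def generalization_gap():
    """Fit on the known partition [0, 1], evaluate on a new one [2, 3]."""
    known = [i / 20 for i in range(21)]
    new = [2 + i / 20 for i in range(21)]
    slope, intercept = fit_line(known, [true_response(x) for x in known])
    in_err = rmse(known, [true_response(x) for x in known], slope, intercept)
    out_err = rmse(new, [true_response(x) for x in new], slope, intercept)
    return in_err, out_err


if __name__ == "__main__":
    in_err, out_err = generalization_gap()
    print("error on the known partition:", in_err)
    print("error on the new partition:  ", out_err)
```

The fit looks fine where the data were collected and fails badly where they were not. A model built from an understanding of the mechanism (here, knowing the response is quadratic) would carry over; a purely data-extracted one has no reason to.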

5. Are there unified best practices for business decisions?

IMO, you should not extract best practice from general business patterns but from a deep understanding of what's going on in your environment. You may need to change strategies and the positioning of your product quite often.

As a result, I want to point out: don't build your business intelligence system by data-driven methods only. Use an intelligent mix of methods: build model-based systems and calibrate them to informative data.

What about trading intelligence? 

In fact, I believe that every challenge above applies to high-frequency trading, in particular the branch that hunts for patterns in market data.

What about information efficiency of prices? What about risk spectra at various time scales? What about total trading cost? …

I am not so bad at understanding machine learning, but I don't want to answer these questions - so I am not a candidate for a job in the data salt mines.

Picture from sehfelder