The trend toward Big Data is evident. Data analytics has become sexy, data scientists are the new rock stars, and they are expected to create the next internet gold rush.
I have published some thoughts about this here, after reading the Wired article The Exabyte Revolution.
Job descriptions in job offers sometimes read like be-passionate-about-working-with-massive-data. But is this more about data manipulation, visualization, statistics, and machine learning? Or about massive parallelism, programming in scripting languages, ...?
IMO, data scientists should apply multi-strategy and multi-method machine learning to extract knowledge, meaning, models, strategies, ... from data.
Take predictive modeling (creating models with forecasting capabilities, which predict the probability of an outcome): it helps to create the right financial trading strategies, to estimate properties like volatility, to predict the behavior of market participants, ...
In machine learning (especially supervised learning) you have three principal tasks, depending on the data and knowledge you have:
If you know nothing, you extract models from data by analyzing I/O relations - in one learning step (but with cross-model iterations).
If you have models whose approximation of the I/O relation is not sufficient, you extract models that analyze I/deltaO relations (the delta between reality and your model results) - in two steps: application of the models and a learning step.
If you have good parametric models, you want to extract models that analyze the I/ParameterSet/O relation, creating parameter sets for the best fit. In the general case this might need multiple steps.
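The three tasks can be sketched on a toy problem. This is only an illustration under stated assumptions: the synthetic "reality", the hypothetical `prior_model`, the polynomial learners and the grid search over `b` are all made up here and stand in for whatever data, models and learning methods are actually used.

```python
import numpy as np

# Synthetic, noise-free "reality" (an assumption made for illustration only).
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * np.exp(-1.5 * x) + 0.3 * x

# Task 1: no prior knowledge -> learn the I/O relation directly.
direct = np.polynomial.Polynomial.fit(x, y, deg=3)
err_direct = np.max(np.abs(direct(x) - y))

# Task 2: an existing model is not accurate enough -> learn I/deltaO.
def prior_model(x):
    # Hypothetical prior model that misses the linear term of "reality".
    return 2.0 * np.exp(-1.5 * x)

delta = y - prior_model(x)                                  # step 1: apply the model
correction = np.polynomial.Polynomial.fit(x, delta, deg=1)  # step 2: learn the delta
err_prior = np.max(np.abs(prior_model(x) - y))
err_corrected = np.max(np.abs(prior_model(x) + correction(x) - y))

# Task 3: trusted parametric form a*exp(-b*x) + 0.3*x -> find the best (a, b).
target = y - 0.3 * x
best = None
for b in np.linspace(0.5, 3.0, 251):
    basis = np.exp(-b * x)
    a = (basis @ target) / (basis @ basis)   # closed-form least squares for a
    sse = np.sum((a * basis - target) ** 2)
    if best is None or sse < best[0]:
        best = (sse, a, b)
_, a_fit, b_fit = best

print(err_direct, err_prior, err_corrected, a_fit, b_fit)
```

In the third task the structure of the model is trusted and only its parameter set is learned from the data.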
In the valuation of financial instruments this is done by calibration engines relying on certain sets of market data - to do this correctly, a deep knowledge of inverse problems is required.
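Why inverse problems matter here can be seen in a small sketch. Everything in it is an assumption chosen for illustration: a Hilbert matrix stands in for an ill-conditioned map from model parameters to quotes, the "quotes" are synthetic, and Tikhonov regularization with a hand-picked `lam` is just one textbook way to stabilize the inversion, not a description of any particular calibration engine.

```python
import numpy as np

n = 10
i = np.arange(n)
# Hilbert matrix: a standard ill-conditioned example, used as a stand-in
# for the (linearized) map from parameters to observable quotes.
A = 1.0 / (i[:, None] + i[None, :] + 1.0)

true_params = np.ones(n)
rng = np.random.default_rng(42)
# Synthetic quotes with tiny observation noise.
quotes = A @ true_params + 1e-6 * rng.standard_normal(n)

# Naive calibration: invert the map directly.
naive = np.linalg.solve(A, quotes)

# Tikhonov-regularized calibration: minimize |A p - quotes|^2 + lam^2 |p|^2.
lam = 1e-4  # hypothetical regularization weight, chosen by hand
regularized = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ quotes)

err_naive = np.linalg.norm(naive - true_params)
err_reg = np.linalg.norm(regularized - true_params)
print(np.linalg.cond(A), err_naive, err_reg)
```

The point of the comparison is that tiny noise in the data can be amplified enormously by a direct inversion, while a regularized fit trades a small bias for stability.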
In all of these tasks you want to gain insight, and consequently you want models that are white boxes and computational (examples: a rule base is a white box but not computational, a fuzzy rule base is both, ANNs are computational but not really white boxes, ...).
Summarizing, it is my strong belief that financial modelers and data scientists need to collaborate.
And I hope much more data will be made available in an internet-of-finance in the future - for better valuation and risk management.