Is there a clash between data-driven modeling and physics-based modeling?

Alfonso R. Reyes
(6 November 2019)

Article in progress. Leave your comments for debate; I will try to integrate them into the main body later.

The more I learn about machine learning algorithms, and the more I put into practice advanced applications of neural networks, deep convolutional networks, generative adversarial networks, recurrent neural networks and the like, the more similarities I find between these data-driven models and physics modeling.

My view is that (i) physics-based models have stood the test of time (centuries) and will continue to do so; (ii) the successes of data-driven models have been hyped because of the novelty of new algorithms and faster, ubiquitous computing power; (iii) some of the data-driven “everything” wave has been put forward with commercial interests in mind; (iv) the successes of data-driven models have been caused by a profound gap in physics-based modeling applications and software, to the point that the “data wave” almost drowned the physics-based modeling world; (v) physics-based modeling software neglected the effect of the continuous stream of data and chose to stay in the comfortable paradigm of charging for licenses; (vi) commercial physics-based modeling software underestimated the value brought by statistics, data science and machine learning; (vii) traditional physics-based modeling software companies have started to react but still don’t get it: they have preferred to rename their products to *“any-word-here” + “intelligent”* to signal to their clients that they have caught up with the times of “artificial intelligence”.

Some background

I have long been a fan of understanding and learning physics by running computational physics simulations, a mathematical and graphical way of replicating natural phenomena in the comfort of a lab. Two of my favorite tools for running computational physics models are the open source libraries Open Source Physics (OSP) and Easy Java Simulations (EJS). Both come with very solid Java libraries for solving ordinary and partial differential equations (DEs), including a range of DE solvers, as well as real-time graphical libraries that are still unbeatable in their simplicity and effectiveness at showing the sometimes fast evolution of a simulation. OSP and EJS are relatively easy to install, manipulate, run and extend. The authors did a good job of abstracting away the verbosity of Java, leaving a tool that speaks the language of math and physics.

But what does computational physics have to do with data-driven modeling, you may ask? A lot, as you will read in a minute.

Here are a few issues that have been floating around and that I would like to raise for discussion:

  1. *Data-driven* models are set to compete with *physics-based* modeling
  2. *Data-driven* models are better than *physics-based* models because the former are based on “abundant data”
  3. The success of *data-driven* models and machine learning algorithms makes it unnecessary to learn, or understand, *physics*
  4. *Data-driven* models will eventually replace physics models
  5. *Data-driven* modeling is another way of doing and explaining physics


It was going back and re-reading this excellent book on *how-to-develop-a-reservoir-simulator-from-scratch* that revived my feeling of how closely physics and data are intermingled.

Reality check

Here are some of the findings that this article aims to bring up for discussion:

  1. Data is a multi-mode expression of natural phenomena; physics looks at understanding and explaining natural phenomena through mathematics
  2. Physics has provided sound and proven models of the natural world for the past two centuries by using just enough data
  3. Data-driven models using huge amounts of data are not discovering new laws of physics; they are just confirming them
  4. As data-generating sources increase, physics-based models are pressured to build on the additional data that becomes available. More data means more models
  5. Many physics-based modeling applications have not been prepared to handle, and deliver on, large and frequent streams of data
  6. The successes of data-driven models occur because they fill the gaps that physics-based models cannot cover, owing to a lack of bandwidth and neglected statistical capabilities
  7. Practically all modern algorithms for neural networks and deep learning, including those used in unsupervised machine learning settings, have been invented by statisticians, mathematicians and psychologists. The data-driven algorithmic land remains virgin territory for physicists and computational physics
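
As a toy illustration of finding 3, here is a minimal sketch in which a purely data-driven fit ends up confirming a law that physics already provides. Everything in it is an assumption made for illustration and does not come from the tools or books mentioned in this article: the "measurements" are synthetic free-fall data generated from y = ½·g·t² plus noise, and the function names, noise level and sample count are hypothetical.

```python
import numpy as np


def simulate_free_fall(n_samples=200, g=9.81, noise_std=0.05, seed=0):
    """Synthetic 'sensor' data for a dropped object: y = 0.5*g*t**2 + noise.

    The physics is only used here to generate the data (illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0, n_samples)            # time in seconds
    y = 0.5 * g * t**2 + rng.normal(0.0, noise_std, n_samples)  # distance in m
    return t, y


def fit_quadratic(t, y):
    """Data-driven step: least-squares fit of y = c0 + c1*t + c2*t**2.

    The fit is given no physical constants, only a generic polynomial form.
    """
    design = np.column_stack([np.ones_like(t), t, t**2])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


t, y = simulate_free_fall()
c0, c1, c2 = fit_quadratic(t, y)

# If the data obey free fall, c2 should be close to g/2.
g_estimated = 2.0 * c2
print(f"g recovered from data: {g_estimated:.2f} m/s^2")
```

The fit lands on a value close to the g that generated the data: the data did not discover a new law, it re-derived a known one. Note that even this "pure data" approach quietly assumed a quadratic form, which is itself a piece of physical insight.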


“Computer Simulation Methods. Applications to Physical Systems” by Harvey Gould, Jan Tobochnik and Wolfgang Christian. Book. 2016.

“Data-Driven Modeling & Scientific Computation. Methods for complex systems and big data” by J. Nathan Kutz. Book.

“Discovering governing equations from data: Sparse identification of nonlinear dynamical systems” by Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz. Paper.

“Data-driven discovery of coordinates and governing equations” by Kathleen Champion, Bethany Lusch, J. Nathan Kutz, Steven L. Brunton. Paper. April 2019.

“Data-driven discovery of partial differential equations” by Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Paper. 2016.

“Physics-guided Machine Learning: Opportunities in Combining Physical Knowledge with Data Science for Weather and Climate Sciences” by Anuj Karpatne. Slides.

“Can We Practically Bring Physics-based Modeling Into Operational Analytics Tools?” by Jessica Granderson, Marco Bonvini, Mary Ann Piette, Janie Page, Guanjing Lin, R. Lily Hu. Paper. 2016.