Blog

A data science blog for Petroleum Engineering.

Topics covered

Subject ▸ data science

Reading wells from SPE data repository

8 March 2021

SPE, datasets, data science, petroleum engineering

Okay. There are two ways of downloading the data for all the wells in the SPE repository: the manual way (one file at a time with “Save As”), and the non-interactive, automated way. The manual way is the easiest and requires that you provide your SPE username and password on the usual login page. Then you click on the link to the repository, https://www.spe.org/datasets/, and start right-clicking on each of the files under the data folders.

A Vagrant virtual machine that runs a Shiny server

20 January 2021

data science, Shiny

Introduction This is an example of the straightforward generation of a Vagrant virtual machine. The script necessary to create the VM is written inside the Vagrantfile and has very few lines. The machine was upgraded to Ubuntu xenial64, as were the R Shiny server and the xenial keys to the repository. There are several files that document the changes and problems found during the rebuild of this machine: README, NEWS, BUILD, and HISTORY, all of them markdown files.

A Vagrant virtual machine to run data science on Volve datasets

18 January 2021

Volve, data science

vagrant-volve-navarro-BI64G20S2JP8201 This is reproducible work of Machine Learning and Data Science applied to data from the Volve field. Features This is a VirtualBox Virtual Machine (VM) that is automatically generated using Vagrant. A few Machine Learning and Deep Learning packages have been installed, such as Scikit-Learn, NLTK, Keras, TensorFlow and Theano. A Vagrantfile is used to generate this VM based on Ubuntu 18.04 (bionic64). Additional packages required for this phase of the ML and DS work are welly, pandas, numpy, seaborn, and lasio.

A compilation of Machine Learning examples

21 November 2020

machine learning, data science

Introduction This is a compilation of machine learning examples that I found. They are easy to understand, address a fundamental principle, and explain why a particular algorithm was chosen. Some of them you will find very detailed; others are short and straight to the point. Prerequisites I used R-3.6.3 and RStudio Preview 1.4. I also plan to use Anaconda, Miniconda and GNU Python for the parts where I make use of Python code.

Docker for R - Minimal book

21 November 2020

virtualization, computer science, data science

This is a minimal example of a book based on R Markdown and bookdown (https://github.com/rstudio/bookdown). Please see the page “Get Started” at https://bookdown.org/home/about/ for how to compile this example.

How deep should I go in learning data science, machine learning and computer science?

18 November 2020

data science, machine learning

How deep should I go in learning data science, machine learning and computer science? First of all, of course, we cannot lose sight of [petroleum] engineering. That’s what makes us “the Domain Experts” or SMEs. But this new industrial revolution based on data that we are living in requires a new set of lenses to understand and discover things that were not so evident a few years ago. I am not saying you should turn into a professional programmer, nor that you should remain an amateur; it is about learning the bare minimum to be able to state a problem, describe the workflow, discuss it with DS and ML experts, and work out a prototype before scaling it.

rProsper adds batch automation, dataframe generation

15 September 2020

data science, petroleum engineering, Prosper

For those working in #oilandgas #productionoptimization this will make a good addition to your #datascience and #machinelearning toolkit. I’m working on the last details of a new #rstats package, rProsper. rProsper adds batch automation, dataframe generation, customized ggplot2 plotting, and powerful statistical analysis to the daily well modeling workflow. It makes production optimization faster and more reliable, where well models are not treated as isolated units (one model, one file) but as part of a statistical worldview (one well, one row).
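The “one well, one row” idea can be sketched in a few lines of base R. This is only an illustration and does not use rProsper or its API: the well names, column names and numbers below are made up, standing in for whatever each well model file would report.

```r
# Hypothetical per-well results, as if each list came from one well model file.
# All names and values here are invented for illustration.
wells <- list(
  list(well = "W-01", liquid_rate_stbd = 1450, water_cut_pct = 12),
  list(well = "W-02", liquid_rate_stbd = 980,  water_cut_pct = 45),
  list(well = "W-03", liquid_rate_stbd = 2100, water_cut_pct = 30)
)

# One well, one row: turn each model summary into a one-row data frame
# and stack the rows into a single table.
well_table <- do.call(
  rbind,
  lapply(wells, as.data.frame, stringsAsFactors = FALSE)
)

# Once the wells share a table, field-level statistics are one-liners.
mean_rate <- mean(well_table$liquid_rate_stbd)
high_water_cut <- well_table$well[well_table$water_cut_pct > 30]

print(well_table)
```

With the wells in one data frame, the usual R tools (summary(), aggregate(), ggplot2) apply to the whole field at once instead of to one file at a time.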

Building the book dataviz-wilke with Docker

13 July 2020

reproducibility, data science, visualization of data

dataviz-wilke 2020 The book “Fundamentals of Data Visualization” by Claus Wilke has been made fully reproducible using a Docker container. The compiled book can be read online here. The original repository of the book is on GitHub at this link, and can also be read online here. The book is great for learning advanced visualization techniques using R without focusing too much on the code but rather on universal, timeless best practices.

rOpenserver

3 July 2020

data science

rOpenserver package The goal of rOpenserver is to provide an R interface to the Petex (Petroleum Experts) applications Prosper, GAP and MBAL to perform automated tasks, generate datasets for statistical analysis, and exercise fine control of solvers and calculations. Installation rOpenserver is not on CRAN yet but, in the meantime, you can install it from GitHub using the devtools package, from the rOpenserver repository, with: install.packages("devtools"); devtools::install_github("f0nzie/rOpenserver", dependencies = TRUE) The argument dependencies has the role of downloading and installing packages that are key for rOpenserver, such as R6 and RDCOMClient.
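After installing, it can be useful to confirm that the key dependencies actually landed in your library. The helper below is a hypothetical sketch, not part of rOpenserver; it relies only on base R's requireNamespace() and assumes the two package names mentioned above.

```r
# Hypothetical helper (not part of rOpenserver): report which of the
# given packages can be loaded from the current library paths.
check_deps <- function(pkgs = c("R6", "RDCOMClient")) {
  # requireNamespace() returns TRUE or FALSE without attaching the package
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Named logical vector, one entry per package
deps <- check_deps()
print(deps)
```

Any FALSE entry points to a package that still needs to be installed before rOpenserver can be used.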

Building a gallery of TikZ graphics with R

1 June 2020

data science, reproducibility, literate programming

Introduction I have been working a lot lately with LaTeX and TikZ graphics. I am preparing a paper and a few articles that need some sketches, and I wasn’t able to find the right tool to do it. I am a markdown guy and almost everything I write is either in Markdown or R Markdown files. One of the last documents I wrote using R Markdown was the transcript of the interview with Professor John Hopfield.
