Subject ▸ data science

Nested structures

One of the most challenging things in dealing with data is nested structures. In a perfect world, data would come as rectangular tables and be tidy. If physicists are finding the right formats, those formats should also work for petroleum engineering folks.
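
As a rough illustration of the gap between nested and rectangular data, here is a minimal Python sketch that flattens one nested record into a single flat row. The record, its field names, and the `flatten` helper are all hypothetical and are not taken from the paper or from any real well database.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested level and merge its flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested record, for illustration only.
nested_well = {
    "well": "ALFA-W001",
    "completion": {
        "type": "cased",
        "perforations": {"top_ft": 8500, "bottom_ft": 8560},
    },
}

flat_well = flatten(nested_well)
for column, value in flat_well.items():
    print(column, value)
```

Each dotted key becomes a candidate column name, so a list of such flattened records can be loaded as a table. Lists nested inside a record would need an extra rule (for example one row per list element), which this sketch leaves out.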

The image is a screenshot of Fig. 1 from the paper [“Machine Learning in High Energy Physics Community White Paper”](http://arxiv.org/abs/1807.02876).

Link to the post on LinkedIn

The cloud and the sarcasm of AWS easy as pie

Yesterday I read in an article that learning the cloud is a must if you work in data science and machine learning. I made some annotations: although the article is a discussion of DevOps, parts of it go beyond that and touch on data science and machine learning interests. That was just in case you thought you had it hard learning Python or R. The article is named How To Become a DevOps Engineer In Six Months or Less.

Read More…

How can a Petroleum Engineer kick start with Data Science?

I would start by identifying acute problems in your area of expertise (domain): production, reservoir, drilling, completions, geophysics, chemistry, seismic, etc., that you feel could be resolved by applying data science. They may be big problems or small ones. Start with the small ones, or break the big ones into manageable pieces that you can address one step at a time. Once you have two or three data science “project” candidates, start applying the basics to solve the problem.

Read More…

Data Science for Petroleum Engineering: How does someone become good at Deep Learning?

A few days ago I watched Professor Andrew Ng's interview with one of the luminaries of deep learning and artificial intelligence, Dr. Yoshua Bengio. He has written books and dozens of papers on deep learning and neural networks. I liked the style: pretty down-to-earth stuff, just the way Professor Ng likes to do it, bringing machine learning and deep learning to the masses. So the question remains: do petroleum engineers need to learn data science, computer science, statistics, machine learning, neural networks, virtualization and GPU-based engineering?

Read More…

Data Science for Petroleum Engineering. Part 6 - Tidy data

One of the first concepts that one learns when working with data is rearranging raw data into tidy datasets. A tidy dataset means not only having the data in a row-column format, but also arranging it so that each row corresponds to an observation and each column to a variable. This makes the analysis enormously easier. I know this could sound a little confusing, so I will show what raw data and tidy data look like with an example.
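
To make the row-equals-observation idea concrete, here is a minimal Python sketch (not the example from the full post). It assumes a hypothetical "wide" table with one row per well and one column per month, and reshapes it so that each row holds a single observation. The well names, month labels, rate values, and the `to_tidy` helper are invented for illustration.

```python
# Hypothetical raw ("wide") data: one row per well, one column per month.
raw_rows = [
    {"well": "ALFA-W001", "2018-01": 1200, "2018-02": 1150},
    {"well": "ALFA-W002", "2018-01": 830, "2018-02": 870},
]


def to_tidy(rows, id_key="well", var_name="month", value_name="oil_rate"):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(raw_rows)
for observation in tidy_rows:
    print(observation)
```

In the tidy form, `well`, `month`, and `oil_rate` are the variables (columns), and each well-month pair is an observation (row), so filtering, grouping, and plotting no longer depend on how many month columns the raw file happened to have.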

Read More…

Transforming Petroleum Engineers into Data Science Wizards

Once in a while I get messages from colleagues asking for tips on Data Science applied to Petroleum Engineering. This is advice I have collected over time from my responses, to follow if you want to become a Petroleum Engineer and Data Science wizard: Complete any of the Python or R online courses on Data Science. My favorites are the ones from Johns Hopkins on Coursera, complemented with DataCamp short workshops. Those are just two that come quickly to mind.

Read More…

Data Science for Petroleum Engineering - Part 5.3 Finding and filling missing well data in alphanumerics

This is what we will be reviewing in this lecture. NOTE. You can find the PDF version of the R markdown notebook in GitHub at this link. The reproducible R markdown notebook itself is here. Both are full versions of this LinkedIn article. For the time being, LinkedIn publishing does not support markdown, which would make sharing scientific and engineering documents much easier. Load the raw data file. We will see that some well names can be fixed manually and others should be fixed automatically with a script.
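
The split between manual and scripted fixes can be sketched as follows. This is an illustration in Python rather than the R code of the notebook, and it assumes a hypothetical naming convention of the form `FIELD-W###` (for example `ALFA-W007`). The well names, the `MANUAL_FIXES` table, and the `fix_well_name` helper are invented for this example.

```python
import re

# Manual corrections for names that no general rule can repair (hypothetical).
MANUAL_FIXES = {"ALFA-WOO7": "ALFA-W007"}  # letter O typed instead of zero


def fix_well_name(name):
    """Apply a manual correction if one exists, otherwise scripted cleanup."""
    cleaned = name.strip().upper()
    if cleaned in MANUAL_FIXES:
        return MANUAL_FIXES[cleaned]
    # Scripted rule 1: turn spaces and underscores into a single hyphen.
    cleaned = re.sub(r"[\s_]+", "-", cleaned)
    # Scripted rule 2: zero-pad the well number to three digits.
    match = re.fullmatch(r"([A-Z]+)-?W(\d+)", cleaned)
    if match:
        return f"{match.group(1)}-W{int(match.group(2)):03d}"
    return cleaned  # leave unrecognized names for later review


raw_names = ["alfa-w7", "ALFA_W012", " ALFA-WOO7 ", "ALFA W3"]
fixed_names = [fix_well_name(n) for n in raw_names]
print(fixed_names)
```

Keeping the one-off corrections in a small lookup table and the general rules in code means the cleanup can be rerun on a new raw file without redoing the manual work.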

Read More…

Data Science for Petroleum Engineering - Part 5.2: Finding and filling missing data

NOTE. You can find the PDF version of the R markdown notebook in GitHub at this link. The reproducible R markdown notebook (.Rmd) itself is here. Both are full versions of this LinkedIn article. For the time being, LinkedIn publishing does not support markdown, which would make sharing scientific and engineering documents much easier. Mistyped data: one of the challenges in cleaning up well data is having uniform and standard well names.
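
Before fixing anything, it helps to list which entries are missing and which do not follow the naming standard. The sketch below is an illustration in Python rather than the notebook's R code. It assumes a hypothetical standard of the form `FIELD-W###`; the pattern, the sample names, and the `audit_well_names` helper are invented for this example.

```python
import re

# Assumed naming standard (hypothetical): uppercase field, "-W", three digits.
WELL_PATTERN = re.compile(r"[A-Z]+-W\d{3}")


def audit_well_names(names):
    """Return (positions of missing names, names that break the pattern)."""
    missing = [i for i, n in enumerate(names) if n is None or not n.strip()]
    malformed = [
        n for n in names
        if n and n.strip() and not WELL_PATTERN.fullmatch(n.strip())
    ]
    return missing, malformed


well_names = ["ALFA-W001", None, "alfa-w003", "", "ALFA-W04"]
missing, malformed = audit_well_names(well_names)
print("missing at rows:", missing)
print("malformed names:", malformed)
```

Separating missing entries from mistyped ones matters because they call for different repairs: a missing name has to be filled from another column or source, while a mistyped one can often be corrected in place.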

Read More…

Data Science for Petroleum Engineering - Part 5: "Transforming Excel well raw data into datasets."

One of the big challenges of this new era of data science, machine learning and artificial intelligence is getting unhooked from the habit of working with spreadsheets. They have been around for 30+ years and were awesome. But spreadsheets, or worksheets, do not scale well to massive amounts of data or continuous streams of data, and they lack other characteristics that are key to making good and sound decisions, such as reproducibility.

Read More…

Data Science for Petroleum Engineering - Part 5.1: Data Introspection with R

NOTE. You can find the PDF version of the R markdown notebook in GitHub at this link. The reproducible R markdown notebook (.Rmd) itself is here. Both are full versions of this LinkedIn article. For the time being, LinkedIn publishing does not support markdown, which would make sharing scientific and engineering documents much easier. Transforming Excel well raw data into datasets: this section is about getting familiar with our data.

Read More…