Paper research in Petroleum Engineering: Do you know how to spell Gas Lift?

Alfonso R. Reyes
(24 January 2018)



One of the little annoyances of doing paper research in OnePetro is knowing the correct spelling of the keywords you are searching for. It may seem insignificant, but as we will see in this article, choosing the right keyword can have a real effect on the results. Let’s see a practical example.

For this demonstration I will use the R package petro.One. It is available from CRAN as a free and open source project. The advantage of using petro.One is that you can run many operations in batch, automate searches, and receive a table (dataframe) as a result.

We start by creating a list of the possible combinations of the words “gas” and “lift”. These are some that I have seen written in papers, presentations and articles. There may be more, of course. You can try it yourself if you install and run petro.One on your computer.

In the code chunk below each of the keywords has been entered as one line of text. We enclose that text in quotes and assign it to the object keywords. Then, we convert that text into a dataframe using the R function read.table, providing some arguments: we are not providing a header (header = FALSE), the separator between keywords is a new line (sep = “\n”), we don’t want factors but text (stringsAsFactors = FALSE), we strip the leading and trailing blanks of each keyword (strip.white = TRUE), and the name of the column is keyword (col.names = “keyword”).

library(petro.One)

# provide the list of keywords
keywords <- "    
    gas lift
    gas-lift
    GasLift
    gas.lift
    gas_lift"

# convert the text to a dataframe
# read the text as a table, one row per line (split at the newline character)
kw_tbl <- read.table(text = keywords, header = FALSE, sep = "\n", 
                     stringsAsFactors = FALSE, strip.white = TRUE, 
                     col.names = "keyword")

The result is this little table or dataframe below. I am using a dataframe and not a vector because I am expecting that in other cases we could have many more word combinations; let’s say, more than 20, or maybe above 100. And that’s better managed with a dataframe.

# show the dataframe
kw_tbl  
#>    keyword
#> 1 gas lift
#> 2 gas-lift
#> 3  GasLift
#> 4 gas.lift
#> 5 gas_lift
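
As a cross-check on what read.table is doing here, the same table can be built by hand with base R only. This is a minimal sketch, not part of petro.One; the object names raw, parts and kw_check are made up for the illustration.

```r
# Sketch: rebuild the keyword table without read.table (base R only).
# The names raw, parts and kw_check are illustrative, not from petro.One.
raw <- "
    gas lift
    gas-lift
    GasLift
    gas.lift
    gas_lift"

parts <- strsplit(raw, "\n", fixed = TRUE)[[1]]   # split the text at each newline
parts <- trimws(parts)                            # strip leading and trailing blanks
parts <- parts[nzchar(parts)]                     # drop the empty first line

kw_check <- data.frame(keyword = parts, stringsAsFactors = FALSE)
print(kw_check)
```

Note that the blank inside “gas lift” is kept; only the indentation around each keyword is removed.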

Now that we have all the word combinations stored in a table, we will iterate through these keywords and send a search query to OnePetro for each of them. This means that we are sending automated searches to the OnePetro website. Because we are very good internet citizens, we also take care not to send too much traffic to the website by adding a pause of five seconds between searches.

Build iteration loop

# iterate through the keywords dataframe
rec <- vector("list")
i <- 1
for (k in kw_tbl$keyword) {
    url_all  <- make_search_url(query = k, how = "all")    # create search query
    count    <- get_papers_count(url_all)                  # paper count
    rec[[i]] <- list(keyword = k, count = count)           # add observation
    cat(sprintf("%30s %5d \n", k, count))                  # print it
    i <-  i + 1                                            # increment counter
    Sys.sleep(5)                          # do not bug OnePetro website too much
}                                         # be a good internet citizen
#>                       gas lift  7273 
#>                       gas-lift 17804 
#>                        GasLift   588 
#>                       gas.lift     2 
#>                       gas_lift     4

These are the results.

dt <- data.table::rbindlist(rec)                # final data table
dt
#>     keyword count
#> 1: gas lift  7273
#> 2: gas-lift 17804
#> 3:  GasLift   588
#> 4: gas.lift     2
#> 5: gas_lift     4
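
The same loop-and-pause pattern can be wrapped in a small function so it is reusable for other keyword lists. The sketch below is only an illustration: collect_counts and fake_count are hypothetical names, and fake_count is an offline stand-in (it just counts characters) so the example runs without contacting OnePetro. In real use you would pass a function that builds the search URL and returns the paper count, as in the loop above.

```r
# Sketch: a reusable, rate-limited collector (hypothetical helper, base R only).
collect_counts <- function(keywords, count_fn, pause = 5) {
  rec <- vector("list", length(keywords))          # one slot per keyword
  for (i in seq_along(keywords)) {
    rec[[i]] <- data.frame(keyword = keywords[i],
                           count   = count_fn(keywords[i]),
                           stringsAsFactors = FALSE)
    if (i < length(keywords)) Sys.sleep(pause)     # pause between requests only
  }
  do.call(rbind, rec)                              # stack the rows into one table
}

# Offline stand-in for the real query: returns the number of characters.
fake_count <- function(k) nchar(k)

demo <- collect_counts(c("gas lift", "GasLift"), fake_count, pause = 0)
print(demo)
```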

Observations

The keyword with the most papers is “gas-lift”: the word “gas”, followed by a dash, followed by “lift”. There are 17804 papers that contain it. The second keyword is “gas lift”, that is, the word “gas”, followed by a space, followed by “lift”, with 7273 papers. Together, these two keywords account for about 97.7 % of the papers (25077 of 25671). There is a third with 588 papers: the keyword “GasLift”. It is not very common, but I have seen it in some literature (Shell, maybe?). The two remaining keywords, “gas.lift” and “gas_lift”, return only 6 papers between them, which is marginal within the context of 25671 papers; they are probably typos.
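
The shares quoted in these observations can be recomputed from the counts in the table. This is a small sketch in base R; the object names counts, total and top_two are made up for the illustration, and the numbers are the ones printed above.

```r
# Sketch: share of each keyword, using the counts returned above.
counts <- data.frame(
  keyword = c("gas lift", "gas-lift", "GasLift", "gas.lift", "gas_lift"),
  count   = c(7273L, 17804L, 588L, 2L, 4L),
  stringsAsFactors = FALSE
)

total <- sum(counts$count)                                 # all papers found
counts$share_pct <- round(100 * counts$count / total, 2)   # percent of the total
counts <- counts[order(counts$count, decreasing = TRUE), ] # most used first
print(counts)

top_two <- sum(head(counts$count, 2))                      # "gas-lift" + "gas lift"
cat(sprintf("top two: %d of %d papers (%.2f %%)\n",
            top_two, total, 100 * top_two / total))
```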

Conclusions

With this short example we can appreciate the importance of considering the right keywords when doing paper research. We could do this relatively fast and in an automated fashion using data science tools such as this R package. Of course, we could have done this manually, but keep in mind that as the number of probable word combinations increases, so does the time spent searching in the conventional way. Paper search automation is a nice tool to have in our toolbox so that no information is lost during the paper research.

Those 25671 papers that mention gas lift may not tell us how deep or how intense the coverage of the subject is, not until you give the search some context: for example, “gas lift optimization”, “dual gas lift”, “gas lift surveillance”, etc. You will see the number of results shrink rapidly when a context is supplied. But at least now you know that you have two keywords to add to your search context.
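
One way to prepare such contextual searches is to combine the two dominant spellings with each context word before sending anything to OnePetro. The sketch below only builds the query strings in base R; the object names variants, contexts, grid and queries are made up for the illustration, and no search is performed.

```r
# Sketch: combine spelling variants with context words (no search is sent).
variants <- c("gas lift", "gas-lift")              # the two dominant spellings
contexts <- c("optimization", "surveillance")      # context words from the text

# every spelling paired with every context word
grid    <- expand.grid(variant = variants, context = contexts,
                       stringsAsFactors = FALSE)
queries <- paste(grid$variant, grid$context)

# "dual" goes in front of the keyword rather than after it
queries <- c(queries, paste("dual", variants))
print(queries)
```

Each of these strings could then go through the same loop used earlier, one query at a time.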

In my view, the perfect search would be one where each paper returned the number of times the keyword is mentioned, as well as the context or subject. You can do that, but it requires having the paper itself so you can search within its text. For us engineers, that is not practical, because we cannot store or purchase all those 25,000+ papers!

Maybe the day is coming when OnePetro includes in the search results:

* the number of keyword occurrences within the corpus,
* the quality of the paper,
* the number of tables,
* the number of figures, and
* how strong the keyword associations are within the context of the petroleum engineering discipline.

Appendix A: brought up by Burney Waring

A couple of keywords were suggested by Burney Waring, one of the leading gas lift specialists in the world (truly, we should write it gas-lift from now on). The first is gas-lifted, the past participle or adjective form for something that is under a gas-lift method; for example, a gas-lifted well. The second is lift gas, used to distinguish it from the gas injected into a well for reservoir pressure maintenance.

This is the list of keywords. I have added gas-lifted without the dash and lift gas with a dash to see if we obtain papers with these word combinations.

library(petro.One)

major <- c("gas lift", 
           "gas-lift", 
           "GasLift", 
           "gas-lifted", 
           "gas lifted", 
           "lift gas", 
           "lift-gas", 
           "gas.lift", 
           "gas_lift")

gas_lift <- join_keywords(major, get_papers = TRUE, sleep = 3, verbose = TRUE)
#> 
#>   1  7273 'gas+lift'                                                   
#>   2  7273 'gas-lift'                                                   
#>   3   588 'GasLift'                                                    
#>   4   904 'gas-lifted'                                                 
#>   5   904 'gas+lifted'                                                 
#>   6  1547 'lift+gas'                                                   
#>   7  1547 'lift-gas'                                                   
#>   8     2 'gas.lift'                                                   
#>   9     4 'gas_lift'
# "gas lift" is the same as "gas-lift"
gas_lift$keywords
#> # A tibble: 9 x 4
#>   Var1     paper_count sf        url                                       
#>   <chr>          <dbl> <chr>     <chr>                                     
#> 1 gas lift        7273 'gas+lif~ "https://www.onepetro.org/search?q=\"gas+~
#> 2 gas-lift        7273 'gas-lif~ "https://www.onepetro.org/search?q=\"gas-~
#> 3 GasLift          588 'GasLift' "https://www.onepetro.org/search?q=\"GasL~
#> 4 gas-lif~         904 'gas-lif~ "https://www.onepetro.org/search?q=\"gas-~
#> 5 gas lif~         904 'gas+lif~ "https://www.onepetro.org/search?q=\"gas+~
#> 6 lift gas        1547 'lift+ga~ "https://www.onepetro.org/search?q=\"lift~
#> 7 lift-gas        1547 'lift-ga~ "https://www.onepetro.org/search?q=\"lift~
#> 8 gas.lift           2 'gas.lif~ "https://www.onepetro.org/search?q=\"gas.~
#> 9 gas_lift           4 'gas_lif~ "https://www.onepetro.org/search?q=\"gas_~

This is the dataframe with the results.
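
A quick way to compare the dashed and spaced variants in these results is to replace the dash with a space and check whether each group shares a single count. This is a sketch in base R with the counts copied from the output above; the object names res and same are made up for the illustration.

```r
# Sketch: do the dashed and spaced spellings return the same count?
res <- data.frame(
  keyword = c("gas lift", "gas-lift", "gas-lifted", "gas lifted",
              "lift gas", "lift-gas"),
  count   = c(7273L, 7273L, 904L, 904L, 1547L, 1547L),
  stringsAsFactors = FALSE
)

# treat the dash as a space so both spellings fall in the same group
res$normalized <- gsub("-", " ", res$keyword, fixed = TRUE)

# TRUE when every spelling in a group returned one and the same count
same <- tapply(res$count, res$normalized,
               function(x) length(unique(x)) == 1)
print(same)
```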

There is something interesting here: take a look at the paper counts for each dashed and spaced pair. gas lift and gas-lift return the same number of papers (7273), and so do gas-lifted and gas lifted (904), and lift gas and lift-gas (1547). This could be a coincidence; it could mean that this search treats the dash and the space alike; or it could mean that, once the word gas-lift started to be used, a consensus was also adopted for the word lift-gas.

That said, the terms “gas lifted” and “lift gas” did not find wide adoption. Further investigation could tie this to a specific year, if the hypothesis is that a conference established some standard in the gas-lift discipline. For that task, we would need a table of year vs. keyword vs. paper count, and see where the breakthrough occurs.
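
The table for that follow-up could be laid out as below. This is only a scaffold under stated assumptions: the year range is a placeholder, not taken from any search, and paper_count is left as NA because no per-year query has been run here. The object names kw, years, trend and wide are made up for the illustration.

```r
# Sketch: empty year vs. keyword vs. paper-count table, to be filled later.
kw    <- c("gas lift", "gas-lift", "lift gas", "lift-gas")
years <- 2000:2002                       # placeholder range, an assumption

trend <- expand.grid(year = years, keyword = kw, stringsAsFactors = FALSE)
trend$paper_count <- NA_integer_         # one search per row would fill this in

# year-by-keyword layout: a jump along a column would mark the breakthrough
wide <- with(trend, tapply(paper_count, list(year, keyword), sum))
print(wide)
```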

Appendix B: brought up by Hector Partidas

Hector raised a valid question in SPE Connect about the correct use of the term “Sucker Rod Pumping”. He says: “I cannot visualize the rods handling the oil but if we apply the same criteria then this connotation is not right.”

True. But we have been educated to use that term for a subsurface plunger-barrel system that pumps oil out, so I doubt that, even if not properly named, it could be changed to something different. So, I was curious about which terms have been used in papers on “Sucker Rod Pumping”. Here is the list of keywords.

library(petro.One)

# provide the list of keywords
keywords <- "    
    sucker rod
    sucker-rod
    sucker rod pumping
    SRP
    beam pump
    beam pumping
    rod pump
    rod pumping
    pump jack
    surface pumping unit
    subsurface sucker rod pump
    subsurface pump
    sub-surface pump
    sub surface pump
    plunger pump
    tubing pump
    downhole pump
    downhole plunger
    stationary barrel
    travelling barrel
"
# convert the text to a dataframe
# read the text as a table, one row per line (split at the newline character)
kw_tbl <- read.table(text = keywords, header = FALSE, sep = "\n", 
                     stringsAsFactors = FALSE, strip.white = TRUE, 
                     col.names = "keyword")
kw_tbl
#>                       keyword
#> 1                  sucker rod
#> 2                  sucker-rod
#> 3          sucker rod pumping
#> 4                         SRP
#> 5                   beam pump
#> 6                beam pumping
#> 7                    rod pump
#> 8                 rod pumping
#> 9                   pump jack
#> 10       surface pumping unit
#> 11 subsurface sucker rod pump
#> 12            subsurface pump
#> 13           sub-surface pump
#> 14           sub surface pump
#> 15               plunger pump
#> 16                tubing pump
#> 17              downhole pump
#> 18           downhole plunger
#> 19          stationary barrel
#> 20          travelling barrel
# iterate through the keywords dataframe
rec <- vector("list")
i <- 1
for (k in kw_tbl$keyword) {
    url_all  <- make_search_url(query = k, how = "all")    # create search query
    count    <- get_papers_count(url_all)                  # paper count
    rec[[i]] <- list(keyword = k, count = count)           # add observation
    cat(sprintf("%30s %5d \n", k, count))                  # print it
    i <-  i + 1                                            # increment counter
    Sys.sleep(5)                          # do not bug OnePetro website too much
}                                         # be a good internet citizen
#>                     sucker rod  1577 
#>                     sucker-rod  1665 
#>             sucker rod pumping   558 
#>                            SRP   477 
#>                      beam pump   518 
#>                   beam pumping   455 
#>                       rod pump  1275 
#>                    rod pumping   866 
#>                      pump jack    99 
#>           surface pumping unit    80 
#>     subsurface sucker rod pump    15 
#>                subsurface pump   154 
#>               sub-surface pump    20 
#>               sub surface pump    20 
#>                   plunger pump   236 
#>                    tubing pump   263 
#>                  downhole pump   974 
#>               downhole plunger    12 
#>              stationary barrel    12 
#>              travelling barrel     4

And here are the results from OnePetro:

data.table::rbindlist(rec)                # final data table
#>                        keyword count
#>  1:                 sucker rod  1577
#>  2:                 sucker-rod  1665
#>  3:         sucker rod pumping   558
#>  4:                        SRP   477
#>  5:                  beam pump   518
#>  6:               beam pumping   455
#>  7:                   rod pump  1275
#>  8:                rod pumping   866
#>  9:                  pump jack    99
#> 10:       surface pumping unit    80
#> 11: subsurface sucker rod pump    15
#> 12:            subsurface pump   154
#> 13:           sub-surface pump    20
#> 14:           sub surface pump    20
#> 15:               plunger pump   236
#> 16:                tubing pump   263
#> 17:              downhole pump   974
#> 18:           downhole plunger    12
#> 19:          stationary barrel    12
#> 20:          travelling barrel     4

I would have sworn that I would get a gazillion papers on SRP, given that it is the artificial lift method with the most deployed units in the world. Not so. And “sucker rod” is not universally used either; “rod pump” is also common. There is not much “SRP” or “beam pumping”. Look near the bottom at the term “downhole pump”: 974 papers. Interesting! But we cannot say whether it refers to sucker rod pumping or to another artificial lift (A/L) system. We need context to answer that.
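
Sorting the table by paper count makes this ranking easier to read. The sketch below uses base R with the counts copied from the results above; the object names srp and ranked are made up for the illustration.

```r
# Sketch: rank the sucker rod pumping keywords by paper count.
srp <- data.frame(
  keyword = c("sucker rod", "sucker-rod", "sucker rod pumping", "SRP",
              "beam pump", "beam pumping", "rod pump", "rod pumping",
              "pump jack", "surface pumping unit",
              "subsurface sucker rod pump", "subsurface pump",
              "sub-surface pump", "sub surface pump", "plunger pump",
              "tubing pump", "downhole pump", "downhole plunger",
              "stationary barrel", "travelling barrel"),
  count   = c(1577L, 1665L, 558L, 477L, 518L, 455L, 1275L, 866L, 99L, 80L,
              15L, 154L, 20L, 20L, 236L, 263L, 974L, 12L, 12L, 4L),
  stringsAsFactors = FALSE
)

ranked <- srp[order(srp$count, decreasing = TRUE), ]   # most used first
rownames(ranked) <- NULL                               # renumber rows 1..20
print(head(ranked, 5))
```

The top five are “sucker-rod”, “sucker rod”, “rod pump”, “downhole pump” and “rod pumping”, in that order.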