One of the little annoyances while doing paper research in OnePetro is knowing the correct spelling of the keywords under search. It would seem insignificant but we will see in this article choosing the right keyword could have effects on the results. Let’s see a practical example.
For this demonstration I will use the R package petro.One. It is available from CRAN, as free and open source project. The advantage of using petro.One is that you can do many operations in batch, perform search automation, and receive a table (dataframe) as a result.
We start by creating a list of all the possible combinations of the words “gas” and “lift”. These are the some that I have seen written in papers, presentations and articles. There may be more, of course. You can try yourself if you install and run petro.One in your computer.
In the code chunk below each of the keywords has been entered as one line of text. We enclosed that text with quotes and assign it to the object keywords. Then, we convert that text into a dataframe using the R function read.table, providing some arguments. For instance, we are not providing a header (header = FALSE), the separation between keywords is a new line (sep = “”), we don’t want factors but text (stringsAsFactors = FALSE), remove blanks between words (strip.white = TRUE), and the name of the column is keyword (col.names = “keyword”).
library(petro.One)
# provide the list of keywords
keywords <- "
gas lift
gas-lift
GasLift
gas.lift
gas_lift"
# convert the text to a dataframe
# read text table and split rows at carriage return
kw_tbl <- read.table(text = keywords, header = FALSE, sep = "\n",
stringsAsFactors = FALSE, strip.white = TRUE,
col.names = "keyword")
The result is this little table or dataframe below. I am using a dataframe and not a vector because I am expecting that in other cases we could have many more word combinations; let’s say, more than 20, or maybe above 100. And that’s better managed with a dataframe.
# show the dataframe
kw_tbl
#> keyword
#> 1 gas lift
#> 2 gas-lift
#> 3 GasLift
#> 4 gas.lift
#> 5 gas_lift
Now that we have all the word combinations stored in a table, we will iterate through all these keywords and send a query search to OnePetro for each of them. This means that that we are sending an automated search to the OnePetro website. Because we are very good internet citizens, we are also taking care of not sending too much traffic to the website by adding a pause of five seconds between searches.
Build iteration loop
# iterate through the keywords dataframe
rec <- vector("list")
i <- 1
for (k in kw_tbl$keyword) {
url_all <- make_search_url(query = k, how = "all") # create search query
count <- get_papers_count(url_all) # paper count
rec[[i]] <- list(keyword = k, count = count) # add observation
cat(sprintf("%30s %5d \n", k, count)) # print it
i <- i + 1 # increment counter
Sys.sleep(5) # do not bug OnePetro website too much
} # be a good internet citizen
#> gas lift 7273
#> gas-lift 17804
#> GasLift 588
#> gas.lift 2
#> gas_lift 4
These are the results.
dt <- data.table::rbindlist(rec) # final data table
dt
#> keyword count
#> 1: gas lift 7273
#> 2: gas-lift 17804
#> 3: GasLift 588
#> 4: gas.lift 2
#> 5: gas_lift 4
Observations
The keyword with most papers written is the word “gas”, followed by dash, followed by “lift”. There are 17804 papers which contain the word “gas-lift”. The second keyword is “gas lift” with 7273. That is the word “gas”, followed by a space, followed by “lift”. Both keywords account for little more than 97.69 % of the papers. There is a third with 588 papers: the keyword “GasLift”. It is not pretty common but I have seen it in some literature (Shell?, maybe). There are two more keywords with 6 paper results. Very marginal within the context of 25671 papers. The words “gas.lift” and “gas_lift”, probably typos.
Conclusions
With this short example we could appreciate the importance of considering the right keywords when performing paper research. We could do this relatively fast in an automated fashion using data science tools such this R package. Of course, we could have done this manually, but keep in mind as the number of probable words combination increases so the time doing the search in the conventional way. Using paper search automation is a nice feature in our toolbox so there is no loss of information during the paper research.
Those 25671 papers that have gas lift in their content may not represent how well or how intense the coverage of the subject is. Not until you provide some context to the search, for example, “gas lift optimization”, “dual gas lift”, “gas lift surveillance”, etc. You will see the number of results shrink rapidly when a context is supplied. But at least, now you know that you have two keywords to add to your search context.
In my view, the perfect search would be one where each of the papers return the number of times the keyword was mentioned, as well as the context or subject. You can do that but it requires that you have the physical paper to perform the search within the corpus of the paper. For us, engineers, is not practical, because we cannot store or purchase all those 24,000+ papers!
Maybe the day is coming when OnePetro includes in the search results: * the number of keyword occurrences within the corpus, * the quality of the paper, * number of tables, * number of figures, and * how strong are the keyword associations within the context of the petroleum engineering discipline.
Appendix A: brought up by Burney Waring
There are couple of keywords suggested by Burney Waring, one of the lead specialists of gas lift in the world (truly, we should write it gas-lift, from now on): the keywords gas-lifted to refer to the past participle, adjective form of something that is under a gas-lift method; example: a gas-lifted well. And, lift gas, to establish a difference with the gas injected to a well for reservoir pressure maintenance. These are the results.
This is the list of keywords. I have added: gas-lifted without the dash and lift gas with dash to see if we obtain papers with these word combinations.
library(petro.One)
major <- c("gas lift",
"gas-lift",
"GasLift",
"gas-lifted",
"gas lifted",
"lift gas",
"lift-gas",
"gas.lift",
"gas_lift")
gas_lift <- join_keywords(major, get_papers = TRUE, sleep = 3, verbose = TRUE)
#>
#> 1 7273 'gas+lift'
#> 2 7273 'gas-lift'
#> 3 588 'GasLift'
#> 4 904 'gas-lifted'
#> 5 904 'gas+lifted'
#> 6 1547 'lift+gas'
#> 7 1547 'lift-gas'
#> 8 2 'gas.lift'
#> 9 4 'gas_lift'
# "gas lift" is the same as "gas-lift"
gas_lift$keywords
#> # A tibble: 9 x 4
#> Var1 paper_count sf url
#> <chr> <dbl> <chr> <chr>
#> 1 gas lift 7273 'gas+lif~ "https://www.onepetro.org/search?q=\"gas+~
#> 2 gas-lift 7273 'gas-lif~ "https://www.onepetro.org/search?q=\"gas-~
#> 3 GasLift 588 'GasLift' "https://www.onepetro.org/search?q=\"GasL~
#> 4 gas-lif~ 904 'gas-lif~ "https://www.onepetro.org/search?q=\"gas-~
#> 5 gas lif~ 904 'gas+lif~ "https://www.onepetro.org/search?q=\"gas+~
#> 6 lift gas 1547 'lift+ga~ "https://www.onepetro.org/search?q=\"lift~
#> 7 lift-gas 1547 'lift-ga~ "https://www.onepetro.org/search?q=\"lift~
#> 8 gas.lift 2 'gas.lif~ "https://www.onepetro.org/search?q=\"gas.~
#> 9 gas_lift 4 'gas_lif~ "https://www.onepetro.org/search?q=\"gas_~
This is the dataframe with the results.
There is something interesting here: take a look at the paper count for gas-lift and lift-gas; they have the same number of papers written with those words. This could mean a coincidence, or, after the time that the word gas-lift started to be used, also a consensus for the use of the word lift-gas was adopted.
Although, we could see the words “gas lifted” and “lift gas”, did not find great adoption. Further investigation could be performed to connect a specific year, if the hypothesis is if there was a conference that established some standard in the gas-lift discipline. For that task, we would need a table of year vs. keyword vs. paper-count, and see where the breakthrough occurs.
Appendix B: brought up by Hector Partidas
Hector made a valid question on the correct use of the term “Sucker Rod Pumping” in SPE Connect. He says “I cannot visualize the rods handling the oil but if we apply the same criteria then this connotation is not right.”
True. But we have been educated to use that term to refer to a subsurface plunger-barrel system to pump oil out that I doubt -even not properly named- could be changed to something different. So, I was curious about what has been used in papers on “Sucker Rod Pumping”. Here is the list of keywords.
library(petro.One)
# provide the list of keywords
keywords <- "
sucker rod
sucker-rod
sucker rod pumping
SRP
beam pump
beam pumping
rod pump
rod pumping
pump jack
surface pumping unit
subsurface sucker rod pump
subsurface pump
sub-surface pump
sub surface pump
plunger pump
tubing pump
downhole pump
downhole plunger
stationary barrel
travelling barrel
"
# convert the text to a dataframe
# read text table and split rows at carriage return
kw_tbl <- read.table(text = keywords, header = FALSE, sep = "\n",
stringsAsFactors = FALSE, strip.white = TRUE,
col.names = "keyword")
kw_tbl
#> keyword
#> 1 sucker rod
#> 2 sucker-rod
#> 3 sucker rod pumping
#> 4 SRP
#> 5 beam pump
#> 6 beam pumping
#> 7 rod pump
#> 8 rod pumping
#> 9 pump jack
#> 10 surface pumping unit
#> 11 subsurface sucker rod pump
#> 12 subsurface pump
#> 13 sub-surface pump
#> 14 sub surface pump
#> 15 plunger pump
#> 16 tubing pump
#> 17 downhole pump
#> 18 downhole plunger
#> 19 stationary barrel
#> 20 travelling barrel
# iterate through the keywords dataframe
rec <- vector("list")
i <- 1
for (k in kw_tbl$keyword) {
url_all <- make_search_url(query = k, how = "all") # create search query
count <- get_papers_count(url_all) # paper count
rec[[i]] <- list(keyword = k, count = count) # add observation
cat(sprintf("%30s %5d \n", k, count)) # print it
i <- i + 1 # increment counter
Sys.sleep(5) # do not bug OnePetro website too much
} # be a good internet citizen
#> sucker rod 1577
#> sucker-rod 1665
#> sucker rod pumping 558
#> SRP 477
#> beam pump 518
#> beam pumping 455
#> rod pump 1275
#> rod pumping 866
#> pump jack 99
#> surface pumping unit 80
#> subsurface sucker rod pump 15
#> subsurface pump 154
#> sub-surface pump 20
#> sub surface pump 20
#> plunger pump 236
#> tubing pump 263
#> downhole pump 974
#> downhole plunger 12
#> stationary barrel 12
#> travelling barrel 4
And here the results from OnePetro:
data.table::rbindlist(rec) # final data table
#> keyword count
#> 1: sucker rod 1577
#> 2: sucker-rod 1665
#> 3: sucker rod pumping 558
#> 4: SRP 477
#> 5: beam pump 518
#> 6: beam pumping 455
#> 7: rod pump 1275
#> 8: rod pumping 866
#> 9: pump jack 99
#> 10: surface pumping unit 80
#> 11: subsurface sucker rod pump 15
#> 12: subsurface pump 154
#> 13: sub-surface pump 20
#> 14: sub surface pump 20
#> 15: plunger pump 236
#> 16: tubing pump 263
#> 17: downhole pump 974
#> 18: downhole plunger 12
#> 19: stationary barrel 12
#> 20: travelling barrel 4
I had sweared that I would get a gazillion papers on SRP given the fact that is the artificial lift method with more deployed units in the world. Not so. And “sucker rod” is not widely accepted either; “rod pump” is also used. Not much “SRP” or “beam pumping”. Look at the bottom, the term “downhole pump”: 914 papers. Interesting! But we cannot say if it refers to sucker rod or another A/L system. We need context to respond to that.
References
petro.One repository: https://github.com/f0nzie/petro.One
petro.One website: https://f0nzie.github.io/petro.One/
OnePetro website: https://www.onepetro.org/