| Type: | Package |
| Title: | Automatically Fetching References Metadata from Literature Databases |
| Version: | 0.2.1 |
| Maintainer: | Thomas Dumond <thomas.dumond@adelaide.edu.au> |
| Description: | Provides functions to automatically retrieve and deduplicate reference metadata based on saved search strings. Access to Web of Science and Scopus requires personal API keys, while PubMed can be queried without one. The optional deduplication functionality requires the package 'ASySD' available from https://github.com/camaradesuk/ASySD. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | dplyr, httr, jsonlite, openxlsx, purrr, readxl, xml2 |
| Suggests: | ASySD, cronR, knitr, rmarkdown, taskscheduleR, testthat (≥ 3.0.0) |
| RoxygenNote: | 7.3.3 |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/thomasdumond/LitFetchR, https://thomasdumond.github.io/LitFetchR/ |
| BugReports: | https://github.com/thomasdumond/LitFetchR/issues |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-02-07 11:09:31 UTC; a1224158 |
| Author: | Thomas Dumond |
| Repository: | CRAN |
| Date/Publication: | 2026-02-10 20:40:02 UTC |
Automates the retrieval of references based on saved search string(s).
Description
Creates a read-only R script and a scheduled task that runs it automatically at a specified frequency and time, retrieving references that match the saved search string(s) on up to three platforms (Web of Science, Scopus and PubMed).
Usage
auto_LitFetchR_setup(
  task_id = "task_id",
  when = "DAILY",
  time = "08:00",
  wos = FALSE,
  scp = FALSE,
  pmd = FALSE,
  directory,
  dedup = FALSE,
  open_file = FALSE,
  dry_run = FALSE
)
Arguments
| task_id | Name of the automated reference retrieval task (e.g. one keyword describing your review). |
| when | Frequency of the automated reference retrieval task ("DAILY", "WEEKLY" or "MONTHLY"). |
| time | Time of day at which the task runs (HH:MM, 24-hour clock). |
| wos | Runs the search on Web of Science (TRUE or FALSE). |
| scp | Runs the search on Scopus (TRUE or FALSE). |
| pmd | Runs the search on PubMed (TRUE or FALSE). |
| directory | Directory in which the search string is saved (the project's directory). The reference metadata are also saved there. |
| dedup | Deduplicates the retrieved references (TRUE or FALSE). |
| open_file | Automatically opens the CSV file after reference retrieval (TRUE or FALSE). |
| dry_run | If TRUE, simulates the call without scheduling any task; it only shows how the function would react. |
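The constraints on `when` and `time` can be checked before calling auto_LitFetchR_setup(). The helper below, `check_schedule_args()`, is hypothetical and not part of LitFetchR; it is a minimal sketch that encodes only the documented rules, and it is deliberately strict about leading zeros ("8:00" is rejected), which the package itself may or may not be.

```r
# Hypothetical pre-flight check (not part of LitFetchR): it encodes only the
# documented rules for `when` and `time`.
check_schedule_args <- function(when, time) {
  allowed <- c("DAILY", "WEEKLY", "MONTHLY")
  if (!(is.character(when) && length(when) == 1L && when %in% allowed)) {
    stop("`when` must be one of: ", paste(allowed, collapse = ", "))
  }
  # HH:MM on a 24-hour clock: hours 00-23, minutes 00-59, leading zeros required
  if (!(is.character(time) && length(time) == 1L &&
        grepl("^([01][0-9]|2[0-3]):[0-5][0-9]$", time))) {
    stop("`time` must use the HH:MM 24-hour format, e.g. \"08:00\"")
  }
  invisible(TRUE)
}

check_schedule_args("WEEKLY", "14:00")    # passes silently
try(check_schedule_args("WEEKLY", "2pm")) # error: not HH:MM
```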
Value
NULL (invisibly). Called for its side effects:
writes an R script and schedules a task (Windows Task Scheduler or cron)
to run the script automatically.
Examples
# This is a "dry run" example.
# No task will actually be scheduled;
# it only shows how the function would react.
auto_LitFetchR_setup(task_id = "fish_vibrio",
                     when = "WEEKLY",
                     time = "14:00",
                     wos = TRUE,
                     scp = TRUE,
                     pmd = TRUE,
                     directory = tempdir(),
                     dedup = FALSE,
                     open_file = FALSE,
                     dry_run = TRUE
)
Creates a unique name for any document.
Description
Creates a unique name for any document.
Usage
build_sheet_name(time = Sys.time())
Arguments
| time | Date-time used to build the name (defaults to the system time when the function is called). |
Value
Character scalar. A unique name based on the system time with the format YYYY-MM-DD-HHMMSS.
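The documented format corresponds to the strftime-style pattern `%Y-%m-%d-%H%M%S`. A minimal sketch using base R's format(); `sheet_name_sketch()` is a hypothetical stand-in, not the actual source of build_sheet_name().

```r
# Hypothetical re-implementation of the documented name format
# (YYYY-MM-DD-HHMMSS); not the package's own source code.
sheet_name_sketch <- function(time = Sys.time()) {
  format(time, "%Y-%m-%d-%H%M%S")
}

sheet_name_sketch(as.POSIXct("2026-02-07 11:09:31", tz = "UTC"))
# "2026-02-07-110931"
```

Because the format has one-second resolution, two calls within the same second produce the same name.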
Creates an Excel file to store the deduplication history.
Description
Creates an Excel file to store the deduplication history.
Usage
create_dedup_history(directory)
Arguments
| directory | Directory in which the reference deduplication history will be saved. |
Value
A list with elements:
- history_dedup
A Workbook object (from openxlsx).
- hist_dedup_path
Character. Path to the created .xlsx file.
Creates an Excel file to store the identifiers of the references retrieved at each search.
Description
Creates an Excel file to store the identifiers of the references retrieved at each search.
Usage
create_id_history(directory)
Arguments
| directory | Directory in which the reference identification history will be saved. |
Value
A list with one element:
- history_id
A Workbook object (from openxlsx).
Creates and saves search string(s).
Description
An interactive function that asks the user to enter a search string and reports the number of results on up to three platforms: Web of Science, Scopus and PubMed. One or more search strings can then be saved to retrieve the references later.
Usage
create_save_search(
  wos = FALSE,
  scp = FALSE,
  pmd = FALSE,
  directory,
  dry_run = FALSE
)
Arguments
| wos | Runs the search on Web of Science (TRUE or FALSE). |
| scp | Runs the search on Scopus (TRUE or FALSE). |
| pmd | Runs the search on PubMed (TRUE or FALSE). |
| directory | Directory in which the search string and the search history will be saved. |
| dry_run | If TRUE, simulates the call: no search is created and no database is accessed. |
Value
NULL (invisibly). Called for its side effects:
interactive querying and writing search history files.
Examples
# This is a "dry run" example.
# No search will be created and no database will be accessed.
# It only shows how the function would react.
create_save_search(wos = TRUE,
                   scp = TRUE,
                   pmd = TRUE,
                   directory = tempdir(),
                   dry_run = TRUE)
Creates an Excel file to store the history of searches made using create_save_search().
Description
Creates an Excel file to store the history of searches made using create_save_search().
Usage
create_search_history(directory)
Arguments
| directory | Directory in which the search history will be saved. |
Value
A list with elements:
- history_search
Workbook object.
- sheet_name
Character scalar.
Deduplicates the references from up to three data frames.
Description
Deduplicates the references from up to three data frames.
Usage
dedup_refs(
  df1 = NULL,
  df2 = NULL,
  df3 = NULL,
  directory,
  open_file = FALSE,
  dry_run = FALSE
)
Arguments
| df1 | First data frame of references (can be NULL). |
| df2 | Second data frame of references (can be NULL). |
| df3 | Third data frame of references (can be NULL). |
| directory | Directory in which the reference deduplication history will be saved. |
| open_file | Automatically opens the resulting CSV file (TRUE or FALSE). |
| dry_run | If TRUE, simulates the call without deduplicating anything; it only shows how the function would react. |
Value
NULL (invisibly). Called for its side effects:
writes a CSV of deduplicated citations and
an Excel workbook recording the deduplication history.
Examples
# This is a "dry run" example.
# No deduplication will happen.
# It only shows how the function would react.
dedup_refs(df1 = df_vibrio_wos,
           df2 = df_vibrio_scp,
           df3 = df_vibrio_pmd,
           directory = tempdir(),
           open_file = FALSE,
           dry_run = TRUE
)
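To illustrate how up to three optional data frames can be combined and deduplicated, here is a deliberately naive sketch that matches records on a normalised DOI only. It is NOT the algorithm dedup_refs() uses: the actual deduplication relies on ASySD, which applies its own matching procedure, so results will differ. `naive_dedup()` is hypothetical.

```r
# Naive stand-in for illustration only: exact matching on a normalised DOI.
# This is NOT ASySD's deduplication procedure.
naive_dedup <- function(df1 = NULL, df2 = NULL, df3 = NULL) {
  parts <- Filter(Negate(is.null), list(df1, df2, df3))
  if (length(parts) == 0L) return(NULL)
  refs <- do.call(rbind, parts)  # rbind matches data-frame columns by name
  # Normalise DOIs (case-insensitive): lower case, trimmed, no resolver prefix
  doi <- tolower(trimws(refs$doi))
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi)
  # Records without a DOI are always kept
  no_doi <- is.na(doi) | doi == ""
  refs[no_doi | !duplicated(doi), , drop = FALSE]
}

wos <- data.frame(title = c("A", "B"), doi = c("10.1000/XYZ", ""))
pmd <- data.frame(title = c("A (PubMed)", "C"),
                  doi = c("https://doi.org/10.1000/xyz", NA))
naive_dedup(df1 = wos, df3 = pmd)$title
# "A" "B" "C"
```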
Extracts the metadata from the new references found on PubMed based on the search string(s) saved in "search_list.txt".
Description
Extracts the metadata from the new references found on PubMed based on the search string(s) saved in "search_list.txt".
Usage
extract_pmd_list(search_list_path, directory)
Arguments
| search_list_path | Path to "search_list.txt". |
| directory | Directory in which the reference identification history will be saved. |
Value
A data.frame with one row per retrieved PubMed record and columns:
- author
Character. Publication authors.
- year
Character. Publication year.
- title
Character. Publication title.
- journal
Character. Publication journal name.
- volume
Character. Publication journal volume.
- issue
Character. Publication journal issue.
- abstract
Character. Publication abstract.
- doi
Character. Publication Digital Object Identifier (DOI).
- source
Character. Data source.
- platform_id
Character. Publication unique identifier in data source.
If search_list_path does not exist, returns NULL.
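Code that consumes these results has to handle both documented outcomes: a data frame with the columns listed above, or NULL when search_list_path does not exist (the Scopus and Web of Science extractors document the same columns). A small hypothetical check, `is_ref_table()`, written only from this column list:

```r
# Columns documented for the data frame returned by extract_pmd_list()
ref_cols <- c("author", "year", "title", "journal", "volume", "issue",
              "abstract", "doi", "source", "platform_id")

# Hypothetical helper (not part of LitFetchR): TRUE for NULL (no
# search_list.txt) or for a data frame that has every documented column.
is_ref_table <- function(x) {
  is.null(x) || (is.data.frame(x) && all(ref_cols %in% names(x)))
}

is_ref_table(NULL)                    # TRUE
is_ref_table(data.frame(title = "A")) # FALSE: most columns are missing
```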
Extracts the metadata from the new references found on Scopus based on the search string(s) saved in "search_list.txt".
Description
Extracts the metadata from the new references found on Scopus based on the search string(s) saved in "search_list.txt".
Usage
extract_scp_list(search_list_path, directory)
Arguments
| search_list_path | Path to "search_list.txt". |
| directory | Directory in which the reference identification history will be saved. |
Value
A data.frame with one row per retrieved Scopus record and columns:
- author
Character. Publication authors.
- year
Character. Publication year.
- title
Character. Publication title.
- journal
Character. Publication journal name.
- volume
Character. Publication journal volume.
- issue
Character. Publication journal issue.
- abstract
Character. Publication abstract.
- doi
Character. Publication Digital Object Identifier (DOI).
- source
Character. Data source.
- platform_id
Character. Publication unique identifier in data source.
If search_list_path does not exist, returns NULL.
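Because each extractor returns NULL when search_list_path does not exist, results from several platforms can be stacked after dropping the NULLs. A minimal base-R sketch; `bind_refs()` is hypothetical and assumes every non-NULL input has the same documented columns (rbind() on data frames matches columns by name, not position).

```r
# Hypothetical helper (not part of LitFetchR): stack extractor results,
# skipping NULLs returned when search_list.txt is missing.
bind_refs <- function(...) {
  parts <- Filter(Negate(is.null), list(...))
  if (length(parts) == 0L) return(NULL)
  do.call(rbind, parts)
}

scp <- data.frame(title = "A", source = "Scopus")
pmd <- data.frame(source = "PubMed", title = "B")  # same columns, other order
nrow(bind_refs(scp, NULL, pmd))  # 2
```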
Extracts the metadata from the new references found on Web of Science based on the search string(s) saved in "search_list.txt".
Description
Extracts the metadata from the new references found on Web of Science based on the search string(s) saved in "search_list.txt".
Usage
extract_wos_list(search_list_path, directory)
Arguments
| search_list_path | Path to "search_list.txt". |
| directory | Directory in which the reference identification history will be saved. |
Value
A data.frame with one row per retrieved Web of Science record and columns:
- author
Character. Publication authors.
- year
Character. Publication year.
- title
Character. Publication title.
- journal
Character. Publication journal name.
- volume
Character. Publication journal volume.
- issue
Character. Publication journal issue.
- abstract
Character. Publication abstract.
- doi
Character. Publication Digital Object Identifier (DOI).
- source
Character. Data source.
- platform_id
Character. Publication unique identifier in data source.
If search_list_path does not exist, returns NULL.
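"New" references implies that records seen in earlier searches are left out, with their identifiers kept in the reference identification history. Assuming the comparison is made on platform_id (an assumption; the manual does not say how it is done), the filtering step reduces to a set difference. `keep_new_refs()` and `seen_ids` are hypothetical.

```r
# Hypothetical helper (not part of LitFetchR). `seen_ids` stands in for the
# identifiers stored in the identification history workbook.
keep_new_refs <- function(refs, seen_ids) {
  if (is.null(refs)) return(NULL)  # no search_list.txt: nothing to filter
  refs[!(refs$platform_id %in% seen_ids), , drop = FALSE]
}

refs <- data.frame(platform_id = c("id1", "id2", "id3"),
                   title = c("A", "B", "C"))
keep_new_refs(refs, seen_ids = "id2")$platform_id  # "id1" "id3"
```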
Transforms a long file path into a shorter one.
Description
Transforms a long file path into a shorter one.
Usage
get_short_path(path)
Arguments
| path | Path to the document. |
Value
Character scalar. A shorter version of the path, for use on Windows.
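On Windows, base R exposes the short (8.3) form of a path through utils::shortPathName(). Whether get_short_path() relies on it is not stated here, so the following is only a sketch under that assumption; `short_path_sketch()` is hypothetical, and short names are only available where the file system provides them.

```r
# Hypothetical sketch; LitFetchR's get_short_path() may work differently.
# utils::shortPathName() exists only on Windows, so it is called only there;
# on other systems the path is returned unchanged.
short_path_sketch <- function(path) {
  if (.Platform$OS.type == "windows") {
    utils::shortPathName(path)
  } else {
    path
  }
}

short_path_sketch(tempdir())
```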
Manual literature retrieval.
Description
Retrieves references corresponding to the saved search string(s) on up to three platforms (Web of Science, Scopus and PubMed).
Usage
manual_fetch(
  wos = FALSE,
  scp = FALSE,
  pmd = FALSE,
  directory,
  dedup = FALSE,
  open_file = FALSE,
  dry_run = FALSE
)
Arguments
| wos | Runs the search on Web of Science (TRUE or FALSE). |
| scp | Runs the search on Scopus (TRUE or FALSE). |
| pmd | Runs the search on PubMed (TRUE or FALSE). |
| directory | Directory in which the search string is saved (the project's directory). The reference metadata are also saved there. |
| dedup | Deduplicates the retrieved references (TRUE or FALSE). |
| open_file | Automatically opens the CSV file after reference retrieval (TRUE or FALSE). |
| dry_run | If TRUE, simulates the call without retrieving any references; it only shows how the function would react. |
Value
NULL (invisibly). Called for its side effects: creates a CSV file with the reference metadata, a history file of the retrieved references and, if dedup = TRUE, a deduplication history file.
Examples
# This is a "dry run" example.
# No references will actually be retrieved; it only shows how the function would react.
manual_fetch(wos = TRUE,
             scp = TRUE,
             pmd = TRUE,
             directory = tempdir(),
             dedup = TRUE,
             open_file = FALSE,
             dry_run = TRUE
)
Removes a scheduled task.
Description
Removes the scheduled task identified by task_id from Task Scheduler (Windows) or cron (macOS/Linux).
Usage
remove_scheduled_task(task_id, dry_run = FALSE)
Arguments
| task_id | Name/ID of the scheduled task (Windows Task Scheduler or cron). |
| dry_run | If TRUE, simulates the call without removing any task; it only shows how the function would react. |
Value
NULL (invisibly). Called for its side effects: removes a scheduled task created with auto_LitFetchR_setup().
Examples
# This is a "dry run" example.
# No task will actually be removed; it only shows how the function would react.
remove_scheduled_task("fish_vibrio",
                      dry_run = TRUE
)
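From R, Task Scheduler and cron entries can be managed with the suggested packages taskscheduleR (taskscheduler_delete()) and cronR (cron_rm()). Assuming remove_scheduled_task() dispatches on the operating system in a similar way (the manual does not say so), a dry-run-aware sketch could look like this; `remove_task_sketch()` is hypothetical and mirrors the package's default of dry_run = FALSE.

```r
# Hypothetical sketch (not LitFetchR's code): assumes removal is delegated
# to taskscheduleR on Windows and to cronR elsewhere.
remove_task_sketch <- function(task_id, dry_run = FALSE,
                               os = .Platform$OS.type) {
  stopifnot(is.character(task_id), length(task_id) == 1L, nzchar(task_id))
  backend <- if (os == "windows") "taskscheduleR" else "cronR"
  if (dry_run) {
    # Report what would happen without touching any scheduler
    message("Dry run: would remove task '", task_id, "' via ", backend, ".")
    return(invisible(backend))
  }
  if (!requireNamespace(backend, quietly = TRUE)) {
    stop("Package '", backend, "' is needed to remove the task.")
  }
  if (backend == "taskscheduleR") {
    taskscheduleR::taskscheduler_delete(taskname = task_id)
  } else {
    cronR::cron_rm(id = task_id)
  }
  invisible(backend)
}

remove_task_sketch("fish_vibrio", dry_run = TRUE)
```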
Saves Web of Science and/or Scopus API keys in .Renviron.
Description
Sets wos_api_key, scp_api_key, or both in a single call. Restart the R session after saving the API keys, because .Renviron is read when R starts.
Usage
save_api_keys(wos_api_key = NULL, scp_api_key = NULL, dry_run = FALSE)
Arguments
| wos_api_key | The API key for Web of Science, as a character string (in quotation marks). |
| scp_api_key | The API key for Scopus, as a character string (in quotation marks). |
| dry_run | Simulation run option (TRUE or FALSE). |
Value
Logical. TRUE if at least one value was written, FALSE if left unchanged.
Examples
save_api_keys(wos_api_key = "abcd01234",
              scp_api_key = "efgh5678",
              dry_run = TRUE
)
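An .Renviron file is plain text made of NAME=value lines. The documented return value (TRUE if something was written, FALSE if unchanged) can be illustrated with a small insert-or-update on a temporary file. The helper `set_renviron_var()` and the variable name WOS_API_KEY are hypothetical; the names LitFetchR actually writes are not documented here. Base R's readRenviron() reloads such a file into the current session.

```r
# Hypothetical helper (not part of LitFetchR): insert or update NAME=value in
# an .Renviron-style file. Returns TRUE if the file changed, FALSE otherwise.
set_renviron_var <- function(path, name, value) {
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  new_line <- paste0(name, "=", value)
  hit <- startsWith(lines, paste0(name, "="))  # existing definitions
  if (any(hit) && all(lines[hit] == new_line)) return(FALSE)
  writeLines(c(lines[!hit], new_line), path)   # replace old definitions
  TRUE
}

env_file <- tempfile(fileext = ".Renviron")
set_renviron_var(env_file, "WOS_API_KEY", "abcd01234")  # TRUE (written)
set_renviron_var(env_file, "WOS_API_KEY", "abcd01234")  # FALSE (unchanged)
readRenviron(env_file)
Sys.getenv("WOS_API_KEY")  # "abcd01234"
```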