The SymLink Tool is a way for researchers to manage multiple data pipeline output runs.
It’s designed with user flexibility and project officers in mind, and doesn’t require using anything like a database.
This tool assumes you have a large set of output folders for runs of your pipeline, and you store them on the file system.
The Symlink Tool will:
H: or J: drive, i.e. on the File System.

OK, that’s nice, but symlinks are pretty easy to make. What else do I (the researcher) get for free if I use this tool?
The most important thing you get is a set of simple logs that automatically keep track of which folders have been ‘best.’
Hmm, is that all? I feel like I could keep an Excel document or HUB page that does the same thing.
That’s very true, but this doesn’t require you to remember, or do any typing yourself.
Also, maybe there are pipeline runs you want to track for different reasons.
And your Project Officer gets things too!
That sounds pretty nice, but didn’t you also say something about deleting folders? Why do I need help doing that?
You get some additional benefits - The SymLink Tool will also:
When you’re ready to delete, you’ll get:
1. Safety - The Symlink Tool will only delete folders that are marked to ‘remove’.
2. Provenance - You’ll get a record in the central log telling you which pipeline runs were deleted, when, and why they were deleted (the user gets to add a comment).
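The safety idea above can be sketched in base R. This is only an illustration, assuming a Unix-like file system where symlinks are available; `delete_if_marked()` and the folder names are hypothetical and are not part of vmTools.

```r
# Hypothetical helper, not the vmTools implementation: delete a version
# folder only when a "remove_<version>" symlink points at it.
delete_if_marked <- function(root, version_name) {
  target <- file.path(root, version_name)
  marker <- file.path(root, paste0("remove_", version_name))
  # Sys.readlink() gives "" for a non-symlink and NA for a missing path
  link_target <- Sys.readlink(marker)
  marked <- !is.na(link_target) && nzchar(link_target) &&
    basename(link_target) == version_name
  if (!marked) {
    message("Not marked for removal, leaving in place: ", version_name)
    return(invisible(FALSE))
  }
  unlink(marker)                    # drop the symlink first
  unlink(target, recursive = TRUE)  # then the real folder
  invisible(TRUE)
}

# Two run folders, only the second is marked for removal
root <- tempfile("delete_sketch_")
dir.create(file.path(root, "2024_02_01.01"), recursive = TRUE)
dir.create(file.path(root, "2024_02_01.02"))
invisible(file.symlink(from = file.path(root, "2024_02_01.02"),
                       to   = file.path(root, "remove_2024_02_01.02")))

kept    <- delete_if_marked(root, "2024_02_01.01")  # unmarked, left alone
deleted <- delete_if_marked(root, "2024_02_01.02")  # marked, deleted
print(list.files(root))
```

The point of the design is that the mark, not the caller's intent, decides what may be deleted.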
I’m still reading, and curious to see how this works.
What this demonstration is.
We’ll showcase the life-cycle of a typical pipeline output folder.
best_, keep_, remove_

What this demonstration is not.
This won’t be an exhaustive demonstration of all the available options; this is a vignette of an average use case.
See the symlink_tool_vignette_technical.Rmd file for
more detailed technical explanations.

My team uses an output_root folder for all inputs we
submit to ST-GPR.
This way we can prepare the data, then submit various ST-GPR models with
different parameters without needing to re-prep the inputs. The results
of the ST-GPR models go into an output folder, which we’ll ignore for
simplicity.
library(vmTools)
library(data.table)
# Make the root folder
output_root <- file.path(tempdir(), "slt", "output_root")
dir.create(output_root, 
           recursive    = TRUE, 
           showWarnings = FALSE)

Call SLT bare to print class information and methods (functions linked with the tool).
# For this Intro Vignette, we're only showing public methods 
# - for all methods, see the Technical Vignette
SLT
#> <Symlink_Tool> object generator
#>   Public:
#>     new: function (user_root_list = NULL, user_central_log_root = NULL, 
#>     return_dictionaries: function (item_names = NULL) 
#>     return_dynamic_fields: function (item_names = NULL) 
#>     mark_best: function (version_name, user_entry) 
#>     mark_keep: function (version_name, user_entry) 
#>     mark_remove: function (version_name, user_entry) 
#>     unmark: function (version_name, user_entry) 
#>     roundup_best: function () 
#>     roundup_keep: function () 
#>     roundup_remove: function () 
#>     roundup_unmarked: function () 
#>     roundup_by_date: function (user_date, date_selector) 
#>     get_common_new_version_name: function (date = "today", root_list = private$DICT$ROOTS) 
#>     make_new_version_folder: function (version_name = self$get_common_new_version_name()) 
#>     make_new_log: function (version_name) 
#>     delete_version_folders: function (version_name, user_entry, require_user_input = TRUE) 
#>     make_reports: function () 
#>     Call SLT$new() to make a Symlink Tool, with startup guidance messages!

When you make a new tool, it is tied to a specific output folder. You can’t change the output folder once you’ve made the tool.
See the symlink_tool_vignette_technical.Rmd file.

Note: You can define the root for results outputs and logs separately, but we’re using the same root for simplicity.
# Instantiate (create) a new Symlink Tool object
slt_prep <- SLT$new(
      user_root_list        = list("output_root" = output_root),
      user_central_log_root = output_root
   )

Note: SLT is an R6 class included with
the vmTools package that manages the symlink tool.
Use the Symlink Tool to create a new folder in your output root.
Folder names follow the YYYY_MM_DD.VV naming scheme.

date_vers1 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers1)

Capture some paths, using the Symlink Tool to help. We’ll use these in a minute.
path_log_central <- slt_prep$return_dictionaries()[["LOG_CENTRAL"]][["path"]]
fname_dv_log     <- slt_prep$return_dictionaries()[["log_path"]]
root_dv1         <- slt_prep$return_dynamic_fields()[["VERS_PATHS"]][["output_root"]]
path_log_dv1     <- file.path(root_dv1, fname_dv_log)

Show the file tree.
#> |-- 2024_02_01.01
#> |  `-- logs
#> |     `-- log_version_history.csv
#> `-- log_symlinks_central.csv

Show central log.
#>    log_id         timestamp    user version_name action     comment
#>     <int>            <char>  <char>       <char> <char>      <char>
#> 1:      0 2025_07_24_111336 ssbyrne  CENTRAL_LOG create log created

Show new run version folder log.
#>    log_id         timestamp    user  version_name action     comment
#>     <int>            <char>  <char>        <char> <char>      <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.01 create log created

Now let’s make some files representing models in this folder.
# Make some dummy files
fnames_my_models <- paste0("my_model_", 1:5, ".csv")
invisible(file.create(file.path(root_dv1, fnames_my_models)))

print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> `-- log_symlinks_central.csv

We like the models! We want to elevate this run version folder to
best_ status.

Note: All mark_xxxx operations require
a user entry as a named list.
Only the comment field is currently supported (a future
version will expand this).

# Mark best, and take note of messaging
slt_prep$mark_best(version_name = date_vers1,
                   user_entry   = list(comment = "Best model GBD2023"))
#> Marking best: 2024_02_01.01
#> No existing symlinks found - moving on
#> No 'best' symlink found - moving on: /tmp/Rtmp3tDGBK/slt/output_root/best
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01
#>   Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv

Inspect both the central log and …
#>    log_id         timestamp    user  version_name       action            comment
#>     <int>            <char>  <char>        <char>       <char>             <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG       create        log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023

…the run version folder log.
#>    log_id         timestamp    user  version_name       action            comment
#>     <int>            <char>  <char>        <char>       <char>             <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.01       create        log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023

We now have a ‘best’ symlink that points to our ‘best’ run version,
2024_02_01.01.
print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
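
The tree above lists the same files under best and under 2024_02_01.01 because best is a symlink, not a copy. Here is a minimal base-R sketch of that idea, assuming a Unix-like file system and hypothetical folder names. It illustrates symlinks in general and is not how vmTools is implemented internally.

```r
# Base-R sketch only; folder and file names are hypothetical.
root    <- tempfile("symlink_sketch_")
version <- file.path(root, "2024_02_01.01")
dir.create(version, recursive = TRUE)
invisible(file.create(file.path(version, "my_model_1.csv")))

# Create a symlink named "best" that points at the version folder.
# file.symlink() returns FALSE where symlinks are not permitted.
link   <- file.path(root, "best")
linked <- file.symlink(from = version, to = link)

# The link is a pointer, not a copy: listing it shows the target's files,
# and Sys.readlink() reports where it points.
if (linked) {
  print(list.files(link))
  print(basename(Sys.readlink(link)))
}
```

Because the link only stores a path, re-pointing best to another run costs nothing in disk space.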
resolve_symlink(file.path(output_root, "best"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01"

Since we run our pipelines many times, we want to track those runs.
Run the pipeline two more times on the same day, inspect the models, and make a human decision about the result quality.
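
Several runs on the same day depend on the YYYY_MM_DD.VV scheme incrementing the version suffix. The sketch below shows one way such a name could be derived from the folders already on disk. `next_version_name()` is a hypothetical helper written for illustration; it is an assumption about the general approach, not the code behind get_output_dir().

```r
# Hypothetical helper, not the vmTools implementation: propose the next
# "YYYY_MM_DD.VV" name by counting up from the versions already on disk.
next_version_name <- function(root, date) {
  existing <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  # Suffix is whatever follows "<date>." in each folder name
  suffix   <- substring(existing, nchar(date) + 2)
  same_day <- startsWith(existing, paste0(date, ".")) &
    grepl("^[0-9]{2}$", suffix)
  used     <- as.integer(suffix[same_day])
  next_id  <- if (length(used) == 0) 1L else max(used) + 1L
  sprintf("%s.%02d", date, next_id)
}

root <- tempfile("naming_sketch_")
dir.create(root)

first <- next_version_name(root, "2024_02_01")
dir.create(file.path(root, first))
second <- next_version_name(root, "2024_02_01")
print(c(first, second))
```

Symlinks such as best or keep_2024_02_01.01 would not match the two-digit suffix test, so they are ignored when counting versions.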
# Second run
date_vers2 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers2)
# note - the dynamic fields update when you make new folders, so we won't see the dv1 path anymore
root_dv2   <- slt_prep$return_dynamic_fields()$VERS_PATHS
invisible(file.create(file.path(root_dv2, fnames_my_models)))
# Third run
date_vers3 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers3)
root_dv3   <- slt_prep$return_dynamic_fields()$VERS_PATHS
invisible(file.create(file.path(root_dv3, fnames_my_models)))

Now let’s look at our file output structure, and central log.
print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.02
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.03
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv

#>    log_id         timestamp    user  version_name       action            comment
#>     <int>            <char>  <char>        <char>       <char>             <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG       create        log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023

After inspecting our results, we decide the third run is
actually the best, so we promote it to best_ status.

# Mark best, and take note of messaging
slt_prep$mark_best(version_name = date_vers3,
                   user_entry   = list(comment = "New best model GBD2023"))
#> Marking best: 2024_02_01.03
#> No existing symlinks found - moving on
#> Demoting from 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01
#>   Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03
#>   Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03/logs/log_version_history.csv
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv

Inspect the central log. The third version is now marked best.
#>    log_id         timestamp    user  version_name       action                comment
#>     <int>            <char>  <char>        <char>       <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG       create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01  demote_best New best model GBD2023
#> 4:      3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023

Note: Repeating the same “mark” on a folder has no effect (but reports will still run).
slt_prep$mark_best(version_name = date_vers3,
                   user_entry   = list(comment = "New best model GBD2023"))
#> Marking best: 2024_02_01.03
#> /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 - already marked best - moving on.

Let’s also take a look inside each of the run version folder logs.
The first version was demoted from best
automatically, and the third version was marked as best.
The best
symlink points to the third pipeline run.

Looking at all three run-version logs we see:
#>    log_id         timestamp    user  version_name       action                comment
#>     <int>            <char>  <char>        <char>       <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.01       create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01  demote_best New best model GBD2023

#>    log_id         timestamp    user  version_name action     comment
#>     <int>            <char>  <char>        <char> <char>      <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.02 create log created

#>    log_id         timestamp    user  version_name       action                comment
#>     <int>            <char>  <char>        <char>       <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.03       create            log created
#> 2:      1 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023

print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.02
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.03
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "best"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03"

We want to keep the first run, even though it’s not the best anymore.
best runs. We can mark
this version with a keep_ symlink.

# Mark keep, and take note of messaging
slt_prep$mark_keep(
   version_name = date_vers1,
   user_entry   = list(comment = "Previous best")
)
#> No existing symlinks found - moving on
#> Promoting to 'keep': /tmp/Rtmp3tDGBK/slt/output_root/keep_2024_02_01.01
#>   Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv

The first version is now marked as keep.
#>    log_id         timestamp    user  version_name       action                comment
#>     <int>            <char>  <char>        <char>       <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG       create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01  demote_best New best model GBD2023
#> 4:      3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
#> 5:      4 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep          Previous best

Note: Unlike best_, the keep_ mark is not unique.
Many folders can be marked
keep_.
#>    log_id         timestamp    user  version_name       action                comment
#>     <int>            <char>  <char>        <char>       <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne 2024_02_01.01       create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01  demote_best New best model GBD2023
#> 4:      3 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep          Previous best

print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.02
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.03
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "keep_2024_02_01.01"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01"

We want to remove the second run, because the model was experimental or performed poorly.
We can mark it with a remove_ symlink. From here we
could use the Symlink Tool to delete the folders, or round them up for
ST-GPR model deletion, etc. Either way, we now have a record of which
folders are no longer needed, and why.

# Mark remove, and take note of messaging
slt_prep$mark_remove(
   version_name = date_vers2,
   user_entry   = list(comment = "Obsolete dev folder"))
#> No existing symlinks found - moving on
#> Promoting to 'remove': /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02
#>   Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02/logs/log_version_history.csv
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv

Inspect the central log. The second version is now marked as
remove_.
Note: Unlike best_, the remove_ mark is
not unique. Many folders can be marked
remove_.

#>    log_id         timestamp    user  version_name         action                comment
#>     <int>            <char>  <char>        <char>         <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG         create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01   promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01    demote_best New best model GBD2023
#> 4:      3 2025_07_24_111337 ssbyrne 2024_02_01.03   promote_best New best model GBD2023
#> 5:      4 2025_07_24_111337 ssbyrne 2024_02_01.01   promote_keep          Previous best
#> 6:      5 2025_07_24_111337 ssbyrne 2024_02_01.02 promote_remove    Obsolete dev folder

print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.02
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.03
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> |-- remove_2024_02_01.02
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "remove_2024_02_01.02"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02"

Now that we have marked the second run as remove_, we
can use the Symlink Tool to delete the folders.
First, we’ll find (roundup) all our remove_
folders.
(dt_to_remove <- slt_prep$roundup_remove())
#> $output_root
#>     version_name                                             dir_name                             dir_name_resolved
#>           <char>                                               <char>                                        <char>
#> 1: 2024_02_01.02 /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02

Next, we can handle them any way we choose. For this demonstration, we’ll delete them.
remove_-marked runs to free quota space,
for example.

for(dir_dv_remove in dt_to_remove$output_root$version_name){
   slt_prep$delete_version_folders(
      version_name       = dir_dv_remove,
      user_entry         = list(comment = "Deleting dev folder"),
      require_user_input = FALSE
   )
}
#> 
#>   Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
#> Deleting /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02
#> Deleting /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02
# The default setting prompts user input, but the process can be automated, as for this vignette.
# 
# Do you want to delete the following folders?
#   /tmp/RtmpRmKCTu/slt/output_root/2024_02_01.02
#   /tmp/RtmpRmKCTu/slt/output_root/remove_2024_02_01.02 
# 
# 1: No
# 2: Yes

Check the central log. Since the folder is gone, the central log maintains the record of when it was deleted.
#>    log_id         timestamp    user  version_name               action                comment
#>     <int>            <char>  <char>        <char>               <char>                 <char>
#> 1:      0 2025_07_24_111336 ssbyrne   CENTRAL_LOG               create            log created
#> 2:      1 2025_07_24_111336 ssbyrne 2024_02_01.01         promote_best     Best model GBD2023
#> 3:      2 2025_07_24_111337 ssbyrne 2024_02_01.01          demote_best New best model GBD2023
#> 4:      3 2025_07_24_111337 ssbyrne 2024_02_01.03         promote_best New best model GBD2023
#> 5:      4 2025_07_24_111337 ssbyrne 2024_02_01.01         promote_keep          Previous best
#> 6:      5 2025_07_24_111337 ssbyrne 2024_02_01.02       promote_remove    Obsolete dev folder
#> 7:      6 2025_07_24_111337 ssbyrne 2024_02_01.02 delete_remove_folder    Deleting dev folder

Note: As soon as we marked a folder, there was a
report ready in our folder. The report_key_versions.csv
file covers every run-version that has a Tool-created symlink,
and shows the last row of its log (current status).
(data.table::fread(file.path(output_root, "report_key_versions.csv")))
#>    log_id         timestamp    user  version_name                                  version_path       action                comment
#>     <int>            <char>  <char>        <char>                                        <char>       <char>                 <char>
#> 1:      1 2025_07_24_111337 ssbyrne 2024_02_01.03 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 promote_best New best model GBD2023
#> 2:      3 2025_07_24_111337 ssbyrne 2024_02_01.01 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01 promote_keep          Previous best

We can generate more reports on the pipeline runs and the status of the folders, based on different needs. These reports are useful for tracking the status of the pipeline runs, and for making decisions about which folders to keep, delete, or promote.
There is also a REPORT_DISCREPANCIES.csv that will show
issues with the run-version logs, in case some were edited by hand in
ways that could cause problems.

# Generate reports
slt_prep$make_reports()
#> Writing last-row log reports for:
#>   /tmp/Rtmp3tDGBK/slt/output_root
#>   /tmp/Rtmp3tDGBK/slt/output_root

print_tree(output_root)
#> |-- 2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- 2024_02_01.03
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- best
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> |  |-- logs
#> |  |  `-- log_version_history.csv
#> |  |-- my_model_1.csv
#> |  |-- my_model_2.csv
#> |  |-- my_model_3.csv
#> |  |-- my_model_4.csv
#> |  `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv

The report_all_logs.csv file covers every
run-version that has a log, and shows its last row (current status).
(data.table::fread(file.path(output_root, "report_all_logs.csv")))
#>    log_id         timestamp    user  version_name                                  version_path       action                comment
#>     <int>            <char>  <char>        <char>                                        <char>       <char>                 <char>
#> 1:      3 2025_07_24_111337 ssbyrne 2024_02_01.01 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01 promote_keep          Previous best
#> 2:      1 2025_07_24_111337 ssbyrne 2024_02_01.03 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 promote_best New best model GBD2023

Two other reports that are sometimes diagnostically helpful are:
The report_all_logs_symlink.csv file will scan
run-version folders for logs of any other symlink type (in case the user
hand-creates symlinks).
The report_all_logs_non_symlink.csv file will scan
run-version folders that are not currently marked, and show their
current status.
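
The "last row as current status" idea behind these reports can be sketched in base R. `last_row_report()` is a hypothetical helper, and the log columns below are a reduced, assumed subset of the ones shown in this vignette. It illustrates the concept rather than what make_reports() does internally.

```r
# Hypothetical sketch, not the vmTools implementation: collect the last
# row of every run-version log under a root into one table.
last_row_report <- function(root) {
  # Note: a recursive listing may also follow symlinked folders on some
  # platforms; this demo root contains none.
  log_paths <- list.files(root, pattern = "^log_version_history\\.csv$",
                          recursive = TRUE, full.names = TRUE)
  rows <- lapply(log_paths, function(path) {
    log <- read.csv(path, stringsAsFactors = FALSE)
    log[nrow(log), , drop = FALSE]   # last row = current status
  })
  do.call(rbind, rows)
}

# Build two small run-version folders, each with a two-row log
root <- tempfile("report_sketch_")
for (v in c("2024_02_01.01", "2024_02_01.03")) {
  dir.create(file.path(root, v, "logs"), recursive = TRUE)
  write.csv(
    data.frame(log_id       = 0:1,
               version_name = v,
               action       = c("create", "promote_keep")),
    file.path(root, v, "logs", "log_version_history.csv"),
    row.names = FALSE
  )
}

report <- last_row_report(root)
print(report)
```

Keeping the full history in each folder's log and deriving status from the last row means the report can always be rebuilt from the logs alone.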