In trial simulation mode, we simulate patient data and then analyze it with a (usually Bayesian) model. In trial execution mode, we begin with patient data and just do the downstream analysis. This is useful for simulating entire clinical programs where the same patients move from trial to trial.

How it works

The workflow is very similar to the one in the parallel computing section of this vignette. First, we use our FACTS file and run_flfll() to generate a directory of param files. The example below uses tempfile() to store the param files (i.e. output_path). However, for distributed computing on traditional HPC clusters, output_path should be a directory that all nodes can access.

library(rfacts)
facts_file <- get_facts_file_example("dichot.facts") # could be any FACTS file
# On traditional HPC clusters, this should be a shared directory
# instead of a temp directory:
tmp <- fs::dir_create(tempfile())
all_param_files <- file.path(tmp, "param_files")

# Set n_weeks_files to 0 so we only read the weeks files generated by
# trial execution mode.
run_flfll(facts_file, all_param_files, n_weeks_files = 0L)

Since we are supplying our own data, VSR scenarios lose their meaning, and we only need a single VSR scenario. So we pick one. Any will do.

param_files <- get_param_dirs(all_param_files)[1]
basename(param_files)
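
If you are curious what run_flfll() generated, you can inspect the scenario directory directly. This is plain base R, and the exact engine input files vary by FACTS file.

# Optional sanity check: peek at the engine input files for this scenario.
head(list.files(param_files, recursive = TRUE))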

Next, we write a function to perform a single simulation. It simulates a single set of patients, does some custom data processing on the patients files, runs trial execution mode on those patients files, and returns the aggregated weeks files as an in-memory data frame. In functions like this, be sure to set a unique seed for each simulation iteration.

run_once <- function(index, param_files) {
  out <- tempfile()
  fs::dir_copy(param_files, out) # Requires the fs package.
  # Simulate one set of patients, with a unique seed for this iteration.
  run_engine_dichot(out, n_sims = 1L, seed = index)
  pats <- read_patients(out) # Read and aggregate all the patients files.
  # Here, do some custom data processing on the whole pats data frame...
  # Write the processed patient data to the original patients files.
  overwrite_csv_files(pats)
  run_engine_dichot(
    out,
    n_sims = 1L,
    seed = index,
    mode = "r", # "r" activates trial execution mode.
    execdata = "patients00001.csv", # Custom / modified patients files.
    final = TRUE
  )
  read_weeks(out) # Aggregate the weeks files into one in-memory data frame.
}
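
What the custom processing step looks like depends entirely on your trial. As a purely hypothetical sketch, the placeholder comment inside run_once() could be replaced with something like the line below; `response` is a made-up column name, so inspect pats to find the real columns in your patients files.

# Hypothetical processing for the placeholder step in run_once():
# dichotomize a made-up `response` column. Modify columns in place
# rather than adding or dropping them so the CSV layout still
# matches what the FACTS engine expects.
pats$response <- as.integer(pats$response > 0)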

Test locally first

Before launching a large number of simulations on a cluster, test run_once() locally a few times to make sure it works. The data frame below is an aggregate of all the weeks00000.csv files from trial execution mode.

library(dplyr)
library(fs)

# Ignore the facts_sim column since all weeks files were indexed 00000.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(2), run_once, param_files = param_files) %>%
  bind_rows()
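
To use facts_id as recommended above, capture the aggregated data frame in a variable first. weeks_local is a name introduced for this sketch, which assumes facts_id distinguishes the run_once() iterations.

weeks_local <- lapply(seq_len(2), run_once, param_files = param_files) %>%
  bind_rows()
weeks_local %>%
  count(facts_id) # Expect one facts_id per call to run_once().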

On a cluster

Thanks to clustermq, it is straightforward to run simulations in parallel on a cluster. First, configure clustermq with a template file and global options. Here, we demonstrate using an SGE cluster.

# Configure clustermq to use your scheduler and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
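
For reference, an SGE clustermq.tmpl usually looks like the sketch below, adapted from the clustermq user guide linked above. Every value is a site-specific placeholder, so check your cluster's documentation before using it.

# clustermq.tmpl: placeholder SGE template.
#$ -N {{ job_name }}              # job name
#$ -t 1-{{ n_jobs }}              # submit workers as an array job
#$ -j y                           # combine stdout and stderr
#$ -o {{ log_file | /dev/null }}  # worker log file
#$ -cwd                           # run from the current working directory
#$ -V                             # export environment variables to workers
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'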

Then, run the simulations.

library(clustermq)
weeks <- Q(
  fun = run_once,
  index = seq_len(1e3), # Run 1000 simulations.
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 1e2 # Use 100 clustermq workers.
) %>%
  bind_rows()
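
As with the local test, post-process the aggregated results by facts_id rather than facts_sim. The summary below is only an example of the pattern.

# Example post-processing: row counts per simulation iteration.
weeks %>%
  group_by(facts_id) %>%
  summarize(n = n(), .groups = "drop")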