In this section we will go through a typical analysis workflow using the castle
package:
.csv
-format.The first thing we need to do is load the packages we need for the analysis:
library(castle)
#> Loading required package: readr
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
We start by loading a QuantaSoft dataset in .csv
-format using the included import_QS_files
-function. Here we will use the example data included in the package, where we have both negative samples of different concentrations (training samples) and some test samples available for a KRAS G12D assay. For this assay Ch1 was used for measuring the mutant signal, which is set using the flag Ch1_is_mutation = TRUE
. If Ch2 was used this is simply set to FALSE
. Data is loaded like this:
path_to_my_data <- c("path/to/a/csv_file", "path/to/a/folder/with/csv_files/")
my_samples <- import_QS_files(path_to_my_data)
For this example we will load the training samples that are bundled in the castle
-package:
path_to_my_data <- system.file("extdata/example_data/", "background_data_KRAS_G12D.csv", package = "castle")
background_samples <- import_QS_files(path_to_my_data, Ch1_is_mutation = TRUE)
The first 5 samples/wells in background_samples
looks like this:
head(background_samples, 5)
#> # A tibble: 5 × 13
#> FileName Sample Well Ch1TargetType Ch2TargetType Target DoubleNegativeD…
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 background… NC20_1… A01 Unknown Unknown KRAS_G… 2415
#> 2 background… NC21_1… B01 Unknown Unknown KRAS_G… 1534
#> 3 background… NC22_1… C01 Unknown Unknown KRAS_G… 2041
#> 4 background… NC23_1… D01 Unknown Unknown KRAS_G… 183
#> 5 background… NC24_1… E01 Unknown Unknown KRAS_G… 1487
#> # … with 6 more variables: WildtypeOnlyDroplets <dbl>,
#> # MutantOnlyDroplets <dbl>, DoublePositiveDroplets <dbl>,
#> # TotalDroplets <dbl>, NumberOfMergedWells <dbl>, MergedWells <lgl>
Note that besides some metadata (file name, target etc.) for each sample, the dataset contains the columns “WildtypeOnlyDroplets”, “MutantOnlyDroplets”, “DoubleNegativeDroplets” and “DoublePositiveDroplets”. These are the counts of the four different kinds of droplets for each sample based on the two signals in Ch1 and Ch2. The following model training and statistical tests are based solely on these counts.
To train the model we are only interested in the negative samples. Ideally these should cover the range of different concentrations of DNA that the trained model is expected to encounter in future test samples. Before training we start by removing the empty samples (“NTC”) and positive controls from the training dataset we loaded above:
# Remove control samples ("NTC" and "Positive Control")
clean_background_samples <-
background_samples %>%
filter(
!(Ch1TargetType %in% c("NTC", "Positive Control")),
!(Ch2TargetType %in% c("NTC", "Positive Control"))
)
The cleaned data is then used for model training using the function train_simple_ddpcr_model
:
trained_model <- train_integrated_ddpcr_model(
background_samples = clean_background_samples
)
The object trained_model
is simply a list of parameters used in the statistical model in CASTLE.
For testing we will use a variety of samples from two different patients, who have undergone curative surgery for colorectal cancer. We start by loading the data from these patients. To load the data we will again use the function import_QS_files
. But this time, data from some of the samples is distributed across multiple wells. These can be merged together using the flag merge_wells = 'yes'
(see ?import_QS_files
for other methods for merging). Furthermore samples across multiple files can also be merged using the flag merge_files = TRUE
if necessary.
# Path to example data in the package
path_to_my_test_samples <- system.file(
"extdata/example_data/test_samples",
c(
"patient_1_plasma_KRAS_G12D.csv",
"patient_2_plasma_KRAS_G12D.csv"
),
package = "castle"
)
# Import test examples
test_samples <-
import_QS_files(
path_to_my_test_samples,
merge_wells = "yes"
)
To test the samples we use the function test_tumor_sample_integrated
and use the model we trained above:
# Make calls using CASTLE
test_res <- test_tumor_sample_integrated(
test_samples = test_samples,
integrated_model = trained_model
)
We will start by looking at some of the results for the tumor and PBL (white blood cells) sample from each patient:
test_res %>%
dplyr::filter(Sample %in% c("Tumor", "PBL")) %>%
dplyr::select(FileName, Sample, p_val, mutation_detected)
#> # A tibble: 4 × 4
#> FileName Sample p_val mutation_detected
#> <chr> <chr> <dbl> <lgl>
#> 1 patient_1_plasma_KRAS_G12D.csv Tumor 0 TRUE
#> 2 patient_1_plasma_KRAS_G12D.csv PBL 1 FALSE
#> 3 patient_2_plasma_KRAS_G12D.csv Tumor 0 TRUE
#> 4 patient_2_plasma_KRAS_G12D.csv PBL 0.455 FALSE
From this, we clearly see that the mutation is present in the tumor from both of the patients and not in PBL. We can therefore use the mutation as a cancer-specific marker, and thus to check if the patient has residual disease after curative intent surgery. As a positive test, we start by looking at the “Plasma_A” sample of both patients, which is a blood sample collected before surgery:
test_res %>%
dplyr::filter(Sample %in% c("Plasma_A")) %>%
dplyr::select(FileName, Sample, p_val, mutation_detected)
#> # A tibble: 2 × 4
#> FileName Sample p_val mutation_detected
#> <chr> <chr> <dbl> <lgl>
#> 1 patient_1_plasma_KRAS_G12D.csv Plasma_A 0 TRUE
#> 2 patient_2_plasma_KRAS_G12D.csv Plasma_A 0 TRUE
We see that these samples indicate that the mutation, and thus the tumor, is present in both patients, as expected. From the result we can also see the following estimates of properties of the ddPCR analysis and the ctDNA in the blood:
test_res %>%
dplyr::filter(Sample %in% c("Plasma_A")) %>%
dplyr::select(
FileName, Sample,
mutant_molecules_per_droplet,
wildtype_molecules_per_droplet,
total_mutant_molecules,
mutant_molecules_per_droplet_CI_lower, mutant_molecules_per_droplet_CI_upper,
wildtype_molecules_per_droplet_CI_lower, wildtype_molecules_per_droplet_CI_upper
)
#> # A tibble: 2 × 9
#> FileName Sample mutant_molecules_… wildtype_molecules… total_mutant_mol…
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 patient_1_pl… Plasma… 0.000242 0.224 21.3
#> 2 patient_2_pl… Plasma… 0.00386 0.0960 376.
#> # … with 4 more variables: mutant_molecules_per_droplet_CI_lower <dbl>,
#> # mutant_molecules_per_droplet_CI_upper <dbl>,
#> # wildtype_molecules_per_droplet_CI_lower <dbl>,
#> # wildtype_molecules_per_droplet_CI_upper <dbl>
test_res %>%
dplyr::filter(Sample %in% c("Plasma_A")) %>%
dplyr::select(
FileName, Sample,
allele_frequency,
allele_frequency_CI_lower, allele_frequency_CI_upper
)
#> # A tibble: 2 × 5
#> FileName Sample allele_frequency allele_frequency_… allele_frequency_…
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 patient_1_plas… Plasma… 0.00108 0.000576 0.00181
#> 2 patient_2_plas… Plasma… 0.0387 0.0339 0.0438
We can then look what happens after the surgery. To do this we look at the blood samples marked as “Plasma_B”:
test_res %>%
dplyr::filter(Sample %in% c("Plasma_B")) %>%
dplyr::select(FileName, Sample, p_val, mutation_detected)
#> # A tibble: 2 × 4
#> FileName Sample p_val mutation_detected
#> <chr> <chr> <dbl> <lgl>
#> 1 patient_1_plasma_KRAS_G12D.csv Plasma_B 0.306 FALSE
#> 2 patient_2_plasma_KRAS_G12D.csv Plasma_B 0.242 FALSE
We now see that the test indicates that the mutation is no longer present in any of the patients, which is consistent with the tumor being removed by surgery.
The full list of information available in the analysis results is:
#> [1] "FileName"
#> [2] "Sample"
#> [3] "Well"
#> [4] "Ch1TargetType"
#> [5] "Ch2TargetType"
#> [6] "Target"
#> [7] "DoubleNegativeDroplets"
#> [8] "WildtypeOnlyDroplets"
#> [9] "MutantOnlyDroplets"
#> [10] "DoublePositiveDroplets"
#> [11] "TotalDroplets"
#> [12] "NumberOfMergedWells"
#> [13] "MergedWells"
#> [14] "mutant_molecules_per_droplet"
#> [15] "wildtype_molecules_per_droplet"
#> [16] "p_val"
#> [17] "test_statistic"
#> [18] "mutation_detected"
#> [19] "allele_frequency"
#> [20] "total_mutant_molecules"
#> [21] "total_wildtype_molecules"
#> [22] "mutant_molecules_per_droplet_CI_lower"
#> [23] "mutant_molecules_per_droplet_CI_upper"
#> [24] "allele_frequency_CI_lower"
#> [25] "allele_frequency_CI_upper"
#> [26] "total_mutant_molecules_CI_lower"
#> [27] "total_mutant_molecules_CI_upper"
#> [28] "wildtype_molecules_per_droplet_CI_lower"
#> [29] "wildtype_molecules_per_droplet_CI_upper"
The results can be exported to .csv
format via the function write_csv(x = test_res, file = "your/file/path)
.
In the package the function simulate_droplet_counts
for simulating droplet counts from the statistical model that CASTLE is based on is also provided. Here we will simulate a positive and negative sample based on the parameters learned above, and test them using test_tumor_sample_integrated
:
set.seed(42)
# DNA concentration (molecules per droplet)
l <- 0.5 # WT
r_positive <- 0.01 # M
r_negative <- 0 # M
# Number of droplets
n_drops <- 14000
# Positive sample
test_sample_positive <-
simulate_droplet_counts(
l = l, r = r_positive,
a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est,
n_drops = n_drops
)
# Negative sample
test_sample_negative <-
simulate_droplet_counts(
l = l, r = r_negative,
a = trained_model$a_est, b = trained_model$b_est, c = trained_model$c_est,
n_drops = n_drops
)
simulated_samples <- bind_rows(
data.frame(Sample = "Positive", test_sample_positive),
data.frame(Sample = "Negative", test_sample_negative)
)
# Test samples
sim_test_res <- test_tumor_sample_simple(
test_samples = simulated_samples,
simple_model = trained_model
)
# Print some relevant results
sim_test_res %>% select(
Sample, p_val, mutation_detected,
total_mutant_molecules, total_mutant_molecules_CI_lower, total_mutant_molecules_CI_upper
)
#> Sample p_val mutation_detected total_mutant_molecules
#> 1 Positive 0.0000000 TRUE 144.3065735
#> 2 Negative 0.3989545 FALSE 0.9685377
#> total_mutant_molecules_CI_lower total_mutant_molecules_CI_upper
#> 1 115.4188 177.638344
#> 2 0.0000 7.088729