Overview

This Rmarkdown documents contains the code and 2D simulations of the error propagation and classification uncertainty estimation using Monte Carlo simulations for two generic classes with polynomial separation classified with Polynomial SVM.

Packages

windowsFonts(A = windowsFont("DejaVu Sans")) 
palette("Tableau10")

library(caret) #classification using train()
library(kernlab)    
library(MASS) #generate the samples and calculate the augmented noise
library(dplyr) #data handling 
library(knitr) #knit options for the Rmarkdown report
library(kableExtra) #table options

Local sources files containing the functions

source("predict_with_uncertainty_parallel_v4.R")
source("plot_functions.R")

2D simulations with linear separation pattern

set.seed(123)
n_points <- 10
n_replicates <- 5
noise_sd <- 0.1
rho <- 0

cov_matrix <- matrix(c(noise_sd^2, rho * noise_sd^2,
                       rho * noise_sd^2, noise_sd^2), nrow = 2)

# Define curve generator with vertical offset
curve_points <- function(n_points, curvature = 2, offset = 0, x_range = c(-0.8, 0.8)) {
  x_vals <- seq(x_range[1], x_range[2], length.out = n_points)
  y_vals <- (x_vals)^curvature + offset
  return(data.frame(x = x_vals, y = y_vals))
}

# Replicated point generation
generate_class_curve <- function(n_points, n_replicates, centers, cov_matrix) {
  points <- matrix(0, nrow = n_points * n_replicates, ncol = 2)
  
  for (i in 1:n_points) {
    base_point <- c(centers$x[i] + rnorm(1, 0, 0.1), centers$y[i] + rnorm(1, 0, 0.1))
    
    for (j in 1:n_replicates) {
      noise <- mvrnorm(1, mu = c(0, 0), Sigma = cov_matrix)
      points[(i - 1) * n_replicates + j, ] <- base_point + noise
    }
  }
  return(points)
}

# Class 1 and Class 2 on same curve, offset vertically
curve1 <- curve_points(n_points, curvature = 2, offset = 0)
curve2 <- curve_points(n_points, curvature = 2, offset = 0.5)  # class 2 is shifted up

class1 <- generate_class_curve(n_points, n_replicates, curve1, cov_matrix)
class2 <- generate_class_curve(n_points, n_replicates, curve2, cov_matrix)

generate_class_curve <- function(n_points, n_replicates, centers, cov_matrix) {
  points <- matrix(0, nrow = n_points * n_replicates, ncol = 2)
  
  for (i in 1:n_points) {
    base_point <- c(centers$x[i] + rnorm(1, 0, 0.1), centers$y[i] + rnorm(1, 0, 0.1))
    
    for (j in 1:n_replicates) {
      noise <- mvrnorm(1, mu = c(0, 0), Sigma = cov_matrix)
      points[(i - 1) * n_replicates + j, ] <- base_point + noise
    }
  }
  return(points)
}

# Generate classes
class1 <- generate_class_curve(n_points, n_replicates, curve1, cov_matrix)
class2 <- generate_class_curve(n_points, n_replicates, curve2, cov_matrix)

data <- data.frame(
  Class = factor(rep(c("Class 1", "Class 2"), each = n_points * n_replicates)),
  Sample_ID = rep(1:(2 * n_points), each = n_replicates),
  X1 = c(class1[, 1], class2[, 1]),
  X2 = c(class1[, 2], class2[, 2])
)

# Mean center the data
data$X1 <- data$X1 - mean(data$X1)
data$X2 <- data$X2 - mean(data$X2)

# Calculate the mean and standard deviation of each sample
mean_data <- aggregate(cbind(X1, X2) ~ Class + Sample_ID, data = data, FUN = mean)
sd_data <- aggregate(cbind(X1, X2) ~ Class + Sample_ID, data = data, FUN = sd)
rm(class1, class2)

Fitting the PLS-DA model

svm_model <- train(y = mean_data$Class,
                   x = mean_data[,3:4], 
                   method = "svmPoly")
svm_model
## Support Vector Machines with Polynomial Kernel 
## 
## 20 samples
##  2 predictor
##  2 classes: 'Class 1', 'Class 2' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 20, 20, 20, 20, 20, 20, ... 
## Resampling results across tuning parameters:
## 
##   degree  scale  C     Accuracy   Kappa     
##   1       0.001  0.25  0.4413968  0.09962096
##   1       0.001  0.50  0.4413968  0.09962096
##   1       0.001  1.00  0.4413968  0.09962096
##   1       0.010  0.25  0.4413968  0.09962096
##   1       0.010  0.50  0.4413968  0.09962096
##   1       0.010  1.00  0.4413968  0.09962096
##   1       0.100  0.25  0.4515556  0.11578257
##   1       0.100  0.50  0.5377937  0.23299353
##   1       0.100  1.00  0.6264762  0.32890026
##   2       0.001  0.25  0.4413968  0.09962096
##   2       0.001  0.50  0.4413968  0.09962096
##   2       0.001  1.00  0.4413968  0.09962096
##   2       0.010  0.25  0.4413968  0.09962096
##   2       0.010  0.50  0.4413968  0.09962096
##   2       0.010  1.00  0.4413968  0.09962096
##   2       0.100  0.25  0.5377937  0.23299353
##   2       0.100  0.50  0.6368413  0.34194193
##   2       0.100  1.00  0.7106667  0.46895787
##   3       0.001  0.25  0.4413968  0.09962096
##   3       0.001  0.50  0.4413968  0.09962096
##   3       0.001  1.00  0.4413968  0.09962096
##   3       0.010  0.25  0.4413968  0.09962096
##   3       0.010  0.50  0.4413968  0.09962096
##   3       0.010  1.00  0.4610000  0.11983843
##   3       0.100  0.25  0.5613651  0.23825407
##   3       0.100  0.50  0.6676349  0.38951448
##   3       0.100  1.00  0.7264603  0.50677692
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.1 and C = 1.

Local uncertainty

The predict_with_uncertainty_parallel() function is used to estimate the uncertainty of each sample.

loc_pred <- predict_with_uncertainty(data[,-1],
                                                svm_model,
                                                n_sim = 500,
                                                return_simulations = T)

This function returns the mean and standard deviation of the \(\hat{y}\) of each sample based on the Monte Carlo simulation. It also returns the area above and bellow the decision boundary (in this case, zero) of each sample. The classification output indicates if the sample is being classified with certainty or not. For samples where the distribution crosses the decision boundary (considering the 95% confidence level), the sample will be assigned as uncertain. Once return_simulations = T, the augmented noise will be saved. The results for this 2D simulated data is displayed in Table 1.

Table 1. Results of the predict_with_uncertainty_parallel() function.
class sample y_mean y_sd area_above_0 area_below_0 classification
Class 1 1 0.4193 0.1841 0.989 0.011 Class 1
Class 1 2 0.5893 0.1187 1.000 0.000 Class 1
Class 1 3 0.3278 0.0735 1.000 0.000 Class 1
Class 1 4 1.3338 0.1170 1.000 0.000 Class 1
Class 1 5 0.9997 0.1134 1.000 0.000 Class 1
Class 1 6 1.6352 0.1797 1.000 0.000 Class 1
Class 1 7 1.3401 0.1353 1.000 0.000 Class 1
Class 1 8 0.4900 0.0984 1.000 0.000 Class 1
Class 1 9 0.1572 0.0957 0.950 0.050 Uncertain
Class 1 10 -0.1727 0.1301 0.092 0.908 Uncertain
Class 2 1 -1.8278 0.1588 0.000 1.000 Class 2
Class 2 2 -0.9948 0.1009 0.000 1.000 Class 2
Class 2 3 -0.8295 0.1174 0.000 1.000 Class 2
Class 2 4 -0.2670 0.1090 0.007 0.993 Class 2
Class 2 5 -0.1796 0.0493 0.000 1.000 Class 2
Class 2 6 -0.3763 0.1620 0.010 0.990 Class 2
Class 2 7 0.1155 0.1615 0.763 0.237 Uncertain
Class 2 8 -0.7400 0.1782 0.000 1.000 Class 2
Class 2 9 -0.9937 0.0957 0.000 1.000 Class 2
Class 2 10 -2.1843 0.1791 0.000 1.000 Class 2

These results can be visualized, as showed in Figure 1.

Figure 1. Uncertainty of the classification outputs for the Polynomial SVM model. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.

Figure 1. Uncertainty of the classification outputs for the Polynomial SVM model. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.

Correlated noise

The previous examples showed how the uncertainty was estimated for Independent and Identically Distributed noise (iid noise). The next examples show how the uncertainty can be estimated for correlated noise, easily found in spectroscopy data.

Positive covariance

For this example, the covariance between the X1 and X2 is 0.8. The results are showed in Table 2.

svm_model
## Support Vector Machines with Polynomial Kernel 
## 
## 20 samples
##  2 predictor
##  2 classes: 'Class 1', 'Class 2' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 20, 20, 20, 20, 20, 20, ... 
## Resampling results across tuning parameters:
## 
##   degree  scale  C     Accuracy   Kappa     
##   1       0.001  0.25  0.4619683  0.07466667
##   1       0.001  0.50  0.4619683  0.07466667
##   1       0.001  1.00  0.4619683  0.07466667
##   1       0.010  0.25  0.4619683  0.07466667
##   1       0.010  0.50  0.4619683  0.07466667
##   1       0.010  1.00  0.4619683  0.07466667
##   1       0.100  0.25  0.4719683  0.09466667
##   1       0.100  0.50  0.5224603  0.17461660
##   1       0.100  1.00  0.6516508  0.35595756
##   2       0.001  0.25  0.4619683  0.07466667
##   2       0.001  0.50  0.4619683  0.07466667
##   2       0.001  1.00  0.4619683  0.07466667
##   2       0.010  0.25  0.4619683  0.07466667
##   2       0.010  0.50  0.4619683  0.07466667
##   2       0.010  1.00  0.4619683  0.07466667
##   2       0.100  0.25  0.5274603  0.18461660
##   2       0.100  0.50  0.6766508  0.40856260
##   2       0.100  1.00  0.7189206  0.46755516
##   3       0.001  0.25  0.4619683  0.07466667
##   3       0.001  0.50  0.4619683  0.07466667
##   3       0.001  1.00  0.4619683  0.07466667
##   3       0.010  0.25  0.4619683  0.07466667
##   3       0.010  0.50  0.4619683  0.07466667
##   3       0.010  1.00  0.4997460  0.14666667
##   3       0.100  0.25  0.6505397  0.36905145
##   3       0.100  0.50  0.7126984  0.45989647
##   3       0.100  1.00  0.7493492  0.51538989
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.1 and C = 1.
Table 2. Results of the predict_with_uncertainty_parallel() function for variables with positive covariance.
class sample y_mean y_sd area_above_0 area_below_0 classification
Class 1 1 -0.4576 0.1552 0.002 0.998 Class 2
Class 1 2 0.0948 0.1391 0.752 0.248 Uncertain
Class 1 3 0.8421 0.1431 1.000 0.000 Class 1
Class 1 4 1.0011 0.0610 1.000 0.000 Class 1
Class 1 5 1.2555 0.2599 1.000 0.000 Class 1
Class 1 6 1.4838 0.1029 1.000 0.000 Class 1
Class 1 7 1.1824 0.0792 1.000 0.000 Class 1
Class 1 8 0.9995 0.0620 1.000 0.000 Class 1
Class 1 9 0.0449 0.1084 0.661 0.339 Uncertain
Class 1 10 0.1478 0.0895 0.951 0.049 Uncertain
Class 2 1 -1.9972 0.0882 0.000 1.000 Class 2
Class 2 2 -0.7959 0.1516 0.000 1.000 Class 2
Class 2 3 -1.0038 0.1235 0.000 1.000 Class 2
Class 2 4 -0.4538 0.1104 0.000 1.000 Class 2
Class 2 5 -0.0594 0.1410 0.337 0.663 Uncertain
Class 2 6 -0.2513 0.1519 0.049 0.951 Uncertain
Class 2 7 -0.1501 0.1211 0.108 0.892 Uncertain
Class 2 8 -1.0395 0.0459 0.000 1.000 Class 2
Class 2 9 -0.5021 0.1163 0.000 1.000 Class 2
Class 2 10 -1.1833 0.0473 0.000 1.000 Class 2

The results can also be visualized, as showed in Figure 2.

Figure 3. Uncertainty of the classification outputs for the Polynomial SVM model with positive covariance. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.

Figure 3. Uncertainty of the classification outputs for the Polynomial SVM model with positive covariance. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.

Negative covariance

For this example, the covariance between the X1 and X2 is -0.8. The results are showed in Table 3.

.

svm_model
## Support Vector Machines with Polynomial Kernel 
## 
## 20 samples
##  2 predictor
##  2 classes: 'Class 1', 'Class 2' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 20, 20, 20, 20, 20, 20, ... 
## Resampling results across tuning parameters:
## 
##   degree  scale  C     Accuracy   Kappa     
##   1       0.001  0.25  0.4459683  0.04609524
##   1       0.001  0.50  0.4459683  0.04609524
##   1       0.001  1.00  0.4459683  0.04609524
##   1       0.010  0.25  0.4459683  0.04609524
##   1       0.010  0.50  0.4459683  0.04609524
##   1       0.010  1.00  0.4459683  0.04609524
##   1       0.100  0.25  0.4609683  0.07609524
##   1       0.100  0.50  0.5514603  0.21315139
##   1       0.100  1.00  0.6827619  0.43117497
##   2       0.001  0.25  0.4459683  0.04609524
##   2       0.001  0.50  0.4459683  0.04609524
##   2       0.001  1.00  0.4459683  0.04609524
##   2       0.010  0.25  0.4459683  0.04609524
##   2       0.010  0.50  0.4459683  0.04609524
##   2       0.010  1.00  0.4459683  0.04609524
##   2       0.100  0.25  0.5401746  0.20144733
##   2       0.100  0.50  0.7045397  0.46608406
##   2       0.100  1.00  0.7526825  0.52205314
##   3       0.001  0.25  0.4459683  0.04609524
##   3       0.001  0.50  0.4459683  0.04609524
##   3       0.001  1.00  0.4459683  0.04609524
##   3       0.010  0.25  0.4459683  0.04609524
##   3       0.010  0.50  0.4459683  0.04609524
##   3       0.010  1.00  0.4843016  0.11209524
##   3       0.100  0.25  0.6879841  0.42197941
##   3       0.100  0.50  0.7603492  0.54782089
##   3       0.100  1.00  0.7897778  0.59433759
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.1 and C = 1.
.
Table 3. Results of the predict_with_uncertainty_parallel() function for variables with negative covariance.
class sample y_mean y_sd area_above_0 area_below_0 classification
Class 1 1 -0.0960 0.0613 0.059 0.941 Uncertain
Class 1 2 0.0953 0.0955 0.841 0.159 Uncertain
Class 1 3 0.9570 0.1324 1.000 0.000 Class 1
Class 1 4 1.0016 0.0494 1.000 0.000 Class 1
Class 1 5 1.2855 0.1246 1.000 0.000 Class 1
Class 1 6 1.5958 0.1295 1.000 0.000 Class 1
Class 1 7 1.2724 0.0542 1.000 0.000 Class 1
Class 1 8 1.0078 0.1353 1.000 0.000 Class 1
Class 1 9 0.1170 0.0826 0.922 0.078 Uncertain
Class 1 10 0.5195 0.3136 0.951 0.049 Uncertain
Class 2 1 -1.8383 0.0757 0.000 1.000 Class 2
Class 2 2 -0.8705 0.0673 0.000 1.000 Class 2
Class 2 3 -0.9900 0.1684 0.000 1.000 Class 2
Class 2 4 -0.5541 0.1148 0.000 1.000 Class 2
Class 2 5 -0.1993 0.1113 0.037 0.963 Uncertain
Class 2 6 -0.2937 0.1384 0.017 0.983 Class 2
Class 2 7 -0.1084 0.1580 0.246 0.754 Uncertain
Class 2 8 -1.1105 0.0848 0.000 1.000 Class 2
Class 2 9 -0.3935 0.1391 0.002 0.998 Class 2
Class 2 10 -1.3505 0.1472 0.000 1.000 Class 2

The results can also be visualized, as showed in Figure 3.

Figure 3. Uncertainty of the classification outputs for the Polynomial SVM model with negative covariance. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.

Figure 3. Uncertainty of the classification outputs for the Polynomial SVM model with negative covariance. (a) 2D data simulation. Circles represents the replicates, while the triangles represents the mean of replicates. (b) Augmented noise generated through eigen decomposition of the original covariance matrix for each sample. 50 points are showed for each sample. (c) Estimated uncertainty for each sample.