Confusion Matrices through k Nearest Neighbours Classification

Computes confusion matrices (one for each value of \(k\)) using \(k\)-NN classification on the results of two parametric bootstraps. One bootstrap is treated as a holdout set, and each of its values is classified against the other.

kNN.confusionmatrix(
  df,
  df.holdout,
  k,
  ties = "model2",
  print_genargs = TRUE,
  verbose = TRUE
)

Arguments

df	Data frame output by `pbcm.di` or `pbcm.du`
df.holdout	Data frame output by `pbcm.di` or `pbcm.du`, used as the holdout set that is classified against `df`
k	Number of neighbours to consider in k-NN classification; may be a vector of integers
ties	Which way to break ties in k-NN classification (see `kNN.classification`)
print_genargs	Should the generator arguments of the holdout distribution be included in the output? (See Details)
verbose	If `TRUE`, prints a progress bar and issues warnings

Value

A data frame with the following columns:

k

Number of nearest neighbours

P

Number of positives

N

Number of negatives

TP

Number of true positives

FP

Number of false positives

TN

Number of true negatives

FN

Number of false negatives

alpha

Type I error (false positive) rate; equal to FP divided by N

beta

Type II error (false negative) rate; equal to FN divided by P

In addition to these columns, if `print_genargs = TRUE`, each argument that was passed via `genargs1` and `genargs2` to `pbcm.di` or `pbcm.du` to generate `df.holdout` is included as a column of its own (for instance `genargs1_p` and `genargs2_p` in the example output).
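As a small worked illustration of how the two error rates follow from the counts, the sketch below uses made-up counts (not output of this function) and applies the definitions given above.

```r
# Hypothetical counts, chosen only to illustrate the arithmetic
P  <- 50   # holdout cases generated by model 1 (positives)
N  <- 40   # holdout cases generated by model 2 (negatives)
FP <- 4    # model 2 generated the data, decision was model 1
FN <- 10   # model 1 generated the data, decision was model 2

# The remaining cells follow from the row totals
TP <- P - FN
TN <- N - FP

alpha <- FP / N   # Type I error rate
beta  <- FN / P   # Type II error rate

print(c(TP = TP, TN = TN, alpha = alpha, beta = beta))
```

Every row of the returned data frame should satisfy `TP + FN == P` and `TN + FP == N`, which gives a quick sanity check on the output.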

Details

The function takes each DeltaGoF value from `df.holdout`, compares it against the DeltaGoF distributions in `df`, and uses \(k\)-NN classification to decide which model generated it. By convention, model 2 is the null hypothesis and model 1 the alternative, so a decision in favour of model 1 counts as a positive. A false positive, for instance, is the case where model 2 generated the data but the decision was in favour of model 1.
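The procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: `knn_decide` is a hypothetical helper, the neighbours are found by absolute distance on pooled one-dimensional DeltaGoF values, ties go to the label passed in `ties`, and all numbers are made up.

```r
# Hypothetical helper (not part of the package): classify one DeltaGoF value
# by majority vote among its k nearest neighbours in the pooled reference set.
knn_decide <- function(x, ref1, ref2, k, ties = "model2") {
  pooled  <- c(ref1, ref2)
  labels  <- c(rep("model1", length(ref1)), rep("model2", length(ref2)))
  nearest <- order(abs(pooled - x))[seq_len(k)]
  votes1  <- sum(labels[nearest] == "model1")
  votes2  <- k - votes1
  if (votes1 > votes2) "model1" else if (votes2 > votes1) "model2" else ties
}

# Toy reference distributions, standing in for the DeltaGoF values in df
ref1 <- c(-3.1, -2.4, -2.0, -1.2, -0.5)  # data generated by model 1
ref2 <- c(0.4, 0.9, 1.5, 2.2, 3.0)       # data generated by model 2

# Toy holdout values, standing in for the DeltaGoF values in df.holdout
hold1 <- c(-2.2, -0.1, 0.6)  # generated by model 1 (positives)
hold2 <- c(1.0, 2.5, -0.9)   # generated by model 2 (negatives)

k <- 3
dec1 <- vapply(hold1, knn_decide, character(1), ref1 = ref1, ref2 = ref2, k = k)
dec2 <- vapply(hold2, knn_decide, character(1), ref1 = ref1, ref2 = ref2, k = k)

# Model 2 is the null, so a decision for model 1 counts as a positive
counts <- data.frame(
  k  = k,
  P  = length(hold1),
  N  = length(hold2),
  TP = sum(dec1 == "model1"),
  FP = sum(dec2 == "model1"),
  TN = sum(dec2 == "model2"),
  FN = sum(dec1 == "model2")
)
print(counts)
```

Repeating the last step for each requested value of `k` would give one row per `k`, which is the shape of the data frame described under Value.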

Examples

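# Mock data: a noisy linear relationship on [0, 1]
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))

# Fitting function: fits y = a*x^p for a fixed exponent p and
# returns the estimated coefficient together with the goodness of fit
myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}

# Generator function: simulates new data from a fitted model
mygenfun <- function(model, p) {
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}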

pb1 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
        genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2),
        genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
#>   |======================================================================| 100%

pb2 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
        genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2),
        genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
#>   |======================================================================| 100%

kNN.confusionmatrix(df=pb1, df.holdout=pb2, k=1:10)
#>    genargs1_p genargs2_p  k  P  N TP FP TN FN alpha beta
#> 1           1          2  1 20 20 14  4 16  6  0.20  0.3
#> 2           1          2  2 20 20 14  4 16  6  0.20  0.3
#> 3           1          2  3 20 20 12  3 17  8  0.15  0.4
#> 4           1          2  4 20 20 12  3 17  8  0.15  0.4
#> 5           1          2  5 20 20 14  4 16  6  0.20  0.3
#> 6           1          2  6 20 20 14  4 16  6  0.20  0.3
#> 7           1          2  7 20 20 14  4 16  6  0.20  0.3
#> 8           1          2  8 20 20 14  4 16  6  0.20  0.3
#> 9           1          2  9 20 20 14  4 16  6  0.20  0.3
#> 10          1          2 10 20 20 14  4 16  6  0.20  0.3
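One way to use such a table is to compare error rates across values of `k`. The sketch below does not call the package; it re-enters the counts printed above by hand into a data frame named `res` (a name chosen here for illustration), recomputes the two rates, and picks out the rows with the smallest Type I error rate.

```r
# Counts copied by hand from the printed output above
res <- data.frame(
  k  = 1:10,
  P  = 20,
  N  = 20,
  TP = c(14, 14, 12, 12, 14, 14, 14, 14, 14, 14),
  FP = c(4, 4, 3, 3, 4, 4, 4, 4, 4, 4),
  TN = c(16, 16, 17, 17, 16, 16, 16, 16, 16, 16),
  FN = c(6, 6, 8, 8, 6, 6, 6, 6, 6, 6)
)

# Recompute the rates from the counts, following the Value section
res$alpha <- res$FP / res$N
res$beta  <- res$FN / res$P

# Rows with the smallest Type I error rate
best <- res[res$alpha == min(res$alpha), c("k", "alpha", "beta")]
print(best)
```

In this run the lower `alpha` at `k = 3` and `k = 4` comes with a higher `beta`, so the choice of `k` involves a trade-off between the two error types. With only 20 replicates per bootstrap and no fixed random seed, the exact figures will differ from run to run.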
