The bark package implements estimation for a Bayesian nonparametric regression model in which the mean function is represented as a sum of multivariate Gaussian kernels. This gives a flexible model that captures nonlinearities and interactions and supports feature selection.

Installation

You can install the released version of bark from CRAN with:

install.packages("bark")

And the development version from GitHub with:

require("devtools")
devtools::install_github("merliseclyde/bark")

(verify that the branch has a passing R CMD check badge above)

Example

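library(bark)
set.seed(42)

# Simulate noisy training data and noise-free test data
traindata <- sim_Friedman2(200, sd = 125)
testdata <- sim_Friedman2(1000, sd = 0)

fit.bark.d <- bark(y ~ .,
                   data = data.frame(traindata),
                   testdata = data.frame(testdata),
                   classification = FALSE,   # regression rather than classification
                   selection = TRUE,         # allow smoothing parameters to be zero
                   common_lambdas = FALSE,   # separate smoothing parameter per input
                   printevery = 10^10)       # large value to suppress progress output

# Mean squared error of the posterior mean prediction on the test data
mean((fit.bark.d$yhat.test.mean - testdata$y)^2)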
#> [1] 1920.283

bark is similar to SVM; however, it allows a different kernel smoothing parameter for every dimension of the inputs x, and it selects inputs by allowing the kernel smoothing parameters to be zero.
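To make the role of the per-dimension smoothing parameters concrete, here is a minimal sketch of a Gaussian kernel with one λ per input. The function name gauss_kernel and the parameterization exp(-Σ λ_d (x_d - c_d)^2) are assumptions made for illustration, not code taken from bark. The point is only that an input whose λ is zero cannot influence the kernel value.

```r
# Hypothetical helper (not part of bark): Gaussian kernel with one
# smoothing parameter per input dimension.
gauss_kernel <- function(x, center, lambda) {
  stopifnot(length(x) == length(center), length(x) == length(lambda))
  exp(-sum(lambda * (x - center)^2))
}

lambda <- c(0, 2, 0.5, 0)   # lambda_1 = lambda_4 = 0: inputs 1 and 4 are switched off
center <- c(0, 0, 0, 0)

a <- gauss_kernel(c(1, 0.3, 0.2, 5), center, lambda)
b <- gauss_kernel(c(-7, 0.3, 0.2, 99), center, lambda)  # only inputs 1 and 4 differ

all.equal(a, b)  # TRUE: inputs with lambda = 0 leave the kernel unchanged
```

Setting a λ to exactly zero removes that input from every kernel in the sum, which is how a zero smoothing parameter acts as feature selection in this sketch.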

The plot below shows posterior draws of λ for the simulated data.

boxplot(as.data.frame(fit.bark.d$theta.lambda))

The posterior distributions of λ_1 and λ_4 are concentrated near zero, which leads to x_1 and x_4 dropping from the mean function.
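A numerical summary can complement the boxplot. The sketch below assumes the draws form a matrix with one row per saved iteration and one column per input, which is how fit.bark.d$theta.lambda is treated in the boxplot() call. summarize_lambda is a hypothetical helper, and the simulated matrix is made-up stand-in data so the block runs without fitting a model.

```r
# Hypothetical helper (not part of bark): per-input summary of lambda draws.
summarize_lambda <- function(draws) {
  draws <- as.matrix(draws)
  data.frame(
    input     = seq_len(ncol(draws)),
    median    = apply(draws, 2, median),
    prop_zero = colMeans(draws == 0)  # share of draws in which lambda is exactly zero
  )
}

# Stand-in draws (made up for illustration): columns 1 and 4 are mostly zero.
set.seed(1)
fake_draws <- cbind(
  rbinom(100, 1, 0.1) * rexp(100),
  rexp(100) + 1,
  rexp(100) + 1,
  rbinom(100, 1, 0.1) * rexp(100)
)
lambda_summary <- summarize_lambda(fake_draws)
lambda_summary

# With a fitted model: summarize_lambda(fit.bark.d$theta.lambda)
```

Inputs with a median near zero and a large prop_zero are the ones that effectively drop out of the mean function.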

Roadmap for Future Enhancements

Over the next year the following enhancements are planned:

  • port more of the R code to C/C++ for improvements in speed

  • add S3 methods for predict, summary, plot

  • add additional kernels and LARK methods from the Annals of Statistics (2011) paper

  • better hyperparameter specification

If there are features you would like to see added, please feel free to create an issue in GitHub and we can discuss!