Probability models and decision analysis
Model checking
Biotieteellinen tiedekunta / Henkilön nimi / Esityksen nimi 15.9.2018
Purpose of model checking
- Think of a digital video camera
  - Input: reflected light from real-world objects
  - Output: digital image
- Is the image a good representation of the real world? It depends on the purpose.
  - Just want to identify objects ("chair", "dog", "mountain") in the image? Low resolution in black and white is enough.
  - Want to know the colours and dimensions of the objects? High resolution with colours and a stereo image are needed.
- Model checking: does the model make "good" copies of the data?
Process of model checking
1. Learn the model parameters from the data.
2. Generate new data from the model using the learned parameters: produce a "copy", often called "replicated data".
3. Compare the replicated and observed data.
4. Evaluate the "goodness" of the copy.
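The four steps can be sketched in Python. This is a minimal illustration, not the course's own code: it assumes a Normal(mu, sigma) model with known sigma and a flat prior on mu (so the posterior of mu is Normal(mean(data), sigma/sqrt(n))), and the data, the function names and the choice of test statistics are all made up for the example.

```python
import random
import statistics

def posterior_mean_draws(data, sigma, n_draws, rng):
    """Step 1: draw mu from its posterior.

    Assumes known sigma and a flat prior on mu, so that
    mu | data ~ Normal(mean(data), sigma / sqrt(n)).
    """
    n = len(data)
    centre = statistics.fmean(data)
    spread = sigma / n ** 0.5
    return [rng.gauss(centre, spread) for _ in range(n_draws)]

def replicate(mu_draws, sigma, n, rng):
    """Step 2: one replicated data set of size n per posterior draw of mu."""
    return [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mu_draws]

def check_statistic(observed, replicates, stat):
    """Steps 3-4: fraction of replicates whose statistic exceeds the observed one."""
    t_obs = stat(observed)
    return sum(stat(rep) > t_obs for rep in replicates) / len(replicates)

rng = random.Random(1)
observed = [rng.gauss(10.0, 2.0) for _ in range(30)]  # made-up "observed" data

mu_draws = posterior_mean_draws(observed, sigma=2.0, n_draws=2000, rng=rng)
replicates = replicate(mu_draws, sigma=2.0, n=len(observed), rng=rng)

p_mean = check_statistic(observed, replicates, statistics.fmean)
p_max = check_statistic(observed, replicates, max)
print(p_mean, p_max)
```

Values near 0 or 1 would suggest that the model struggles to copy that feature of the data. Because the data here really do come from the assumed model, p_mean should land close to 0.5.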
Bayesian p-values
- A simple way to check the goodness of fit
- Bayesian p-value = P(replicate > observed | observed data)
- A p-value for each observation: check the model's ability to reproduce the
  - mean
  - variance
- Lots of variations exist for checking specific details of the model
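A per-observation p-value can be estimated by counting how often a replicate exceeds the observed value. The sketch below is an illustration under stated assumptions: a Normal(mu, sigma) model with known sigma and a flat prior on mu, made-up data, and a hypothetical helper name pointwise_p_values.

```python
import random
import statistics

def pointwise_p_values(observed, sigma, n_draws, rng):
    """Estimate P(replicate > observed | data) separately for each observation.

    Assumes known sigma and a flat prior on mu, so that
    mu | data ~ Normal(mean(data), sigma / sqrt(n)).
    """
    n = len(observed)
    centre = statistics.fmean(observed)
    spread = sigma / n ** 0.5
    exceed = [0] * n
    for _ in range(n_draws):
        mu = rng.gauss(centre, spread)      # one posterior draw of mu
        for i, y in enumerate(observed):
            y_rep = rng.gauss(mu, sigma)    # replicate of observation i
            if y_rep > y:
                exceed[i] += 1
    return [count / n_draws for count in exceed]

rng = random.Random(2)
observed = [rng.gauss(10.0, 2.0) for _ in range(40)]  # made-up data
p_values = pointwise_p_values(observed, sigma=2.0, n_draws=1000, rng=rng)

print(statistics.fmean(p_values), statistics.pstdev(p_values))
```

Large observations get small p-values and small observations get large ones, so the collection of p-values summarises how well the model reproduces both the location and the spread of the data.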
Bayesian p-values
[Figure: panels for mean, sd, x[1], x[2], x.rep[1] and x.rep[2]; the x.rep panels are marked P=0.4 and P=0.6]
Bayesian p-values: interpretation
- If the model fits perfectly, the p-values are roughly uniform on (0, 1)
- If the mean provided by the model is "perfect" -> mean of p-values = 0.5
- If the variance provided by the model is "perfect" -> SD of p-values ≈ 0.289 (= 1/√12, the SD of a uniform distribution on (0, 1))
- When the number of data points is small, there is some deviation from these numbers even with a perfect fit
- Graphical interpretation may be more useful: rank the p-values and plot them in increasing order
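The reference values can be checked by simulation. The sketch below assumes that p-values from a perfectly fitting model behave like draws from Uniform(0, 1); it uses random numbers in place of real p-values, and the helper names are hypothetical. It also measures how far the ranked p-values fall from the diagonal they would follow under uniformity.

```python
import random
import statistics

UNIFORM_SD = (1 / 12) ** 0.5  # SD of Uniform(0, 1), about 0.2887

def summarise(p_values):
    """Return mean, SD and the p-values ranked in increasing order."""
    return statistics.fmean(p_values), statistics.pstdev(p_values), sorted(p_values)

def max_gap_from_diagonal(ranked):
    """Largest distance between ranked p-values and their expected uniform positions."""
    n = len(ranked)
    return max(abs(p - (k + 0.5) / n) for k, p in enumerate(ranked))

rng = random.Random(3)
for n in (10, 10000):
    p_values = [rng.random() for _ in range(n)]  # stand-in for p-values under a perfect fit
    mean_p, sd_p, ranked = summarise(p_values)
    gap = max_gap_from_diagonal(ranked)
    print(n, round(mean_p, 3), round(sd_p, 3), round(gap, 3))
```

With only 10 p-values the mean and SD can sit visibly away from 0.5 and 0.289, which is the small-sample deviation mentioned above. With 10000 they should be close, and the ranked values should hug the diagonal.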
[Figure: panel labels "Mean: too high", "Mean: too low", "Mean: OK", "Variance: too high", "Variance:"]
Linear regression in BUGS: DAG
[Figure: directed acyclic graph. Nodes: α, β, σ, x[i], μ[i], y[i] (observed y), y.r[i] (replicated y), pv[i] (p-value); plate i=1:n]
BUGS code
model{
  for(i in 1:n){
    y[i] ~ dnorm(mu[i], tau)            # observed y
    y.rep[i] ~ dnorm(mu[i], tau)        # replicated y
    pval[i] <- step(y.rep[i] - y[i])    # 1 if y.rep[i] >= y[i]; its posterior mean is the p-value
    mu[i] <- alpha + beta*x[i]
  }
  alpha ~ dnorm(?,?)
  beta ~ dnorm(?,?)
  sigma ~ dnorm(?,?)I(0,)               # prior truncated to positive values
  tau <- 1/pow(sigma,2)                 # dnorm in BUGS takes a precision, not an SD
}
Bayesian p-values in regression models
- Plotting the ranked p-values gives a quick overview
- But it may not reveal problems with the regression line
- Solution: plot the p-values against the explanatory variable
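The idea can be illustrated with a small Python sketch. It is a simplification rather than the BUGS analysis itself: it assumes made-up data with a linear trend, plugs least-squares estimates into a normal model instead of simulating from a posterior, and compares a straight-line model with a constant-mean model that ignores x. All names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def p_values(y, mu, sigma):
    """P(y_rep > y[i]) for y_rep ~ Normal(mu[i], sigma), with parameters plugged in."""
    return [1.0 - NormalDist(m, sigma).cdf(obs) for obs, m in zip(y, mu)]

rng = random.Random(4)
x = [float(i) for i in range(50)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]  # made-up data with a trend

# Model A: straight line fitted by least squares.
mx, my = statistics.fmean(x), statistics.fmean(y)
beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        / sum((xi - mx) ** 2 for xi in x))
alpha = my - beta * mx
mu_line = [alpha + beta * xi for xi in x]
sigma_line = (sum((yi - m) ** 2 for yi, m in zip(y, mu_line)) / (len(y) - 2)) ** 0.5

# Model B: constant mean that ignores x.
mu_flat = [my] * len(y)
sigma_flat = statistics.stdev(y)

corr_line = pearson(x, p_values(y, mu_line, sigma_line))
corr_flat = pearson(x, p_values(y, mu_flat, sigma_flat))
print(round(corr_line, 2), round(corr_flat, 2))
```

For the constant-mean model the p-values fall steadily as x grows, giving a strong negative correlation, while for the straight-line model they show no trend. A correlation coefficient only detects linear trends, so a curved misfit could still go unnoticed; looking at the plot itself remains the safer check.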
[Figure: panel labels "Bad fit" and "Good fit"; annotations "Correlation = bad fit", "No correlation = good fit", "Not much different!"]
To remember about model checking...
- A qualitative, informal reality check
- "Bad" fit
  - The model is not a very good copying machine
  - The theory behind the model may not be very consistent with the data
  - But the size of the data and the model complexity play a role
  - Small data and a complex model = model checking does not work very well!
- "Good" fit
  - The model is a good copying machine
  - The theory behind the model can be consistent with the data, but may not make any sense otherwise!