Open Research Newcastle

Application of smooth tests of goodness of fit to generalized linear models

thesis
posted on 2025-05-11, 07:41 authored by Paul Rippon
Statistical models are an essential part of data analysis across many diverse fields. They are used to test research hypotheses, aid decision making, estimate effect sizes and/or improve understanding of the underlying processes generating the data of interest. However, it is essential to critically assess any fitted model, confirming that the model really is compatible with the data, before meaningful conclusions are possible.

Generalized linear models (GLMs) provide a flexible modelling framework encompassing many commonly used models, including the normal linear model, the logistic regression model and the Poisson regression model. This thesis explores how the smooth testing concept, originally proposed by Neyman (1937) and further developed by Rayner et al. (2009) among others, can be used to test the distributional assumption in a GLM.

However, sensible interpretation of this test, or of any other test used to assess the fit of a GLM, must recognize that:

  • the stochastic, deterministic and link components that make up a GLM should all be considered when assessing model validity,
  • the validity of any one of these three components cannot be sensibly considered in isolation, as it is confounded by the validity of the other two.

It is therefore important to consider how the smooth test developed in this thesis might be usefully incorporated into an overall model development strategy for GLMs, either replacing or supplementing existing diagnostic tools. Simulation studies demonstrate that the power of the smooth test is competitive with that of other existing tests. The smooth test also offers the possibility of improved diagnostic ability, because the overall test statistic breaks down into a sum of squares of interpretable components.

The SmoothGLM package has been developed to implement the smooth test in a form that can be easily applied to models fitted using the standard glm() function within the R statistical computing environment.
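To illustrate what "a sum of squares of interpretable components" means, the following is a minimal sketch of the classical Neyman smooth test for uniformity on [0, 1], using orthonormal shifted Legendre polynomials. It is not the GLM test developed in the thesis and does not use the SmoothGLM package; the function names `smooth_components` and `smooth_statistic` are hypothetical and chosen only for this example. It assumes a fully specified null distribution with no estimated parameters.

```python
import math

# Orthonormal shifted Legendre polynomials on [0, 1], orders 1 to 4.
# Each has mean 0 and variance 1 under the Uniform(0, 1) distribution.
LEGENDRE = [
    lambda u: math.sqrt(3) * (2 * u - 1),
    lambda u: math.sqrt(5) * (6 * u**2 - 6 * u + 1),
    lambda u: math.sqrt(7) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
    lambda u: 3 * (70 * u**4 - 140 * u**3 + 90 * u**2 - 20 * u + 1),
]


def smooth_components(u, order=4):
    """Return the components V_1..V_order for data u in [0, 1].

    V_i = sum_j h_i(u_j) / sqrt(n), where h_i is the i-th orthonormal
    polynomial. (Hypothetical helper, for illustration only.)
    """
    n = len(u)
    return [sum(h(x) for x in u) / math.sqrt(n) for h in LEGENDRE[:order]]


def smooth_statistic(u, order=4):
    """Overall smooth test statistic: the sum of squared components."""
    return sum(v * v for v in smooth_components(u, order))


if __name__ == "__main__":
    import random

    random.seed(1)
    sample = [random.random() for _ in range(200)]
    for i, v in enumerate(smooth_components(sample), start=1):
        print(f"V_{i} = {v:+.3f}")
    print(f"S_4 = {smooth_statistic(sample):.3f}")
```

In this simplified setting each squared component is asymptotically chi-squared with one degree of freedom under the null hypothesis, and the overall statistic is asymptotically chi-squared with `order` degrees of freedom. A large first component suggests a departure in location, a large second component a departure in dispersion, and so on, which is the diagnostic appeal of the decomposition.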

History

Year awarded

2013

Thesis category

  • Doctoral Degree

Degree

Doctor of Philosophy (PhD)

Supervisors

Rayner, J. C. W. (University of Newcastle); Tuyl, Frank (University of Newcastle)

Language

  • en, English

College/Research Centre

Faculty of Science and Information Technology

School

School of Mathematical and Physical Sciences

Rights statement

Copyright 2013 Paul Rippon
