Open Research Newcastle

Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization

journal contribution
posted on 2025-05-11, 19:26 authored by Ian Renner, Julie Louvrier, Olivier Gimenez
1. The increasing availability of species datasets means that species distribution modelling approaches that incorporate multiple datasets are in greater demand. Recent methodological developments in this area have led to combined likelihood approaches, which maximize a log‐likelihood formed by summing the log‐likelihood components of the individual data sources. These approaches often make use of at least one presence‐only dataset and use the log‐likelihood of an inhomogeneous Poisson point process model in the combined likelihood construction. While these advances have been shown to improve predictive performance, they do not currently address challenges specific to presence‐only modelling, such as checking and correcting for violations of the independence assumption of a Poisson point process model, or more general challenges in species distribution modelling, such as overfitting.

2. In this paper, we present an extension of the combined likelihood framework that accommodates alternative presence‐only likelihoods in the presence of spatial dependence, as well as lasso‐type penalties to account for potential overfitting. We compare the proposed combined penalized likelihood approach to the standard combined likelihood approach via simulation and apply the method to modelling the distribution of the Eurasian lynx in the Jura Mountains in eastern France.

3. The simulations show that the proposed combined penalized likelihood approach has better predictive performance than the standard approach when spatial dependence is present in the data. The lynx analysis shows that the predicted maps differ significantly between the model fitted with the proposed combined penalized approach accounting for spatial dependence and the model fitted with the standard combined likelihood.

4. This work highlights the benefits of carefully considering the presence‐only components of the combined likelihood formulation, and allows greater flexibility in accommodating real datasets.
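The combined penalized objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes (a) the presence‐only component is the standard inhomogeneous Poisson point process log‐likelihood, approximated with user‐supplied quadrature points and weights; (b) a presence–absence component is modelled with a complementary log‐log link and a log(site area) offset, so both sources share the same intensity coefficients; and (c) a plain lasso penalty on all non‐intercept coefficients. The spatial‐dependence‐aware presence‐only likelihoods proposed in the paper, and any observer‐bias terms, are not shown. All function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating a combined penalized log-likelihood.
# Each covariate vector x has x[0] = 1 so that beta[0] is the intercept.


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """Return x'beta, i.e. the log-intensity at a location."""
    return sum(xk * bk for xk, bk in zip(x, beta))


def ipp_loglik(presence_x, quad_x, quad_w, beta) -> float:
    """Quadrature approximation to the inhomogeneous Poisson point process
    log-likelihood: the sum of log-intensities at presence points minus the
    approximate integral of the intensity over the study region."""
    log_lam_pres = sum(linear_predictor(x, beta) for x in presence_x)
    integral = sum(w * math.exp(linear_predictor(x, beta))
                   for x, w in zip(quad_x, quad_w))
    return log_lam_pres - integral


def pa_loglik(site_x, site_y, site_area, beta) -> float:
    """Bernoulli log-likelihood for presence-absence sites, ASSUMING a
    complementary log-log link with a log(area) offset, so that
    P(y = 1) = 1 - exp(-lambda * area) under the same intensity model."""
    total = 0.0
    for x, y, area in zip(site_x, site_y, site_area):
        mu = math.exp(linear_predictor(x, beta)) * area  # expected count at site
        if y == 1:
            total += math.log(-math.expm1(-mu))  # log(1 - exp(-mu)), stable
        else:
            total += -mu  # log(exp(-mu)), probability of no detection
    return total


def combined_penalized_loglik(beta, po, pa, lam: float) -> float:
    """Sum of the component log-likelihoods minus a lasso penalty.
    po = (presence_x, quad_x, quad_w); pa = (site_x, site_y, site_area).
    The intercept beta[0] is left unpenalized."""
    ll = ipp_loglik(*po, beta) + pa_loglik(*pa, beta)
    penalty = lam * sum(abs(b) for b in beta[1:])
    return ll - penalty


if __name__ == "__main__":
    # Toy inputs with one covariate; values are illustrative only.
    po = ([[1.0, 0.5], [1.0, -0.2]],   # covariates at presence-only records
          [[1.0, 0.0], [1.0, 1.0]],    # covariates at quadrature points
          [2.0, 3.0])                  # quadrature weights (areas)
    pa = ([[1.0, 0.3], [1.0, -1.0]],   # covariates at surveyed sites
          [1, 0],                      # detected / not detected
          [1.0, 2.0])                  # site areas
    print(combined_penalized_loglik([0.2, 1.0], po, pa, lam=0.5))
```

In practice this objective would be maximized over `beta` for a sequence of penalty values `lam`, with the penalty chosen by a criterion such as cross‐validation or an information criterion; swapping `ipp_loglik` for a likelihood that allows spatial dependence between presence‐only points is the kind of extension the paper describes.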


Journal title

Methods in Ecology & Evolution

Volume

10

Issue

12

Pagination

2118-2128

Publisher

Wiley-Blackwell Publishing

Language

  • en, English

College/Research Centre

Faculty of Science

School

School of Mathematical and Physical Sciences

Rights statement

This is the peer reviewed version of the following article: Renner, IW, Louvrier, J, Gimenez, O. Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization. Methods Ecol Evol. 2019; 10: 2118–2128, which has been published in final form at https://doi.org/10.1111/2041-210X.13297. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.