A new method to model the distribution of extremely rare species

Post provided by Pasquale Raia (he/him), Alessandro Mondanaro (he/him), Mirko Di Febbraro (he/him), Marina Melchionna (she/her) and Silvia Castiglione (she/her)

Back in 2001 Sally Duncan, a prolific science writer, published an exquisitely assembled report in Science Findings, one of the public resources of the Oregon-based Pacific Northwest Research Station. The report focused on a fundamental issue in ecology: the definition, perception and meaning of ecological rarity. To ecologists and evolutionary biologists, rarity is the classic I-know-what-it-is-until-somebody-asks-me-to-define-it thing. Duncan was illustrating David Boughton’s finding that rare species (put simplistically, those you seldom encounter during surveys) are rare because of how accidents of history (in US West Coast forests, mostly major fires) intersect with their ability to disperse and with how much habitat the fires spare. Boughton (and Duncan, echoing him) moved from these findings to the dream of identifying multispecies management solutions, which could in principle help to (in their words) ‘manage species you know nothing about’.

According to IUCN guidelines, effective management depends on correct risk assessment. For instance, IUCN considers a species as vulnerable ‘if bioclimate models (see section 12.1.12) predict that a range reduction could correspond to a population reduction of 80% or more’. However, bioclimate models hardly work with rare species, giving rise to what Angela Lomba and her colleagues once, with enviably crafty rhetoric, called the ‘rare species modeling paradox’. The paradox derives from a characteristic shared by all bioclimate models: no matter how you implement them, starting from only a few field observations yields inaccurate, hard-to-trust models and severely overfitted predictions. Lomba and colleagues’ solution to the paradox was to fit several ‘smaller’ models, each including only a pair of environmental variables, and then to average their predictions within a weighted ensemble model. This ensemble of small models (ESMs) approach proved viable and effective. Yet it does not address the problem of weak starting information about the species’ preferences, nor is it known whether it is safe to apply ESMs to the rarest species of them all.
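To make the ensemble idea concrete, here is a minimal Python sketch of the general ESM recipe: one small model per pair of environmental variables, combined through a weighted average. Everything specific in it is an assumption for illustration only. The toy "small model" (suitability decaying with distance from the presence centroid), the AUC-based weight (Somers' D, with negative values set to zero), the evaluation on the training data, and all function and variable names are ours, not taken from Lomba and colleagues' implementation.

```python
from itertools import combinations
from math import exp
from statistics import mean, pstdev


def fit_small_model(presences, pair):
    """Toy bivariate model (an assumption, not a real SDM algorithm):
    suitability decays with standardized distance from the presence centroid."""
    stats = []
    for var in pair:
        values = [p[var] for p in presences]
        sd = pstdev(values) or 1.0  # guard against zero spread
        stats.append((var, mean(values), sd))

    def predict(site):
        d2 = sum(((site[var] - m) / sd) ** 2 for var, m, sd in stats)
        return exp(-0.5 * d2)

    return predict


def auc(pos_scores, neg_scores):
    """Probability that a presence scores higher than a background point."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def esm_predict(presences, background, sites, variables):
    """Weighted average of all bivariate models' predictions at each site."""
    weighted_sum = [0.0] * len(sites)
    total_weight = 0.0
    for pair in combinations(variables, 2):
        model = fit_small_model(presences, pair)
        # Simplification: scored on the training data, not by cross-validation.
        score = auc([model(p) for p in presences],
                    [model(b) for b in background])
        weight = max(0.0, 2.0 * score - 1.0)  # Somers' D, floored at zero
        for i, site in enumerate(sites):
            weighted_sum[i] += weight * model(site)
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no small model did better than chance")
    return [s / total_weight for s in weighted_sum]
```

Each small model only ever sees two predictors, which limits how much it can overfit a handful of records. With three variables the loop fits three such models; with ten it fits 45.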

Supplementing bioclimatic modeling with phylogenetic history

Our background in phylogenetic modelling inspired a different solution. Our idea was to complement direct observations of species’ climatic preferences (drawn from occurrence records) with phylogenetic information. We reasoned that, since niche traits such as thermal tolerance limits and body size tend to be phylogenetically inherited, we could use phylogeny as a predictor in bioclimatic modeling. To develop this idea, we combined ecological niche factor analysis (ENFA) with phylogenetic imputation. The method, named ENphylo, works by assembling a phylogenetic tree that includes the rare species and their not-so-rare relatives, calculating niche marginality and specialization factors for the latter (as routinely implemented under ENFA), and then relying on phylogenetic imputation to derive marginality and specialization for the poorly sampled species. The few occurrence data available for the poorly sampled species are then used to convert the imputed marginality and specialization axes into habitat suitability maps.
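The imputation step can be illustrated with a small Python sketch. It is not ENphylo's own code, and this post does not specify which imputation procedure ENphylo uses. As a stand-in we assume the simplest textbook model, Brownian motion, under which the expected value of an unobserved tip is the estimated root state plus a correction borrowed from relatives in proportion to shared branch length. The function names, and the idea of applying it to one ENFA coefficient at a time, are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (fine for small matrices)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_brownian(shared, shared_with_target, values):
    """Expected trait value of an unsampled species under Brownian motion.

    shared: n x n matrix of branch length shared (root to most recent
        common ancestor) by each pair of well-sampled species.
    shared_with_target: branch length each of them shares with the
        rare species.
    values: observed values (e.g. one marginality coefficient) for the
        well-sampled species.
    """
    ones = [1.0] * len(values)
    # Generalized least squares estimate of the root state.
    root = sum(solve(shared, values)) / sum(solve(shared, ones))
    residuals = [v - root for v in values]
    weights = solve(shared, residuals)
    return root + sum(c * w for c, w in zip(shared_with_target, weights))
```

For a tree written as ((A:1,T:1):1,B:2), with A = 4 and B = 0 observed, the estimated root state is 2, and the rare species T, which shares half of its history with A, is imputed at 3:

```python
impute_brownian([[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0], [4.0, 0.0])
```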

Reindeer (Rangifer tarandus) are currently categorised as vulnerable by the IUCN.

Making the best out of a manifold of random observations

In assessing ENphylo performance, we faced the problem of choosing an appropriate standard. We settled on a recent study of ours, in which we assessed the relationship between habitat fragmentation and extinction in 31 thoroughly sampled and common (let’s assume we all know what ‘common’ means, and set aside the onus of defining it) late Pleistocene large mammals from Western Eurasia. Since that study used species distribution models (SDMs), fitted by deploying MaxEnt on the fossil record of each species, we had statistical benchmarks of SDM performance obtained from the most complete and well-dated record we could assemble for each species. We then randomly drew as few as 10 datapoints from the original fossil records and ran both ENphylo and ESMs on them. We found that, on the same sets of 10 randomly drawn datapoints, ENphylo performed better than ENFA in 27 cases out of 31, and better than ESMs in 29 out of 31. Even more intriguingly, about 22 species (depending on the evaluation metric applied) reached satisfactory performance under ENphylo. That is the same number as in our original study, which nonetheless used the full record and the mighty MaxEnt.
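The bookkeeping behind such a comparison is simple enough to sketch in Python. This is an illustration only, not the script used in the study: the helper names, the use of a fixed seed, and the idea of storing one evaluation score per species in a dictionary are all assumptions.

```python
import random


def subsample_records(records, n=10, seed=None):
    """Draw n records at random, without replacement, from a species' record."""
    if len(records) < n:
        raise ValueError("fewer records than the requested sample size")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(records, n)


def count_wins(scores_a, scores_b):
    """Number of species for which method A scores strictly higher than B.

    Both arguments map species name -> evaluation score (higher is better)
    and must cover the same species.
    """
    return sum(1 for species in scores_a if scores_a[species] > scores_b[species])
```

Both methods are meant to be run on the same subsample of each species, so that a difference in score reflects the method and not the luck of the draw.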

As one reviewer pointed out, “[ENphylo provides] a nice contribution in a topic that is always a headache. Rare species (or those with few occurrences regardless their rareness) are always a modeling challenge”. With ENphylo, we offer a new tool to help predict the future distribution of rare species in the light of climate change, and to delineate how rare extinct species reacted to past climatic variation. Whether or not this fulfills the dream of finding multispecies management solutions, we hope ENphylo will help protect some of these species from extinction.