Abstract: Background. Range maps are a useful tool to describe the spatial distribution of species. However, they need to be used with caution, as they essentially represent a rough approximation of a species’ suitable habitats. When stacked together, the resulting communities in each grid cell may not always be realistic, especially when species interactions are taken into account. Here we show the extent of the mismatch between range maps, provided by the International Union for Conservation of Nature (IUCN), and species interactions data. More precisely, we show that local networks built from those stacked range maps often yield unrealistic communities, where species of higher trophic levels are completely disconnected from primary producers. Methodology. We used the well-described Serengeti food web of mammals and plants as our case study, and identify areas of data mismatch within predators’ range maps by taking into account food web structure. We then used occurrence data from the Global Biodiversity Information Facility (GBIF) to investigate where data is most lacking. Results. We found that most predator ranges comprised large areas without any overlapping distribution of their preys. However, many of these areas contained GBIF occurrences of the predator. Conclusions. Our results suggest that the mismatch between both data sources could be due either to the lack of information about ecological interactions or the geographical occurrence of preys. We finally discuss general guidelines to help identify defective data among distributions and interactions data, and we recommend this method as a valuable way to assess whether the occurrence data that are being used, even if incomplete, are ecologically accurate.

draft preprint

Introduction

Finding a species in a certain location is like finding an encrypted message that traveled through time. It carries the species’ evolutionary history, migration patterns, as well as any direct and indirect effects generated by other species (some of which we may not even know exist). Ecologists have been trying to decode this message with progressively more powerful tools, from their field notes to highly complex computational algorithms. However, to succeed in this challenge it is important to have the right clues in hand. There are many ways we can be misled by data - or the lack of it: taxonomic errors (e.g., due to updates in the taxonomy of a species), geographic inaccuracy (e.g., approximate coordinates or lack of documentation about their accuracy), or sampling biases (e.g., data clustered near roads or research centers) (Ladle and Hortal 2013; Hortal et al. 2015; Poisot et al. 2021). One way to identify - and potentially fix - these errors is to combine many different pieces of information about the occurrence of a species, so agreements and mismatches can emerge. Although previous studies have combined different types of occurrence data to measure the accuracy of datasets (Hurlbert and Jetz 2007; Hurlbert and White 2005; Ficetola et al. 2014), none have used different types of information so far (i.e., ecological characteristics other than geographical distribution). Here we suggest jointly analysing species occurrence (range maps and point occurrences) and ecological interactions to identify mismatches between datasets and areas of data deficit.

Interactions form complex networks that shape ecological structures and maintain the essential functions of ecosystems, such as seed dispersal, pollination, and biological control (Albrecht 2018; Fricke et al. 2022) that ultimately affect the composition, richness, and successional patterns of communities across biomes. Yet, the connection between occurrence and interaction data is a frequent debate in ecology (Blanchet, Cazelles, and Gravel 2020; Wisz et al. 2013). For instance, macroecological models are often used with point or range occurrence data in order to investigate the dynamics of a species with its environment. However, these models do not account for ecological interactions, although it has been demonstrated that they might largely affect species distribution (Abrego et al. 2021; Afkhami, McIntyre, and Strauss 2014; Araújo, Marcondes-Machado, and Costa 2014; Godsoe et al. 2017; Godsoe and Harmon 2012; Gotelli, Graves, and Rahbek 2010; Wisz et al. 2013). Some researchers argue that occurrence data can also capture real-time interactions (see Roy et al. 2016; Ryan et al. 2018), and, because of that, it would not be necessary to include ecological interaction dynamics in macroecological models. On the other hand, many mechanistic simulation models in ecology have considered the effect of competition and facilitation in range shifts. For example, Gotelli et al. (2010) demonstrate how conspecific attraction might be the main factor driving the distribution of migratory birds; Afkhami et al. (2014) explores how mutualistic fungal endophytes are responsible or expanding the range of native grass; many other examples are discussed in Wisz et al. (2013). Although interactions across trophic levels are demonstrated to determine species range (Wisz et al. 2013), the use of these interactions in mechanistic simulation models in macroecology remains insufficient (as discussed in Cabral, Valente, and Hartig 2017).

A significant challenge in this debate is the quality and quantity of species distribution and ecological data (Boakes et al. 2010; Ronquillo et al. 2020; Meyer, Weigelt, and Kreft 2016) - a gap that can lead to erroneous conclusions in macroecological research (Hortal et al. 2008). Amongst the geographical data available are the range maps provided by the International Union for the Conservation of Nature (IUCN). Such maps consist of simplified polygons, often created as alpha or convex hulls around known species locations, refined by expert knowledge about the species (IUCN Red List Technical Working Group 2019). These maps can be used in macroecological inferences in the lack of more precise information (Fourcade 2016; Alhajeri and Fourcade 2019), but it has been recommended that they are used with caution since they tend to underestimate the distribution of species that are not well-known (Herkt, Skidmore, and Fahr 2017) (especially at fine scale resolutions; Hurlbert and Jetz (2007); Hurlbert and White (2005)), do not represent spatial variation in species occurrence and abundance (Dallas, Pironon, and Santini 2020), and can include inadequate areas within the estimated range. Another source of species distribution information is the Global Biodiversity Information Facility (GBIF), which is an online repository of georeferenced observational records that come from various sources, including community science programs, museum collections, and long-term monitoring schemes. A great source of bias in these datasets is the irregular sampling effort, with more occurrences originating from attractive and accessible areas and observation of charismatic species (Alhajeri and Fourcade 2019). As for ecological data, a complete assessment is difficult and is aggravated by biased sampling methods, data aggregation (Poisot et al. 2020; Hortal et al. 2015) and by the fact that interactions are very often events that occur in a narrow window of time. Nevertheless, we have witnessed an increase in the availability of biodiversity data in the last decades, including those collected through community science projects (Callaghan et al. 2019; Pocock et al. 2015) and dedicated databases, such as Mangal (Poisot et al. 2016). This provides an opportunity to merge species distribution and ecological interaction data to improve our predictions of where a species may be found across large spatial scales.

It has been demonstrated that the agreement between range maps and point data varies geographically (Hurlbert and Jetz 2007; Hurlbert and White 2005; Ficetola et al. 2014). Adding ecological interaction data to this comparison might help to elucidate where these (dis)agreements are more likely to be true and which dataset better represent the actual distribution of a species. In this context, we elaborate a method that allows us to detect areas of potential misestimation of species’ distribution data (more precisely range maps) based on interaction data. This method is based on the assumption that organisms cannot persist in an area unless they are directly or indirectly connected to a primary producer within their associated food web (Power 1992). Thus, given that herbivores are the main connection between plant resources (directly limited by environmental conditions) and predators (Dobson 2009; Scott et al. 2018), the range of a predator (omnivore or carnivore) depends on the overlapping ranges of its herbivore preys. If sections of a predator’s range do not overlap with at least one of its prey it will become disconnected from primary producers, and therefore we would not expect the predator to occur in this area.

This mismatch can be the result of different mechanisms, like the misestimation of both the predator’s and the preys’ ranges (Ladle and Hortal 2013; Rondinini et al. 2006), taxonomic errors (Isaac, Mallet, and Mace 2004; Ladle and Hortal 2013), or the lack of information about trophic links (i.e., the lack of connection between the ranges of a predator and a primary producer may be due a third species we don’t know is connected to both). Here in this proof of concept, we investigate the disagreements between available data for species that compose a well-known food web in the African continent, discuss the mechanisms that can lead to this, and reinforce the importance of open geographically explicit interaction data.

Methods

We identified areas of data deficits within the ranges of predators based on a simple rule: any part of a predator’s range that did not intersect with the range of at least one prey herbivore species, which in turn is directly connected to a primary producer (plants), was considered data deficient. To do that, we used a Serengeti food web dataset (Baskerville et al. 2011) (which comprises carnivores, herbivores, and plants from Tanzania) and its species ranges from IUCN. Then, we calculated the difference in range sizes between the original IUCN ranges of predators and those without the areas where they would be disconnected from their food webs, based on species interaction data. Finally, we added the GBIF occurrence points for the Serengeti species to investigate whether the results would be different if we used another source of distribution data.

Data

We investigated the mismatch between savannah species ranges and interactions in Africa (fig. 1). These ecosystems host a range of different species, including the well-characterized predator-prey dynamics between iconic predators (e.g., lions, hyenas, and leopards) and large herbivores (e.g., antelopes, wildebeests, and zebras), as well as a range of herbivorous and carnivorous small mammals. The Serengeti ecosystem has been extensively studied and its food web is one of the most complete we have to date, including primary producers identified to the species level. Here we focus on six groups of herbivores and carnivores from the Serengeti Food Web Data Set (Baskerville et al. 2011). These species exhibit direct antagonistic (predator-prey) interactions with one another and are commonly found across savannah ecosystems on the African continent (McNaughton 1992). Plants in the network were included indirectly in our analyses as we do not expect the primary producers to significantly influence the range of herbivores for several reasons. Firstly, many savannah plants are functionally similar (i.e., grasses, trees and shrubs) and cooccur across the same habitats (Baskerville et al. 2011). Secondly, herbivores in the network are broadly generalists feeding on a wide range of different plants across habitats. Indeed, out of 129 plants in our dataset, herbivores (n = 23) had a mean out degree (mean number of preys) of around 22 (std = 17.5). There is also an absence of global range maps for many plant species (Daru 2020), which prevents their direct inclusion in our analysis. Therefore, we assume that plants consumed by herbivores are present across their ranges, and as such the ranges of herbivores are not expected to be significantly constrained by the availability of food plants.

From the wider ecological network presented in Baskerville (2011), we sampled interaction data for herbivores and carnivores. This subnetwork contained 32 taxa (23 herbivores and 9 carnivores) and 84 interactions and had a connectance of 0.08. Although self-loops are informative, we removed these interactions to allow for the original IUCN ranges of predators with cannibalistic interactions to be adjusted. We treated this overall network as a metaweb since it should contain all potential species interactions between mammalian taxa occurring across savannah ecosystems such as the Serengeti.

We compiled IUCN range maps for the 32 species included in the metaweb from the Spatial Data Download portal (www.iucnredlist.org/resources/spatial-data-download), which we rasterized at a 0.5 degrees resolution (~50 km at the equator). We restricted the rasters to a spatial extent comprised between latitudes 35°S and 40°N and longitudes 20°W and 55°E. We then combined interaction data from the metaweb and cooccurrence data generated from species ranges to create networks for each raster pixel. This generated a total of 11,308 pixel-level networks. These networks describe potential predation, not actual interactions: the former is derived information from the metaweb, and the latter is contingent on the presence of herbivores.

Range overlap measurement

We calculated the geographical overlap, i.e. the extent to which interacting predator and prey species co-occurred across their ranges, as a/(a + c), where a is the number of pixels where predator and prey cooccur and c is the number of pixels where only the focal species occur. This index of geographical overlap can be calculated with prey or predators as the focal species. Values vary between 0 and 1, with values closer to 1 indicating that there is a large overlap in the ranges of the two species and values closer to 0 indicating low cooccurrence across their ranges. For each predator species, we calculated its generality to understand whether the level of trophic specialization (i.e., number of prey items per predator) affects the extent to which the ranges of the species comprised areas of data deficits. One would assume that predators with a greater number of prey taxa (i.e., a higher generality) are less likely to have large areas of data mismatch within their range as it is more likely that at least one prey species is present across most of their range.

Validation

For each species in the dataset we collated point observation data from GBIF (www.gbif.org). We used the GBIF download API to retrieve all species occurrences on November 22nd 2022 (GBIF.org 2022). We restricted our query to the data with spatial coordinates and which were inside the spatial extent of our rasters. A few observations were localized in the ocean near latitude 0° and longitude 0°. We assumed these were errors and removed all observations falling in the extent between latitudes 2°S and 2°N and longitudes 2°W and 2°E to keep only mainland sites. We did not use any additional geographical filters to retrieve as much data as possible. Being mindful of the recent and remarkable anthropogenic impact on African megafauna, we decided to restrict the occurrences used on the validation step to those recorded after the year 2000 (and, therefore, only records with date information). This decision was made after evaluating the overall temporal distribution of the GBIF records.

We then converted the occurrence data into raster format by determining which pixels had at least one GBIF occurrence. This allowed us to remove the effect of repeated sampling in some locations. These data were used to validate the areas identified as being ecologically unrealistic based on species interactions and occurrence data (see beginning of Methods section). To do so, we calculated the proportion of GBIF presence pixels occurring within both the original IUCN species range and the adjusted one (i.e., the one without unrealistic food webs). We then compared these proportions for all predators to verify if the areas of data mismatch contained locations with GBIF observations, hence likely true habitats.

Software

We performed all analyses using Julia v1.7.2 (Bezanson et al. 2017). We used the packages SimpleSDMLayers.jl (Dansereau and Poisot 2021) to manipulate the raster layers, EcologicalNetworks.jl (Poisot et al. 2019) to construct and manipulate the interaction networks, and GBIF.jl (Dansereau and Poisot 2021) to reconcile species names with the GBIF backbone taxonomy (GBIF Secretariat 2021). We also used GDAL (GDAL/OGR contributors 2021) to rasterize the IUCN range maps (initially available as shapefiles from the Spatial Data Download portal). All the scripts required to reproduce the analyses are available at https://doi.org/10.5281/zenodo.6842861.

Results

Mammal species found in the Serengeti food web are widespread in Africa, especially in grasslands and savannahs (panel (a) of fig. 1). From our analysis, most local networks (69.07%) built using the original IUCN range maps had at least one mammal species with a path to a primary producer (panel (b) of fig. 1), which reinforces that the interactions we observe in the Serengeti food web is representative of the interactions for these mammals in the whole African continent. On average, local food webs had almost half of their mammal species disconnected from basal species (mean = 46.2%, median = 33.3%). In addition, 16.6% of the networks only had disconnected mammals, and the number of mammal species varied from 1 to 28, with a mean of 6.7. As expected, the proportion of carnivores with a path to a primary producer was conditional on the total number of mammal species in each local network (third panel of fig. 1).

Figure 1: (a) Spatial distribution of species richness according to the original IUCN range maps of all 32 mammal species of the Serengeti food web. (b) Proportion of mammal species remaining in each local network (i.e., each pixel) after removing all species without a path to a primary producer. (c) Proportion of mammal species remaining in each local network as a function of the number of species given by the original IUCN range maps.

Specialized predators have higher rates of range mismatch

Figure 2: Negative relationship between the out degree of predator species and their relative range mismatch. More specialized predators “lose” a higher proportion of their ranges due to mismatches with the ranges of their preys.

If we consider that we can not use areas where there are no superposition between predators and prey on ecological analyses, we lose more range area for predators with fewer prey (fig. 2). For instance, both Leptailurus serval and Canis mesomelas have only one prey in the Serengeti food web (tbl. 1), each of them with a very small range compared to those of their predator. This discrepancy between range sizes promotes significant range loss. On the other hand, predators of the genus Panthera are some of the most connected species, and they also lose the least proportion of their ranges. This mismatch between predators and preys can also be a result of taxonomic disagreement between the geographical and ecological data. Although Canis aureus has the same number of prey as Caracal caracal, none of the prey taxa of the former occurs inside its original range (tbl. 1), which results in complete range loss.

Figure 3: Geographical similarity between the original IUCN range maps of predators and preys. Dots represent predator-prey pairs, with different symbols corresponding to different predators. For a given pair of species, the number c of pixels where the focal species is present but not the other and the number a of pixels where the predator and prey cooccur, were calculated. Geographic similarities were given by a/(a+c), with the predator being the focal species in the predator to prey similarity (x-axis), while the prey is the focal one in the prey to predator similarity (y-axis). One of the predators, Canis aureus, is not represented in the image because it is an extreme case (where all its range is suppressed by the absence of preys) and it would make the interpretation of the data more difficult. — Figure 3: Geographical similarity between the original IUCN range maps of predators and preys. Dots represent predator-prey pairs, with different symbols corresponding to different predators. For a given pair of species, the number c of pixels where the focal species is present but not the other and the number a of pixels where the predator and prey cooccur, were calculated. Geographic similarities were given by a/(a + c), with the predator being the focal species in the predator to prey similarity (x-axis), while the prey is the focal one in the prey to predator similarity (y-axis). One of the predators, *Canis aureus*, is not represented in the image because it is an extreme case (where all its range is suppressed by the absence of preys) and it would make the interpretation of the data more difficult.

There was a high variation in the overlap of predator and prey ranges (fig. 3). The high density of points on the left-hand side of fig. 3 indicates that most preys have small ranges in comparison to those of the set of carnivores in the networks, resulting in either low overlap between both ranges (bottom) or high overlap of ranges because much of that of the prey is within predators’ range (top). The top-right side of the plot encompasses situations where the ranges of both predator and prey are similar and overlapping, while the bottom-right part of the plot represents a situation where the range of the predator is smaller than that of its prey and much of it occurs within the preys’ range. For example, Panthera pardus had many preys occurring inside its range, with highly variable levels of overlap (tbl. 1). In general, species exhibited more consistent values of prey-predator overlap, than predator-prey overlap – indicated by the spread of points along the x-axis, yet more restricted variation on the y-axis (fig. 3). There was also no overall relationship between the two metrics, or for any predator species.

Table 1: List of species analysed, their out and in degrees, total original range size (in pixels), and proportion of their ranges occupied by their preys and predators (values between 0 and 1). Species are sorted according to the groups identified by Baskerville et al. (2011). Notice how some species are isolated in the network (*Loxodonta africana*) and how *Canis aureus*’s range does not overlap with any of its preys.
Species	Number of preys	Number of predators	Total range size	Proportion of range occupied by preys	Proportion of range occupied by predators
Large carnivores
Acinonyx jubatus	8	1	9250	0.437	0.618
Crocuta crocuta	12	1	4822	0.844	0.253
Lycaon pictus	14	0	427	0.918	-
Panthera leo	18	0	1274	0.935	-
Panthera pardus	22	0	7563	0.766	-

Small carnivores
Canis aureus	4	1	816	0.000	0.782
Canis mesomelas	1	1	2201	0.190	0.994
Caracal caracal	4	0	5239	0.833	-
Leptailurus serval	1	1	4319	0.011	0.978

Small herbivores
Damaliscus lunatus	0	4	626	-	1
Hippopotamus amphibius	0	0	419	-	-
Kobus ellipsiprymnus	0	4	2961	-	1
Ourebia ourebi	0	5	2484	-	1
Pedetes capensis	0	2	1318	-	1
Phacochoerus africanus	0	5	3331	-	1
Redunca redunca	0	5	1935	-	1
Rhabdomys pumilio	0	5	53	-	1
Tragelaphus oryx	0	2	2316	-	0.990
Tragelaphus scriptus	0	3	3999	-	0.985

Large grazers
Aepyceros melampus	0	5	1167	-	1
Alcelaphus buselaphus	0	4	2307	-	1
Connochaetes taurinus	0	6	1074	-	1
Equus quagga	0	5	786	-	1
Eudorcas thomsonii	0	6	51	-	1
Nanger granti	0	6	261	-	1

Hyraxes
Heterohyrax brucei	0	1	1961	-	0.973
Procavia capensis	0	1	5312	-	0.647

Others
Giraffa camelopardalis	0	1	607	-	0.473
Loxodonta africana	0	0	1078	-	-
Madoqua kirkii	0	7	443	-	1
Papio anubis	0	1	2571	-	0.937
Syncerus caffer	0	1	2808	-	0.251

Validation with GBIF occurrences

The proportion of GBIF pixels (pixels with at least one GBIF occurrence) matching the IUCN ranges varied a lot for species with small ranges and way less for species with large ranges (fig. 4, left). This means that species with large ranges had more area where their datasets for ecological and geographical information agreed. The lowest proportions of GBIF pixels occurred for species with small ranges. Amongst herbivores, Rhabdomys pumilio has a proportion of 25.6% of its presence pixels within its IUCN range, while predators have this proportion above 47% (such as Lycaon pictus, with 47.6%, and Panthera leo, with 49.3%). Nevertheless, some species with smaller ranges showed high data overlap (such as Canis mesomelas, with 94.1%, and many herbivores). Overall, predators and preys displayed similar overlap variations, and species with median and large ranges had higher proportions of agreement between GBIF, IUCN and interaction datasets.

The proportion of GBIF pixels in revised ranges can only be equal to or lower than that of the original ranges, as our analysis removes pixels from the original range and does not add new ones. Rather, the absence of a difference between the two types of ranges indicates that no pixels with GBIF observations, hence likely true habitats, were removed by our analysis. Here this proportion was mostly similar to that of the original IUCN ranges for most predator species (fig. 4, right). Two species showed no difference in proportion (Lycaon pictus and Panthera leo) while four species showed only small differences (Crocuta crocuta lost 0.4% of the original data overlap; Caracal caracal lost 3.4%; Acinonyx jubatus and Panthera pardus lost 6.2%).

On the other hand, three species, Canis aureus, Canis mesomelas, and Leptailurus serval showed very high differences, with overlaps lowered by 100%, 58.4%, and 100% respectively. These last two species are also the only predators with a single prey in our metaweb. Canis aureus has four preys, but it has one of the smallest ranges in IUCN, which is not covered by any of its preys. This result reinforces the concern raised in the literature on the use of IUCN range maps for species that are not well known (Herkt, Skidmore, and Fahr 2017), demonstrating how small range species are likely to have their distribution underestimated in the IUCN database. Additionally, the fact that Canis aureus had such a conspicuous discrepancy between its original IUCN range and those of its preys, and between GBIF and IUCN data, may indicate a taxonomic incongruency between the three databases used here, which we explore in the Discussion section. Our results delineate how a mismatch between GBIF and IUCN databases differ greatly with small changes in herbivore species ranges, and it is somewhat positively related to range size for predator species. Moreover, we show that accounting for interactions does not necessarily aggravates this dissimilarity, but it is relevant for species about which we have little ecological information or for specialists groups.

Figure 4: Left panel: Distribution of the proportion of GBIF pixels (pixels with at least one occurrence in GBIF) supperposed by the IUCN range data for different range sizes. Right panel: Differences between the proportion of GBIF pixels matching the original and cropped IUCN range maps for every predator species. Arrows go from the proportion inside the original range to the proportion inside the revised range, which can only be equal or lower. Overlapping markers indicate no difference between the types of layers. Species markers are the same on both figures, with predators presented in distinct colored markers and all herbivores grouped in a single grey marker. Pixels represent a resolution of 0.5 degrees.

Discussion

Here we identify areas of data mismatch between species range maps by using ecological interaction data (predator-prey interactions within food webs). Our results did show a significant mismatch in the IUCN range areas of specialized and generalist predatory organisms and their prey, which highlights the importance of accounting for species interactions when estimating the range of a species. Although this type of data mismatch can be result of actual ecological processes, outdated occurrence data, taxonomic errors and more, we argue that, here, they rather indicate a lack of interaction sampling data.

The case of the golden jackal (Canis aureus) is a good illustration of how the taxonomic, geographical and ecological data can be used to validate one another. The jackal is a widespread taxon in northern Africa, Europe, and Australasia, generally well adapted to local conditions due to its largely varied diet (Tsunoda and Saito 2020; Krofel et al. 2021). Because of that, we expected that the Canis species in our dataset would be the ones losing the least amount of range, with a higher value of the proportion of GBIF pixels within their IUCN range maps. However, the taxonomy of this group is a matter of intense discussion, as molecular and morphological data seem to disagree in the clustering of species and subspecies (Krofel et al. 2021; Stoyanov 2020). This debate probably influenced our results: with originally only 64.9% of the GBIF pixels of the golden jackal overlapping with its IUCN data, we suspect that many of the GBIF occurrences refer to other Canis species, and that its taxonomic identification in the network database is probably outdated. This led to a complete exclusion of Canis aureus from its original range in our analysis, despite the fact that this species has four documented preys in our metaweb.

Geographical mismatch and data availability

The lack of superposition between IUCN range maps and GBIF occurrences in our results suggests that we certainly miss geographical information about the distribution of either the prey or the predator. On the other hand, if both GBIF and IUCN occurrences tended to superpose and the species was still locally removed, this indicates that we don’t have information about all its interactions (e.g., predators may be feeding on different species than the ones in our dataset outside the Serengeti ecosystem). This rationale can be illustrated with three types of mismatches identified in our results.

First, Panthera leo was one of the species with no difference between ranges before and after our analysis, but 50.7% of its GBIF pixels did not superpose with the IUCN range (fig. 4). In this particular case, the IUCN maps seem to agree with species interaction data. However, the disagreement between the IUCN and the GBIF databases is concerning and suggests that the IUCN maps might underestimate the lion’s distribution.

On the other hand, Leptailurus serval and Canis mesomelas are two of the three species that have the higher proportion of mismatched range due to the lack of paths to a herbivore, but are also some of the species with the higher proportion of GBIF occurrences inside their original IUCN range maps (fig. 4). This indicates that the information we are missing for these two species is related to either an additional interaction or to the presence of external interacting species. To illustrate that, we mapped the GBIF data for the prey of Leptailurus serval, with a mobility buffer around each point (fig. 5). When considering GBIF data, approximately 36% of the prey’s occurrences are within the portion of the predator’s range that was divergent from its original IUCN data. With the buffer area, this corresponds to 5.57% of the mismatched area. By adding GBIF information for the prey, we could therefore reduce the discrepancy of the range (or information) for the predator by 5.57% since its distribution is conditional on the occurrence of its preys. In other words, the range mismatch was exagerated because we were missing information on the presence of an interacting species (i.e., this also indicates that there is a mismatch - or complementarity - between the IUCN and GBIF data for their prey).

Figure 5: Mismatch between servals’ range loss and GBIF occurrence of its prey. The left panel shows the reduction of servals’ range when we consider the IUCN data of its prey. On the right panel, we added GBIF data on both serval and its prey, with a buffer for the prey to account for species mobility.

Finally, the extreme case of Canis aureus illustrates a lack of both geographical and ecological information: only half of its GBIF presence pixels and none of its preys occur inside its IUCN range. We believe, therefore, that the validation of species distribution based on ecological interaction is a relevant method that can further fill in information gaps. Nevertheless, it is imperative that more geographically explicit data about ecological networks and interactions become available. This would help clarify when cooccurrences can be translated into interactions (Windsor et al. 2022) and help the development of more advanced validation methods for occurrence data.

Next steps

Here we demonstrated how we can detect areas of data deficit in species distribution data using ecological interactions. Knowing where questionable occurrence data are can be crucial in ecological modelling (Hortal 2008; Ladle and Hortal 2013), and accounting for these errors can improve model outputs by diminishing the error propagation (Draper 1995). For instance, we believe our method is a way to account for ecological interactions in habitat suitability models without making the models more complex, but by making sure (not assuming) that the input data - the species occurrence - actually accounts for ecological interactions. Another application of this method is mapping areas where data are deficient, thus helping to identify priority sampling locations for interaction data, which can, in turn, reduce uncertainty in network prediction. For example, if a certain pixel confirms the presence of a species both with IUCN and GBIF data, but lacks connection between species, this pixel has a high potential to hide an unobserved interaction and should therefore be a priority sampling location.
It is important to notice, however, that the quality and usefulness of this method are highly correlated with the amount and quality of data available about species’ occurrences and interactions. With this paper, we hope to add to the collective effort to decode the encrypted message that is the occurrence of a species in space and time. A promising avenue that adds to our method is the prediction of networks and interactions at large scales (Strydom et al. 2021; Windsor et al. 2022), for they can add valuable information about ecological interactions where they are missing. Additionally, in order to achieve a robust modelling framework towards actual species distribution models we should invest in efforts to collect and combine open data on species occurrence and interactions (Windsor et al. 2022), especially because we may be losing ecological interactions at least as fast as we are losing species (Valiente-Banuet et al. 2015).

Acknowledgements

We acknowledge that this study was conducted on land within the traditional unceded territory of the Saint Lawrence Iroquoian, Anishinabewaki, Mohawk, Huron-Wendat, and Omàmiwininiwak nations. We thank the editor and reviewers for their thoughtful comments, which considerably improved this manuscript.

References

Abrego, Nerea, Tomas Roslin, Tea Huotari, Yinqiu Ji, Niels Martin Schmidt, Jiaxin Wang, Douglas W. Yu, and Otso Ovaskainen. 2021. “Accounting for Species Interactions Is Necessary for Predicting How Arctic Arthropod Communities Respond to Climate Change.” Ecography 44 (6): 885–96. https://doi.org/10.1111/ecog.05547.

Afkhami, Michelle E., Patrick J. McIntyre, and Sharon Y. Strauss. 2014. “Mutualist-Mediated Effects on Species’ Range Limits Across Large Geographic Scales.” Ecology Letters 17 (10): 1265–73. https://doi.org/10.1111/ele.12332.

Albrecht, Jörg. 2018. “Plant and Animal Functional Diversity Drive Mutualistic Network Assembly Across an Elevational Gradient.” NATURE COMMUNICATIONS, 10.

Alhajeri, Bader H, and Yoan Fourcade. 2019. “High Correlation Between Species-Level Environmental Data Estimates Extracted from IUCN Expert Range Maps and from GBIF Occurrence Data.” Journal of Biogeography, 13. https://doi.org/10.1111/jbi.13619.

Araújo, Carlos B. de, Luiz Octavio Marcondes-Machado, and Gabriel C. Costa. 2014. “The Importance of Biotic Interactions in Species Distribution Models: A Test of the Eltonian Noise Hypothesis Using Parrots.” Journal of Biogeography 41 (3): 513–23. https://doi.org/10.1111/jbi.12234.

Baskerville, Edward B., Andy P. Dobson, Trevor Bedford, Stefano Allesina, T. Michael Anderson, and Mercedes Pascual. 2011. “Spatial Guilds in the Serengeti Food Web Revealed by a Bayesian Group Model.” PLOS Computational Biology 7 (12): e1002321. https://doi.org/10.1371/journal.pcbi.1002321.

Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B. Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98. https://doi.org/10.1137/141000671.

Blanchet, F. Guillaume, Kevin Cazelles, and Dominique Gravel. 2020. “Co-Occurrence Is Not Evidence of Ecological Interactions.” Ecology Letters 23 (7): 1050–63. https://doi.org/10.1111/ele.13525.

Boakes, Elizabeth H., Philip J. K. McGowan, Richard A. Fuller, Ding Chang-qing, Natalie E. Clark, Kim O’Connor, and Georgina M. Mace. 2010. “Distorted Views of Biodiversity: Spatial and Temporal Bias in Species Occurrence Data.” PLOS Biology 8 (6): e1000385. https://doi.org/10.1371/journal.pbio.1000385.

Cabral, Juliano Sarmento, Luis Valente, and Florian Hartig. 2017. “Mechanistic Simulation Models in Macroecology and Biogeography: State-of-Art and Prospects.” Ecography 40 (2): 267–80. https://doi.org/10.1111/ecog.02480.

Callaghan, Corey T., Jodi J. L. Rowley, William K. Cornwell, Alistair G. B. Poore, and Richard E. Major. 2019. “Improving Big Citizen Science Data: Moving Beyond Haphazard Sampling.” PLOS Biology 17 (6): e3000357. https://doi.org/10.1371/journal.pbio.3000357.

Dallas, Tad, Samuel Pironon, and Luca Santini. 2020. “The Abundant-Centre Is Not All That Abundant: A Comment to Osorio-Olvera Et Al. 2020,” 2020.02.27.968586. https://doi.org/10.1101/2020.02.27.968586.

Dansereau, Gabriel, and Timothée Poisot. 2021. “SimpleSDMLayers.jl and GBIF.jl: A Framework for Species Distribution Modeling in Julia.” Journal of Open Source Software 6 (57): 2872. https://doi.org/10.21105/joss.02872.

Daru, Barnabas H. 2020. “GreenMaps: A Tool for Addressing the Wallacean Shortfall in the Global Distribution of Plants.” bioRxiv, 2020.02.21.960161. https://doi.org/10.1101/2020.02.21.960161.

Dobson, Andy. 2009. “Food-Web Structure and Ecosystem Services: Insights from the Serengeti.” Philosophical Transactions of the Royal Society B: Biological Sciences 364 (1524): 1665–82. https://doi.org/10.1098/rstb.2008.0287.

Draper, D. 1995. “Assessment and Propagation of Model Uncertainty.” Journal of the Royal Statistical Society Series B-Statistical Methodology 57 (1): 45–97. https://doi.org/10.1111/j.2517-6161.1995.tb02015.x.

Ficetola, Gentile Francesco, Carlo Rondinini, Anna Bonardi, Vineet Katariya, Emilio Padoa-Schioppa, and Ariadne Angulo. 2014. “An Evaluation of the Robustness of Global Amphibian Range Maps.” Journal of Biogeography 41 (2): 211–21. https://doi.org/10.1111/jbi.12206.

Fourcade, Yoan. 2016. “Comparing Species Distributions Modelled from Occurrence Data and from Expert-Based Range Maps. Implication for Predicting Range Shifts with Climate Change.” Ecological Informatics 36: 8–14. https://doi.org/10.1016/j.ecoinf.2016.09.002.

Fricke, Evan C., Alejandro Ordonez, Haldre S. Rogers, and Jens-Christian Svenning. 2022. “The Effects of Defaunation on Plants’ Capacity to Track Climate Change.” Science. https://doi.org/10.1126/science.abk3510.

GBIF.org. 2022. “GBIF Occurrence Download.” The Global Biodiversity Information Facility. https://doi.org/10.15468/DL.PF4586.

GBIF Secretariat. 2021. “GBIF Backbone Taxonomy.” https://doi.org/10.15468/39omei.

GDAL/OGR contributors. 2021. GDAL/OGR Geospatial Data Abstraction Software Library. Manual. Open Source Geospatial Foundation.

Godsoe, William, and Luke J. Harmon. 2012. “How Do Species Interactions Affect Species Distribution Models?” Ecography 35 (9): 811–20. https://doi.org/10.1111/j.1600-0587.2011.07103.x.

Godsoe, William, Jill Jankowski, Robert D. Holt, and Dominique Gravel. 2017. “Integrating Biogeography with Contemporary Niche Theory.” Trends in Ecology and Evolution 32 (7): 488–99. https://doi.org/10.1016/j.tree.2017.03.008.

Gotelli, Nicholas J., Gary R. Graves, and Carsten Rahbek. 2010. “Macroecological Signals of Species Interactions in the Danish Avifauna.” Proceedings of the National Academy of Sciences 107 (11): 5030–35. https://doi.org/10.1073/pnas.0914089107.

Herkt, K. Matthias B., Andrew K. Skidmore, and Jakob Fahr. 2017. “Macroecological Conclusions Based on IUCN Expert Maps: A Call for Caution.” Global Ecology and Biogeography 26 (8): 930–41. https://doi.org/10.1111/geb.12601.

Hortal, Joaquín. 2008. “Uncertainty and the Measurement of Terrestrial Biodiversity Gradients.” Journal of Biogeography 35 (8): 1335–36. https://doi.org/10.1111/j.1365-2699.2008.01955.x.

Hortal, Joaquín, Francesco de Bello, José Alexandre F. Diniz-Filho, Thomas M. Lewinsohn, Jorge M. Lobo, and Richard J. Ladle. 2015. “Seven Shortfalls That Beset Large-Scale Knowledge of Biodiversity.” Annual Review of Ecology, Evolution, and Systematics 46 (1): 523–49. https://doi.org/10.1146/annurev-ecolsys-112414-054400.

Hortal, Joaquín, Alberto Jiménez-Valverde, José F. Gómez, Jorge M. Lobo, and Andrés Baselga. 2008. “Historical Bias in Biodiversity Inventories Affects the Observed Environmental Niche of the Species.” Oikos 117 (6): 847–58. https://doi.org/10.1111/j.0030-1299.2008.16434.x.

Hurlbert, Allen H., and Walter Jetz. 2007. “Species Richness, Hotspots, and the Scale Dependence of Range Maps in Ecology and Conservation.” Proceedings of the National Academy of Sciences 104 (33): 13384–89. https://doi.org/10.1073/pnas.0704469104.

Hurlbert, Allen H., and Ethan P. White. 2005. “Disparity Between Range Map- and Survey-Based Analyses of Species Richness: Patterns, Processes and Implications.” Ecology Letters 8 (3): 319–27. https://doi.org/10.1111/j.1461-0248.2005.00726.x.

Isaac, Nick J. B., James Mallet, and Georgina M. Mace. 2004. “Taxonomic Inflation: Its Influence on Macroecology and Conservation.” Trends in Ecology & Evolution 19 (9): 464–69. https://doi.org/10.1016/j.tree.2004.06.004.

IUCN Red List Technical Working Group. 2019. “Mapping Standards and Data Quality for IUCN Red List Spatial Data.” Prepared by the Standards and Petitions Working Group of the IUCN SSC Red ….

Krofel, M., J. Hatlauf, W. Bogdanowicz, L. a. D. Campbell, R. Godinho, Y. V. Jhala, A. C. Kitchener, et al. 2021. “Towards Resolving Taxonomic Uncertainties in Wolf, Dog and Jackal Lineages of Africa, Eurasia and Australasia.” Journal of Zoology n/a (n/a): 1–14. https://doi.org/10.1111/jzo.12946.

Ladle, Richard, and Joaquín Hortal. 2013. “Mapping Species Distributions: Living with Uncertainty.” Frontiers of Biogeography 5 (1): 4–6.

McNaughton, S. J. 1992. “The Propagation of Disturbance in Savannas Through Food Webs.” Journal of Vegetation Science 3 (3): 301–14. https://doi.org/10.2307/3235755.

Meyer, Carsten, Patrick Weigelt, and Holger Kreft. 2016. “Multidimensional Biases, Gaps and Uncertainties in Global Plant Occurrence Information.” Ecology Letters 19 (8): 992–1006. https://doi.org/10.1111/ele.12624.

Pocock, Michael J. O., Helen E. Roy, Chris D. Preston, and David B. Roy. 2015. “The Biological Records Centre: A Pioneer of Citizen Science.” Biological Journal of the Linnean Society 115 (3): 475–93. https://doi.org/10.1111/bij.12548.

Poisot, Timothée, Benjamin Baiser, Jennifer A Dunne, Sonia Kéfi, François Massol, Nicolas Mouquet, Tamara N Romanuk, Daniel B Stouffer, Spencer A Wood, and Dominique Gravel. 2016. “Mangal - Making Ecological Network Analysis Simple.” Ecography 39 (4): 384–90.

Poisot, Timothée, Gabriel Bergeron, Kevin Cazelles, Tad Dallas, Dominique Gravel, Andrew MacDonald, Benjamin Mercier, Clément Violet, and Steve Vissault. 2021. “Global Knowledge Gaps in Species Interaction Networks Data.” Journal of Biogeography 48 (7): 1552–63. https://doi.org/10.1111/jbi.14127.

Poisot, Timothée, Gabriel Bergeron, Kevin Cazelles, Tad Dallas, Dominique Gravel, Andrew Macdonald, Benjamin Mercier, Clément Violet, and Steve Vissault. 2020. “Environmental Biases in the Study of Ecological Networks at the Planetary Scale.” bioRxiv, 2020.01.27.921429. https://doi.org/10.1101/2020.01.27.921429.

Poisot, Timothée, Zachary Bélisle, Laura Hoebeke, Michiel Stock, and Piotr Szefer. 2019. “EcologicalNetworks.jl: Analysing Ecological Networks of Species Interactions.” Ecography 42 (11): 1850–61. https://doi.org/10.1111/ecog.04310.

Power, Mary E. 1992. “Top-Down and Bottom-Up Forces in Food Webs: Do Plants Have Primacy.” Ecology 73 (3): 733–46. https://doi.org/10.2307/1940153.

Rondinini, Carlo, Kerrie A. Wilson, Luigi Boitani, Hedley Grantham, and Hugh P. Possingham. 2006. “Tradeoffs of Different Types of Species Occurrence Data for Use in Systematic Conservation Planning.” Ecology Letters 9 (10): 1136–45. https://doi.org/10.1111/j.1461-0248.2006.00970.x.

Ronquillo, Cristina, Fernanda Alves-Martins, Vicente Mazimpaka, Thadeu Sobral-Souza, Bruno Vilela-Silva, Nagore G. Medina, and Joaquín Hortal. 2020. “Assessing Spatial and Temporal Biases and Gaps in the Publicly Available Distributional Information of Iberian Mosses.” Biodiversity Data Journal 8: e53474. https://doi.org/10.3897/BDJ.8.e53474.

Roy, Helen E., Elizabeth Baxter, Aoine Saunders, and Michael J. O. Pocock. 2016. “Focal Plant Observations as a Standardised Method for Pollinator Monitoring: Opportunities and Limitations for Mass Participation Citizen Science.” PLOS ONE 11 (3): e0150794. https://doi.org/10.1371/journal.pone.0150794.

Ryan, S. F., N. L. Adamson, A. Aktipis, L. K. Andersen, R. Austin, L. Barnes, M. R. Beasley, et al. 2018. “The Role of Citizen Science in Addressing Grand Challenges in Food and Agriculture Research.” Proceedings of the Royal Society B: Biological Sciences 285 (1891). https://doi.org/10.1098/rspb.2018.1977.

Scott, Abigail L., Paul H. York, Clare Duncan, Peter I. Macreadie, Rod M. Connolly, Megan T. Ellis, Jessie C. Jarvis, Kristin I. Jinks, Helene Marsh, and Michael A. Rasheed. 2018. “The Role of Herbivory in Structuring Tropical Seagrass Ecosystem Service Delivery.” Frontiers in Plant Science 9: 127. https://doi.org/10.3389/fpls.2018.00127.

Stoyanov, S. 2020. “Cranial Variability and Differentiation Among Golden Jackals (Canis Aureus) in Europe, Asia Minor and Africa.” ZooKeys. https://doi.org/10.3897/zookeys.917.39449.

Strydom, Tanya, Michael D. Catchen, Francis Banville, Dominique Caron, Gabriel Dansereau, Philippe Desjardins-Proulx, Norma R. Forero-Muñoz, et al. 2021. “A Roadmap Towards Predicting Species Interaction Networks (across Space and Time).” Philosophical Transactions of the Royal Society B: Biological Sciences 376 (1837): 20210063. https://doi.org/10.1098/rstb.2021.0063.

Tsunoda, Hiroshi, and Masayuki U. Saito. 2020. “Variations in the Trophic Niches of the Golden Jackal Canis Aureus Across the Eurasian Continent Associated with Biogeographic and Anthropogenic Factors.” Journal of Vertebrate Biology 69 (4): 20056.1. https://doi.org/10.25225/jvb.20056.

Valiente-Banuet, Alfonso, Marcelo A. Aizen, Julio M. Alcántara, Juan Arroyo, Andrea Cocucci, Mauro Galetti, María B. García, et al. 2015. “Beyond Species Loss: The Extinction of Ecological Interactions in a Changing World.” Edited by Marc Johnson. Functional Ecology 29 (3): 299–307. https://doi.org/10.1111/1365-2435.12356.

Windsor, Fredric M., Johan van den Hoogen, Thomas W. Crowther, and Darren M. Evans. 2022. “Using Ecological Networks to Answer Questions in Global Biogeography and Ecology.” Journal of Biogeography n/a (n/a). https://doi.org/10.1111/jbi.14447.

Wisz, Mary Susanne, Julien Pottier, W Daniel Kissling, Loïc Pellissier, Jonathan Lenoir, Christian F Damgaard, Carsten F Dormann, et al. 2013. “The Role of Biotic Interactions in Shaping Distributions and Realised Assemblages of Species: Implications for Species Distribution Modelling.” Biological Reviews of the Cambridge Philosophical Society 88 (1): 15–30. https://doi.org/10.1111/j.1469-185X.2012.00235.x.