Distribution modelling of Tor putitora (Hamilton, 1822), an endangered cyprinid in the Himalayan river system using MaxEnt

Freshwater fishes are threatened by habitat modification, overfishing, pollution, invasive species, and climate change. Hence, habitat assessment using geospatial technology is essential in predicting the distribution of threatened candidates. Tor putitora, an endangered game fish is declining due to indiscriminate fishing and habitat destruction. No attempts were made so far to understand habitat predilection of the species. In this study, we used Maximum Entropy (MaxEnt) model with occurrence records and environmental variables. The occurrences and environmental variables were collected from survey and secondary sources and used after removing biases, multicollinearity test and contribution analysis. The model performed well with area under curve (AUC) 0.867 and true skill statistic (TSS) 0.756. The main contributing variables were upstream elevation (35.3%) and flow length (21.4%). The model predicted 10% suitable, 33% slight to moderate suitable and 57% unsuitable. So, modelling is useful in identifying river networks for conservation and eco-restoration of freshwater fishes.


Introduction
Tor putitora (Hamilton 1822) is known popularly as Golden mahseer or Himalayan mahseer and it belongs to the family Cyprinidae under order Cypriniformes. It is one of the largest freshwater fish of the Indian subcontinent inhabiting mainly the rivers of the Himalayan foothill region (Bhatt and Pandit 2015). The species has a restricted area of occupancy in general within India, Afghanistan, Pakistan, Nepal, Bhutan, Myanmar, and Bangladesh (Rajbanshi and Csavas 1982;Talwar et al. 1991;Oo 2002). Behaviourally, being potamodromous (migrating within freshwaters) one, the species mostly uses smaller hill streams as their spawning grounds (generally rapid streams, riverine pools, and lakes characterized by large boulders, pebbles and gravel) with water temperature range of 11 to 30.5° C, alkaline pH and dissolved oxygen concentration in the range of 6.4-11 mg/l (Joshi 1994;Bhatt et al. 2004). Compared to the other congeners, T. putitora is elongated but laterally compressed having longer head and head length is significantly greater than body depth at dorsal fin origin. Further, the head is slightly blunt with a large mouth and prominent lateral line stripes whereas the body is dim greenish dorsally, yellowish and silvery white ventrally. The distal margin of pelvic, anal and caudal fins show tint of reddish-golden colour as well as at maturity the body above its lateral line shows usually golden colour, which might be absent in the juvenile stage (Darshan et al. 2019). According to IUCN (2021), T. putitora is enlisted as an endangered (EN) species.
Across the global range, a total of 17 species of Tor are considered valid (Fricke et al. 2021). Out of which, eight (8) species have been reported from India namely Tor barakae (Arunkumar and Basudha 2003), Tor khudree (Sykes 1839), Tor kulkarnii (Menon 1992), Tor malabaricus (Jerdon 1849), Tor mosal (Hamilton 1822), Tor putitora (Hamilton 1822), Tor remadeviae (Kurup and Radhakrishnan 2011) and Tor tor (Hamilton 1822). In the North-eastern states of India, only 3 species of genus Tor namely Tor barakae, T. putitora and T. tor have been reported to date. Interestingly, T. putitora is the state fish in many states of India viz., Arunachal Pradesh, Himachal Pradesh, Uttarakhand and the union territory of Jammu and Kashmir. The Indus, Ganges and Brahmaputra river basins are the native range of T. putitora (Jha et al. 2018). Among the Indian species of the genus, T. putitora is the important food and a popular game fish after T. tor. It constitutes an amazing fishery in different foothill water bodies of the country and nowadays, many reservoirs are being stocked with the species both for fishery and ecotourism development. However, creation of dams across certain rivers has alarmingly degraded habitat and shrink their natural breeding grounds. Furthermore, high mortality of brood and juvenile fishes often have been witnessed due to the act of indiscriminate overfishing in the rivers and the species has been estimated to decline by more than 50% already (Jha et al. 2018). T. putitora is also popular and important high-value indigenous fish species in Nepal (Rai et al. 1997). The species is also reportedly declining from its natural habitat in Nepal (Bista et al. 2007). However, in recent years, the success in artificial breeding at some research stations has provided enthusiasm on developing it for commercial cultivation in natural waters (Rai et al. 2006). In Bhutan, it is commercially important game fish, and has a very high table value. It is locally called as Serngya (Yellow fish), and represents the symbol of good luck in the Bhutanese belief system (Ministry of Agriculture and Forests, Bhutan 2020). Unfortunately, the natural stock in the country is also declining due to deterioration of the natural habitat to irreparable extent.
Species Distribution Modelling (SDM) predicts the distribution of a species across geographic space and time using environmental data and occurrence records. Hence, the significance of ecological factors like topographic, climatic, land cover, physico-chemical properties of soil and water, etc. cannot be ignored in determining the distribution pattern of freshwater fishes (Roy et al. 2021). Further, the alteration in thermal regimes, land use and hydrology has been reported to cause fish isolation and fragmentation within the terrestrial landscapes leading to changes in the distributional patterns (Johnson et al. 2012;Greshishchev et al. 2015;Ruiz-Navarro et al. 2016;Hamilton et al. 2018). Therefore, in view of the declining population of natural stock, estimating the potential distribution of this sought after freshwater fish through ecological modelling is highly indispensable for policy making and planning particularly towards habitat restoration and implementing conservation strategies. So far, hardly any attempts have been made to model the distribution of freshwater fishes, particularly T. putitora in the riverine systems of the Himalayas. This research article hitherto revealed that Maximum Entropy (MaxEnt) approach (Phillips et al. 2006) can predict the distribution range of an endangered Cyprinid fish (T. putitora) on the backdrop of the occurrence records and environmental factors very precisely and successfully. It is expected that such outcome would be able to provide important information for rational aquatic resource management and ex-situ or in-situ conservation of the species like golden mahseer in its natural habitat.

Species occurrence data
As per the available literature, the targeted species is mainly distributed in the rivers of the Himalayan region. Hence, the study area encompasses the union territory of Jammu and Kashmir in the west to the North-East Indian states in the east including the countries of Nepal and Bhutan (Fig. 1). The occurrences of Tor putitora in the study area were collected from both primary and secondary sources. The primary occurrence data were recorded during the field survey using handheld Garmin GPSMAP 64csx. The dismo package of R (Version 4.0.3) has been used to access the secondary occurrence data from the Global Biodiversity Information Facility (GBIF 2019) following Hijmans et al. (2011). The search string 'Tor putitora' in GBIF returned 138 occurrence data and another 42 species occurrences were consulted from different published journal articles. Thus, a total of 180 occurrence records were collected. Out of this, the species records lacking geographic coordinates and falling outside the study area were removed. Further, a snapping tolerance of 1 km was used to move the occurrence points to the closest freshwater pixel and the points falling outside these pixels were removed. The new geographical coordinates (104 Nos.) were used as presence records for testing the geographical biasness (Supplementary -1). The occurrence records are often reported to be biased due to easy accessibility of an area or near road or areas of high population density that can distort the species' niche resulting into error in estimates of the model performances (Kadmon et al. 2004;Veloz 2009). Hence, the biasness in the occurrence records were removed using spThin package in R, which uses an algorithm to randomize the removal of occurrence locations within a specified distance threshold (Aiello-Lammens et al. 2015). The resulting dataset retained the records of species presence within 10 km spatial grid and returned a dataset of 58 occurrence points for the model (Supplementary -2).

Environmental variables
For generating the distribution modelling of T. putitora, a set of near-global environmental variables with 1 km resolution were downloaded from the website of EarthEnv project (www.earthenv.org) using R. EarthEnv is a collaborative project of biodiversity scientists and remote sensing experts for monitoring and modelling biodiversity, ecosystems, and climate.
The sources of these variables are WorldClim, Consensus Land Cover, HydroSHEDS and World soil information (ISRIC) as explained in detail by Domisch et al. (2015). The environmental variables include 19 layers of hydroclimatic variables, 12 layers of land cover variables, 10 layers of soil variables, 4 layers each of elevation and slope, and 1 layer each of stream length and flow accumulation. Additionally, 4 variables namely dissolved oxygen, total alkalinity, town and road proximity was generated through Inverse Distance Weighting (IDW) interpolation method from the water samples (field survey) and published literatures using ArcMap 10.3 (Environmental Systems Research Institute (ESRI) 2016). The coordinate points were snapped to the closest freshwater pixel with a snapping tolerance of 1 km before the interpolation. All the predictor variables were clipped to the area of interest (AOI) for further processing.

Selection of best environmental variables
To achieve parsimonious model, all the predictors with insignificant contribution to the model were removed through multicollinearity test and analysis of percent contributions. Most of the statistical methods face the problem of multicollinearity due to high correlation among the predictor variables (Miller 2010) that can lead to model uncertainties (Graham 2003;Braunisch et al. 2013;Vatcheva et al. 2016). Hence, Merow et al. (2013) recommended minimizing the highly correlated environmental variables because the complex features created by MaxEnt are often already highly correlated. In this study, we used Variance Inflation Factor (VIF) correlation approach to reduce multicollinearity problems using usdm package in R (Naimi and Araújo 2016). In general, a correlation coefficient threshold of > 0.7 is considered to be an appropriate indicator of collinearity (Dormann et al. 2012;Manzoor et al. 2018;Sony et al. 2018;Farrell et al. 2019;Feng et al. 2019) and a VIF greater than 10 is a signal that the model has a collinearity problem (Chatterjee and Hadi 2006). Accordingly, in this study, out of 55 variables, 19 predictor variables with correlation coefficient threshold values of < 0.7 (Engler et al. 2013;Zimmermann et al. 2007) and VIF < 7 were retained for additional analysis (Table 1).

Table 1. Post collinearity variables assessed for inclusion in the model
Then, initially we run MaxEnt model with all 19 predictor variables to examine the variable contributions. Based on the results, 12 variables (elv_01, hydroclim_15, flow_acc_01, lc_avg_01, lc_avg_03, lc_avg_05, lc_avg_07, lc_avg_09, lc_avg_11, soil_avg_09, slope_01, and do) with more than 1% percent contribution were retained. Out of which, 6 best environmental variables (elv_01, hydroclim_15, flow_acc_01, lc_avg_03, soil_avg_09, and do) were selected after consulting available literature on the habitat information of the species and removing the persistent variables of land cover. Finally, we performed 5 replicate runs with 58 presence records and the 6 environmental variables to predict the distribution of T. putitora in the Himalayan river system (Table 2). Table: 2 Variables retained for the final model (shown in bold) with percent contribution and permutation importance (averaged over 5 replicate runs)

Model setup and evaluation
MaxEnt (version 3.4.1) was used to model the potential distribution of T. putitora. Out of a range of traditional and machine-learning modelling approaches, MaxEnt is reported to perform well (Phillips et al. 2006;Peterson et al. 2007;Phillips and Dudík 2008;Radosavlijevic and Anderson 2014;Nimasow et al. 2016) with high predictive accuracy (Moreno et al. 2011;Yang et al. 2013). MaxEnt is a machine-learning algorithm that builds relationships between presence-only records and a set of environmental variables to estimate a target probability distribution of maximum entropy based on the most uniform distribution (Merow et al. 2013). The specific settings used in the MaxEnt model were logistic output format, cross-validate replicate run type, response curves, jackknife measures of variable importance, 10 percentile training presence, and rest kept as default settings. The MaxEnt output in .ASC format was imported to ArcGIS 10.3 and the relative habitat suitability (Merow et al. 2013) ranging from 0 (unsuitable) to 1 (highly suitable) was reclassified into five classes viz. < 0.1 = unsuitable, 0.1 to 0.3 = slightly suitable, 0.3 to 0.5 = moderately suitable, 0.5 to 0.7 = suitable, and > 0.7 = highly suitable. The flowchart of methodology followed in this study is given in Fig. 2.
To test the model predictive accuracy, we used 5-fold cross-validation, where the occurrence data was randomly split into a number of equal-size groups called 'folds', and the models are created leaving out each fold in turn. The overall performance of the model was evaluated using the threshold independent, area under curve (AUC) of the receiver operating characteristic (ROC) curve and True Skill Statistic (TSS). The AUC value ranges from 0.5 to 1.0 where a value of 1.0 indicates perfect fitted model while a value near 0.5 indicates a fit not better than a random (Fielding and Bell 1997;Baldwin 2009). The TSS takes into account both omission and commission errors, and success as a result of random guessing, and ranges from −1 to +1, where +1 indicates perfect agreement and values of zero or less indicate a performance no better than random (Allouche et al. 2006).

3.1Model evaluation
The final model used six best environmental variables (minimum upstream elevation, precipitation seasonality, deciduous riparian broadleaf trees, flow length, dissolved oxygen, and depth to bedrock) representing topographic, flow accumulation, land cover, hydroclimatic, soil and water quality. These variables are known to play a significant role in the spawning, feeding habits, and overall distribution of T. putitora. The averaged model (5 replicate runs) show high training AUC values (> 0.894) across the models and also higher than the test AUC values. The test AUC values, which show the actual model predictive power, were higher than 0.818. The mean AUC of 0.867 with standard deviation of 0.026 and TSS value of 0.756 suggests that the final model is reasonably good. Out of the six environmental variables, minimum upstream elevation (elv_01) showed the highest relative importance of 35.3%, followed by flow length (flow_acc_01) with 21.4%. The importance of rest of the variables ranged between 1.4 to 7.8% only, indicating intermediate to lower contribution in predicting the suitable habitat of T. putitora. The logistic output (response curves) shows a negative relationship of T. putitora with high upstream elevation (peak towards lower values 0 to 1000 m) and deciduous broadleaf trees (peaked towards lower values of 5 to 10%). On the other hand, there was a positive association with flow length (peak towards higher values of 8000 to 30000 m), precipitation seasonality (peak towards higher values), dissolved oxygen (peak towards higher values of 5 to 12 mg/l), and depth to bedrock (peaked towards higher values of 220 to 235 cm). The response curves of the environmental variables are shown in Fig. 3.

Distribution modelling of T. putitora
The final suitability map was calculated in terms of the total river length. Out of the total river length of 60290.56 km, the model predicted 34265.75 km (56.83%) as unsuitable, followed by 12979.36 km (21.53%) as slightly suitable and 6807.67 km (11.29%) as moderaly suitable. The suitable and highly suitable categories constitute 5003.62 km (8.30%) and 1234.16 km (2.05%) respectively (Table 3 & Fig. 4). Countrywise, India represents a total river length of 3652 km, followed by Nepal with 2248 km under suitable and highly suitable categories. Bhutan showed the lowest river length of 338 km under suitable category. Among the Indian states and union territories, the largest river length under suitable and highly suitable categories was found in Arunachal Pradesh with 1626 km, followed by Uttarakhand with 960 km and Himachal   Table 3. Suitable categories of T. putitora in the Himalayan river system A visual examination of the final model shows majority of the upper reaches of the Himalayan rivers as unsuitable for T. putitora due to higher elevation. On contrary, the relatively low altitude reaches, mainly comprising of the Indian states of Himachal Pradesh, Uttarakhand, West Bengal, Sikkim, and Arunachal Pradesh including the countries of Nepal and Bhutan represents the suitable and highly suitable habitat of T. putitora (Fig. 4). Besides, the model also predicted majority of the suitable areas in the river flow length of 8000 to 30000 m. The suitable habitats were mainly predicted in the stems of the major rivers like the Sutlej (Himachal Pradesh), Ganga and Ramganga (Uttarakhand), Trishuli and Kosi (Nepal), Teesta (West Bengal and Sikkim), Puna Tsang Chhu (Bhutan), and Kameng, Subansiri, Siang, Dibang, Lohit, and Noa-Dihing (Arunachal Pradesh). These localties repreprents the main spawning and feeding grounds of T. putitora in the Indian subcontinent due to suitable topographic and hydroclimatic characteristics.
The model predicted only 10.35% of the total river length as suitable habitat of T. putitora in the entire Himalayan river system that point towards its limited area of habitation. The model also predicted about 11% of the total river length under moderately suitable that forms the future potential habitats of the species and needs proper exploration and confirmation. In view of the endangered status and reported declining population in wild stocks, the model results would be helpful in identifying the priority areas for conservation and restoration of T. putitora in the Himalayan river system.

Discussion
The results show the occurrences of T. putitora in larger streams and rivers in low minimum upstream elevation and elongated flow lengths. Besides, low percentage of riparian deciduous broadleaf trees, high precipitation seasonality, high dissolved oxygen, and high depth to bedrock also play low to moderate role in predicting the distribution of T. putitora. Based on the results, the species did not occur in the absolute smallest streams in conformity with its occurrence reports from the major rivers and tributaries of Himalayan river system (Bhatt and Pandit 2015). The model showed a negative relationship of T. putitora with high upstream elevation by restricting the predictions to the elevation range of < 1000 m (with highest possible range up to 2100 m). The results are in agreement with the reported occurrence of the species up to an altitude of 2000 m (Cordington 1946), up to 850 m (Talwar and Jhingran 1991), 600-1200 m (Singh and Kumar 2000), 70 to 1891 m (Nautiyal 2014) in India, and 2100 m in Nepal (Shrestha 1997). The hydroclimatic variables like temperature and precipitation are highly correlated with the elevation. Hence, the high predictive power of the model in the lower altitude areas can also be attributed to the temperature and precipitation requirements of the species. As per previous studies, the spawning grounds of T. putitora are characterized by water temperature varying from 11 to 30.5° C (Joshi 1994;Bhatt et al. 2004) and the feeding grounds in the range of 14 to 22° C (Bhatt et al. 2004).
The river habitat is an important determinant of the condition of fish assemblages in rivers Fialho et al. 2008) and there is a reciprocal relationship between river habitat and the flow length. In simple words, an elongated flow of river offers the possibilities of more diverse river habitats. Flow length is a linear aspect of drainage network and refers to the total length of stream channels in the drainage basin (Horton 1945). The flow length had a positive relationship, with higher values of flow length predicting a high likelihood of T. putitora occurrences. Generally, Tor species prefer fast-flowing rivers and streams with rocky/stony substrate (Nautiyal 2014). Hence, the presence of higher pools and ripples in the lower gradients along elongated stretches of the stream offers suitable habitats for the species.
Riparian forests are an important habitat for freshwater fish worldwide, because forests provide fish with both shelter and food including fruits, seeds, and canopy insects (Goulding 1980;Lowe-McConnell 1987;Gerking 1994). Hence, the moderate contributions of deciduous broadleaf trees in the model predictions could be attributed to the feeding requirements of the species. The rivers along the Himalayan foothills, considered to be the actual geographic range of T. putitora is mostly characterized by deciduous broadleaf forest. Therefore, the decayed and semi-decayed plant matter of the riparian forests forms essential food for the species. The feeding habit of T. putitora on plant matters has been reported in India (Nautiyal and Lal 1984;Kishor et al. 1998), Bangladesh (Alam et al. 2002) and Nepal (Mahaseth 2015).
We also found a positive association of T. putitora with high precipitation seasonality, dissolved oxygen and depth to bedrock. Seasonality plays a vital role in influencing the perseverance of living organisms. Seasonal shifts in climatic conditions influence the availability of resources, which determines the presence or absence of species in an environment at temporal and spatial scale (Dayton 2008). Precipitation seasonality, a measure of the range of annual precipitation plays a key role in breeding and feeding habits of freshwater fishes. Dissolved oxygen is an important parameter in assessing water quality because of its influence on the organisms living within a body of water. According to Bhatt et al. (2004), the dissolved oxygen varies from 5.2 to 12.9 mg/l in feeding grounds while it ranges between 6.4 to 11 mg/l in the spawning grounds of T. putitora (Joshi 1994;Bhatt et al. 2004). The preference of depth to bedrock varies among different freshwater fishes. The model prediction towards the higher values of depth to bedrock may be attributed to the habitat preferences of T. putitora in large water volumes and river beds mostly covered with sand, silt and small boulders (Bhatt et al. 2004).
The findings of wide distribution and restricted area of occupancy of T. putitora is in accordance with the report of IUCN (2021). The results also predicted suitable areas in the rivers of Northeast Himalayas particularly Kameng, Subansiri, Siang, Dibang, Lohit and Noa-Dihing in Arunachal Pradesh that has been reported as unexplored potential habitats of T. putitora (Bhatt and Pandit 2015). Hence, these predictions are important in view of the estimated 50% decline in the past and 80% decline in future wild stocks of the species due to regulations in its river habitats (Sharma et al. 2015;IUCN 2021). As per available literature, habitat alteration by dams, barrages, illegal sand/boulder mining, poaching and indiscriminate fishing of brooders are largely responsible for the endangerment of T. putitora (Pathani 1994;Lakra et al. 2010;Nautiyal 2011;Pandit and Grumbine 2012;Khajuria et al. 2013;Gupta et al. 2014;Sharma et al. 2015). Besides, the life history and ecological traits like migratory habit, low fecundity and delayed sexual maturity have been also suggested to play a vital role in evaluating the vulnerability of the species (Nautiyal and Singh 1989;Shrestha 1997;Ogle 2002;William et al. 2005). Therefore, conservation and restoration strategies should be focused towards the unexplored areas and the critical suitable habitats. According to experts, ex-situ conservation of the species through development of hatcheries would be a sustainable conservation action but less genetic variability has been observed in hatchery stocks compared to its wild populations (Pradhan et al. 2011). So, priority must be given to the potential unexplored habitats by restricting human activities like fishing, extraction of sand, gravels and boulders, and by withdrawing the proposed dams or proper environmental impact assessment before construction of dams for hydropower. As of now, numbers of dams have been proposed in the major rivers of the Himalayan region which may possibly further reduce the already fragmented habitat of T. putitora.

Conclusion
Our study shows the predictive distribution of Tor putitora in the Himalayan river system using MaxEnt modelling technique. We used topographic, hydroclimatic, land cover, soil and water quality variables to evaluate the performance of model. The performance of model was high and reasonably good. The model predicted only 10.35% of the total river length as suitable to highly suitable indicating its limited area of habitation. About 11% of the river length has been predicted as moderately suitable which call for further explorations and confirmation of the species occurrence. Based on the reported declining wild stocks of the species, we reccommed to focuss conservation activities towards unexplored and critical habitats following the model predictions. We also recommend to lessen anthropogenic activities like indiscrimate fishing, sand/boulder mining, and construction of dams over the suitable river habitats of T. putitora. Further, our model appears to be constrained by the vastness of the study area. So, we recommend researchers to carry out distribution modelling at watershed level by integrating the local level environmental factors for more reliable prediction of habitat suitability.

Disclosure statement
The authors declare that they have no potential conflict of interests

Funding
There was no funding for the work.

Data Availability
Data available on request from the authors

Author contributions
Ranjit Mahato and Santoshkumar Abujam prepared the initial draft of the manuscript. Ranjit Mahato prepared the thematic layers, and Dhoni Bushi collected the fish sample. Oyi Dai Nimasow and Gibji Nimasow commented, edited and revised the manuscript by consulting the relevant literature. Debanshu Narayan Das further revised the manuscript after proofreading. Finally, the manuscript was read by all the authors and approved it to submit for possible publication.