Abstract
Understanding rainfall is important for Uttarakhand, India, a state with varied topography where extreme rainfall produces rapid runoff that threatens the structural and functional safety of large structures and other natural resources. In this study, an attempt has been made to determine the best-fit distribution of the annual rainfall series for the period 1901–2002 for the 13 districts of Uttarakhand. The candidate distributions were Chi-squared, Chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull, and Weibull (3P). The distributions were compared using the Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared goodness-of-fit tests. Results showed that the Weibull distribution performed best, being selected for 46% of the districts, followed by the Chi-squared (2P) and log-Pearson distributions. The results of this study would be useful to water resource engineers, policy makers, and planners for the agricultural development and conservation of natural resources of Uttarakhand.
Introduction
Rainfall is the main source of water for domestic use as well as for agriculture in Uttarakhand State. The state has varied topography, leading to great spatial and temporal variation in rainfall. An analysis of the state's rainfall series would improve the management of water resources and their optimum use. One of the challenging tasks with rainfall data is interpreting past records of rainfall events in terms of future probabilities of occurrence. Understanding the distribution of flood-producing rainfall may therefore play an important role in the sustainable development and conservation of the state's natural resources. Estimating the statistical distribution that best fits annual rainfall has long been a topic of concern for hydrologists, meteorologists, and other water resource personnel. The understanding of rainfall distribution is important for stochastic modeling, rainfall frequency analysis, and rainfall trend analysis. The aim of such a study is not so much to explore the properties of rainfall as to use the rainfall sequences as inputs to other models in order to understand hydrologic processes (Buishand 1978). Distribution fitting is a way of selecting the statistical distribution that best fits the available records. It is also possible to calculate return periods using various probability distributions (Upadhaya and Singh 1998).
In Costa Rica, the normal distribution was the best-fitting probability distribution for annual rainfall (Waylen et al. 1996). According to Abdullah and Al-Mazroui (1998), the gamma distribution was the best for the annual rainfall of Saudi Arabia. The gamma distribution has also been applied in Africa for monitoring drought (Husak et al. 2006). Rainfall was also well fitted by the log-Pearson type III distribution in Texas (Salami 2004) and in China (Lee 2005). Naghavi and Yu (1995) chose the generalized extreme value (GEV) distribution for Louisiana, and the same distribution was chosen by Pilon et al. (1991) for Ontario and by Alahmadi et al. (2014) for Saudi Arabia. Using the Kappa distribution, Park and Jung (2002) generated rainfall quantile maps for Korea. The Pearson and log-Pearson distributions were used in Golestan, Iran (Osati et al. 2010). Alghazali and Alawadi (2014) found that no single distribution was suitable for describing rainfall across Iraq. The annual rainfall in Sudan was best fitted by the normal and gamma distributions (Mohamed and Ibrahim 2015). Libyan monthly rainfall was best fitted by the gamma probability distribution function (Sen and Eljadid 1999). Sharma and Singh (2010) used a daily rainfall series and found that the lognormal and gamma distributions fitted best. Tao et al. (2002) proposed a systematic procedure to compare the performance of different probability distributions. The GEV distribution provides a good fit to the monthly rainfall data in Bangladesh (Ghosh et al. 2016). Four probability distributions (normal, lognormal, log-Pearson type III, and Gumbel) were applied over Pakistan to find the best-fit probability distribution of yearly rainfall recorded at 24 h (Amin et al. 2016).
The selection of a best-fit distribution has also been applied to discharge series in the USA (Benson 1968; Vogel et al. 1993), the UK (NERC 1975), Australia (McMahon and Srikanthan 1981), and Turkey (Haktanir 1991). A review of the selection of the best distribution was given by Curmane (1989). The rainfall-runoff behavior of the Tagwai dam in Nigeria was fitted with various probability distributions; the normal distribution was the most appropriate for the yearly maximum daily rainfall and the log-Gumbel distribution for the yearly maximum daily runoff (Olumide et al. 2013). Phien and Ajirajah (1984) assessed the fit of the log-Pearson III distribution to flood and maximum rainfall series. Bhakar et al. (2006) studied the frequency of consecutive days' rainfall in Rajasthan, India; the gamma distribution was found to be the best fit for the region, and the corresponding return periods were estimated using the gamma function. Sabarish et al. (2017) found that the log-Pearson III distribution is the best-fit probability distribution for 1-day maximum rainfall over the southern part of India. Different probability distributions give mixed results at different locations and times (Lairenjam et al. 2016). The lognormal and Gumbel EV1 distributions were adopted for discharge in the Uttarakhand region (Kamal et al. 2016).
Distribution fitting is mainly applied to quantify risk, uncertainty, and monetary loss. Tao et al. (2002) stated that various probability distributions have been developed to describe the distribution of rainfall. However, the choice of the most suitable distribution is still a major challenge in hydrologic practice, since there is no general agreement as to which distribution should be used for the annual rainfall series. The selection of an appropriate distribution depends mainly on the characteristics of the rainfall data available at the particular site. Hence, it is necessary to evaluate many candidate distributions in order to identify the one that gives the most reliable estimates of extreme rainfall. This study aimed to determine the best-fit distribution for the annual rainfall data in the different districts of Uttarakhand and to estimate the corresponding parameters.
Data and analysis
The monthly rainfall data for all districts of Uttarakhand were obtained for the period 1901–2002 from the meteorological data tool of the India Water Portal website, IWP (2015) (http://www.indiawaterportal.org/met_data/). Uttarakhand State is mainly known for two mountainous regions, namely Kumaon and Garhwal. Most of the state is under forest cover, and major rivers such as the Ganga and Yamuna originate in the state. All 13 districts of the state were selected for this study: Almora, Bageshwar, Chamoli, Champawat, Dehradun, Haridwar, Nainital, Pauri, Pithoragarh, Rudraprayag, Tehri, Udham Singh Nagar, and Uttarkashi, as shown in Fig. 1. The basic characteristics of the state, such as population and geographical area, are given in Table 1.
From Table 2, it can be inferred that annual rainfall in Uttarakhand State varies widely from district to district. The highest average annual rainfall, 2426.77 mm, occurred in Champawat, whereas the lowest, 406.70 mm, occurred in Haridwar. The time series graphs of the different districts show positive or negative correlation. The time series shows two distinct peaks, in 1936 and 1980 (Fig. 2), which are characterized by different statistical behaviors.
Methodology
One of the major concerns with rainfall records is interpreting past rainfall data in terms of future probabilities of occurrence. A large number of probability distributions have been applied in different regions and found to be useful for describing rainfall. The best-fit probability distribution in the present case was evaluated using the following procedure.
Step I: Fitting the probability distribution
The probability distributions, viz. Chi-squared, Chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull, and Weibull (3P), were applied to find the best-fit probability distribution.
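Chi-squared distribution
The Chi-squared distribution is given by:
\[ f(x;n) = \frac{\left( \frac{x}{2} \right)^{\frac{n}{2} - 1} e^{ - \frac{x}{2}} }{2\,\Gamma \left( \frac{n}{2} \right)} \]
where the variable \(x \ge 0\) and the parameter \(n\), the number of degrees of freedom, is a positive integer.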
Exponential distribution
The exponential distribution is given by:
\[ f(x;\alpha ) = \frac{1}{\alpha } e^{ - \frac{x}{\alpha }} \]
where the variable \(x\) as well as the parameter \(\alpha\) is a positive real quantity.
Gamma distribution
The gamma distribution is given by:
\[ f(x;a,b) = \frac{a(ax)^{b - 1} e^{ - ax} }{\Gamma (b)} \]
where the parameters \(a\) and \(b\) are positive real quantities, as is the variable \(x\).
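Generalized extreme value (GEV) distribution
The class of GEV distributions is very flexible, with the tail shape parameter \(\xi\) (and hence the tail index, defined as \(\alpha = \xi^{ - 1}\)) controlling the shape and size of the tails. The standardized GEV distribution, in the form of von Mises (1936), incorporates a location parameter \(\mu\) and a scale parameter \(\sigma\) in addition to the tail shape parameter \(\xi\), and is given by:
\[ F(x) = \begin{cases} \exp \left\{ - \left[ 1 + \xi \frac{x - \mu }{\sigma } \right]^{ - 1/\xi } \right\}, & \xi \ne 0,\; 1 + \xi \frac{x - \mu }{\sigma } > 0 \\ \exp \left\{ - \exp \left( - \frac{x - \mu }{\sigma } \right) \right\}, & \xi = 0 \end{cases} \]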
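As a rough illustration of how a fitted GEV distribution could be turned into return levels, the sketch below evaluates the standard GEV distribution function and its inverse in plain Python. The function names and the parameter values (mu = 1000 mm, sigma = 200 mm, xi = 0.1) are hypothetical placeholders chosen for illustration; they are not estimates from this study.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function F(x) with location mu, scale sigma > 0, shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:                      # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                             # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for a probability 0 < p < 1."""
    y = -math.log(p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameters (mm); NOT fitted to the Uttarakhand data.
mu, sigma, xi = 1000.0, 200.0, 0.1
for T in (10, 50, 100):
    # The T-year return level is the quantile with non-exceedance probability 1 - 1/T.
    level = gev_quantile(1.0 - 1.0 / T, mu, sigma, xi)
    print(f"T = {T:>3} yr: return level = {level:.0f} mm")
```

Sign conventions for the shape parameter differ between software packages; for example, the `c` parameter of `scipy.stats.genextreme` corresponds to \(-\xi\) in the notation used here, so parameters should be checked before being moved between tools.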
Log-Pearson type 3 distribution
The log-Pearson 3 (LP3) distribution is complicated, as it has two interacting shape parameters (Griffis and Stedinger 2007). Similar to the GEV, it uses three parameters: location (µ), scale (σ), and shape (γ). A problem with LP3 is its tendency to give low upper bounds of the precipitation magnitudes, which is undesirable (Curmane 1989):
where
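Weibull distribution
The Weibull distribution is given by:
\[ f(x;\eta ,\sigma ) = \frac{\eta }{\sigma }\left( \frac{x}{\sigma } \right)^{\eta - 1} e^{ - \left( \frac{x}{\sigma } \right)^{\eta } } \]
where the variable \(x\) and the parameters \(\eta\) and \(\sigma\) are all positive real numbers.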
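The text does not state how the distribution parameters were estimated. As one possible approach, the sketch below fits the two-parameter Weibull distribution (shape \(\eta\), scale \(\sigma\)) by maximum likelihood, solving the likelihood equation for the shape by bisection. The estimation method, the function name `fit_weibull_mle`, and the synthetic sample are all assumptions made for illustration.

```python
import math
import random


def fit_weibull_mle(sample, lo=0.05, hi=50.0, iters=100):
    """Maximum-likelihood fit of a two-parameter Weibull; returns (eta, sigma)."""
    mean = sum(sample) / len(sample)
    z = [x / mean for x in sample]           # rescale so z**k cannot overflow
    mean_log = sum(math.log(v) for v in z) / len(z)

    def score(k):
        # Likelihood equation for the shape k; its root is the ML estimate.
        num = sum(v ** k * math.log(v) for v in z)
        den = sum(v ** k for v in z)
        return num / den - 1.0 / k - mean_log

    for _ in range(iters):                   # bisection: score() increases with k
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    eta = 0.5 * (lo + hi)
    # Given the shape, the scale has a closed form (undoing the rescaling).
    sigma = mean * (sum(v ** eta for v in z) / len(z)) ** (1.0 / eta)
    return eta, sigma


# Synthetic "annual rainfall" in mm with known parameters; NOT the study data.
rng = random.Random(0)
true_eta, true_sigma = 4.0, 1200.0
rain = [rng.weibullvariate(true_sigma, true_eta) for _ in range(2000)]

eta_hat, sigma_hat = fit_weibull_mle(rain)
print(f"eta = {eta_hat:.2f}, sigma = {sigma_hat:.1f} mm")
```

With a sample this large the estimates should land close to the values used to generate the data; with roughly 100 annual values per district, as in this study, the sampling uncertainty would be considerably larger.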
Step II: Testing the goodness of fit
The goodness-of-fit tests, namely the Kolmogorov–Smirnov, Anderson–Darling, and Chi-square tests, were used at the 5% significance level for the selection of the best-fit distribution. The best-fitted distribution is selected based on the minimum error produced, which is evaluated by the following techniques:

(a) Kolmogorov–Smirnov test (K–S)
The Kolmogorov–Smirnov (K–S) test is a goodness-of-fit test that compares an empirical distribution function \((F_{x} )\) with a specified distribution function \((F_{y} )\). It is often used as an alternative to the Chi-square goodness-of-fit test. The Kolmogorov–Smirnov statistic (D) can be computed as:
\[ D = \sup_{x} \left| F_{x} (x) - F_{y} (x) \right| \]
which measures the largest distance between the empirical distribution function \(F_{x}\) and the specified distribution \(F_{y}\). A large difference indicates an inconsistency between the observed data and the statistical model.
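A minimal sketch of the K–S computation follows. It evaluates the largest gap between the empirical step function and a candidate distribution function at the ordered sample points, where the supremum is attained. The Weibull parameters and the synthetic sample are hypothetical, and the helper names are chosen here for illustration.

```python
import math
import random


def weibull_cdf(x, eta, sigma):
    """Two-parameter Weibull distribution function (shape eta, scale sigma)."""
    return 1.0 - math.exp(-((x / sigma) ** eta)) if x > 0 else 0.0


def ks_statistic(sample, cdf):
    """D = sup |F_empirical - F_model|, checked just before and at each jump."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Synthetic sample of 102 "annual totals" (mm); NOT the study data.
rng = random.Random(1)
sample = [rng.weibullvariate(1200.0, 4.0) for _ in range(102)]

d = ks_statistic(sample, lambda x: weibull_cdf(x, 4.0, 1200.0))
d_crit = 1.36 / math.sqrt(len(sample))   # large-sample 5% critical value
print(f"D = {d:.4f}, approximate 5% critical value = {d_crit:.4f}")
```

The approximation \(1.36/\sqrt{n}\) applies to a fully specified distribution. When the parameters are estimated from the same sample, as in a best-fit study, the test becomes conservative and this critical value should be treated as indicative only.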

(b) Anderson–Darling test (A–D)
The Anderson–Darling (A–D) test was introduced by Anderson and Darling to place more weight or discriminating power at the tails of the distribution. This can be important when the tails of the selected theoretical distribution are of practical significance. The test statistic (AD) is defined as:
\[ {\text{AD}} = - n - \frac{1}{n}\sum\limits_{i = 1}^{n} {(2i - 1)\left[ {\ln F(x_{(i)} ) + \ln \left( {1 - F(x_{(n - i + 1)} )} \right)} \right]} \]
where \(\left\{ {x_{(1)} < \cdots < x_{(n)} } \right\}\) is the ordered (from smallest to the largest element) sample of size n, and \(F(x)\) is the underlying theoretical cumulative distribution to which the sample is compared. The null hypothesis that \(\left\{ {x_{(1)} < \cdots < x_{(n)} } \right\}\) comes from the underlying distribution \(F(x)\) is rejected if AD is larger than the critical value \({\text{AD}}_{\alpha }\) at a given significance level \((\alpha )\).
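The statistic can be sketched in a few lines using the usual computational form for an ordered sample, \(-n - \frac{1}{n}\sum (2i-1)[\ln F(x_{(i)}) + \ln(1 - F(x_{(n-i+1)}))]\). The function name, the clipping constant, and the synthetic data are assumptions made for this illustration.

```python
import math
import random


def anderson_darling(sample, cdf):
    """Anderson-Darling statistic of `sample` against the distribution function `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    # Clip F away from 0 and 1 so the logarithms stay finite.
    f = [min(max(cdf(x), 1e-12), 1.0 - 1e-12) for x in xs]
    s = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


def weibull_cdf(x, eta, sigma):
    """Two-parameter Weibull distribution function (shape eta, scale sigma)."""
    return 1.0 - math.exp(-((x / sigma) ** eta)) if x > 0 else 0.0


# Synthetic sample (mm); NOT the study data.
rng = random.Random(2)
sample = [rng.weibullvariate(1200.0, 4.0) for _ in range(102)]

ad_good = anderson_darling(sample, lambda x: weibull_cdf(x, 4.0, 1200.0))
ad_poor = anderson_darling(sample, lambda x: weibull_cdf(x, 1.0, 1200.0))
print(f"AD (generating model) = {ad_good:.3f}, AD (mis-specified shape) = {ad_poor:.3f}")
```

Because the logarithms grow without bound as \(F\) approaches 0 or 1, disagreement in the tails is penalized more heavily than in the K–S statistic, which is the property the text refers to.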

(c) Chi-square (\(\chi^{2}\)) test
It is a technique that checks whether a specified distribution is consistent with the observed frequencies of events in a sample. Using O to denote the observed count and E the expected count in each class, the Chi-square test statistic is calculated by:
\[ \chi^{2} = \sum {\frac{(O - E)^{2} }{E}} \]
where the sum is taken over all classes. The null hypothesis states that there is no significant difference between the expected and observed frequencies. The alternative hypothesis states that they are different.
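For illustration, the following sketch bins a sample, derives the expected counts from a candidate distribution function, and sums \((O - E)^2 / E\) over the classes. The bin edges, the uniform test case, and the helper names are hypothetical; the source does not say how classes were chosen.

```python
def binned_counts(sample, edges, cdf):
    """Observed and expected counts for the classes [edges[j], edges[j+1])."""
    n = len(sample)
    pairs = list(zip(edges, edges[1:]))
    observed = [sum(1 for x in sample if lo <= x < hi) for lo, hi in pairs]
    expected = [n * (cdf(hi) - cdf(lo)) for lo, hi in pairs]
    return observed, expected


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Toy example: four values tested against a uniform distribution on [0, 1).
toy = [0.1, 0.2, 0.6, 0.9]
obs, exp = binned_counts(toy, [0.0, 0.5, 1.0], lambda u: u)
print(obs, exp, chi_square_statistic(obs, exp))

# Direct use with hypothetical counts.
print(chi_square_statistic([10, 20, 30], [20, 20, 20]))
```

In practice the classes are usually chosen so that no expected count is very small, and the statistic is compared with a Chi-square critical value whose degrees of freedom depend on the number of classes and of estimated parameters.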
Step III: Identification of best-fit probability distribution
The goodness-of-fit tests mentioned above were applied to the rainfall data of the study area. The test statistics were computed and tested at the (α = 0.05) level of significance. Accordingly, the probability distributions were ranked based on the minimum test statistic value. The probability density functions, ranges, and parameters of the various probability distributions are shown in Tables 3 and 4.
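One way such a ranking could be combined across the three tests is sketched below. The scoring rule (the smallest statistic in each test earns the most points, and the points are summed) is an assumption for illustration, as are the statistic values, which are made up and do not come from Tables 3 and 4.

```python
def total_scores(stats_by_test):
    """Map {test: {distribution: statistic}} to {distribution: total score}."""
    totals = {}
    for stats in stats_by_test.values():
        ordered = sorted(stats, key=stats.get)   # smallest statistic first
        k = len(ordered)
        for position, name in enumerate(ordered):
            # Best fit in this test earns k points, the worst earns 1.
            totals[name] = totals.get(name, 0) + (k - position)
    return totals


# Hypothetical test statistics for one district (illustrative values only).
stats_by_test = {
    "K-S": {"Weibull": 0.05, "Gamma": 0.07, "GEV": 0.06},
    "A-D": {"Weibull": 0.40, "Gamma": 0.55, "GEV": 0.90},
    "Chi-squared": {"Weibull": 3.1, "Gamma": 2.8, "GEV": 4.0},
}

totals = total_scores(stats_by_test)
best = max(totals, key=totals.get)
print(totals, best)
```

Ties are not handled specially in this sketch; a fuller implementation would need a rule for distributions with equal statistics or equal total scores.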
Results and discussion
Analysis of rainfall data
Analysis of rainfall data plays an important role in water resource planning as well as in hydrological modeling. The mean monthly rainfall data for all districts of Uttarakhand for 102 years (1901–2002) were used for the present study. Figure 3 displays the annual rainfall recorded for the whole of Uttarakhand for 1901–2002. The average annual rainfall over the whole period is 1069 mm. During this period, the highest annual rainfall was about 1982.15 mm in 1936, whereas the lowest was about 559.98 mm in 1987. The dark line in the figure represents the average annual rainfall. If the annual rainfall in a year departs from the average annual rainfall by 25% or more, the year is declared a drought (meteorological drought) year (Subramanya 2008). On the basis of a 25% departure from the average annual rainfall, 30.4% of the years were dry years. Rainfall was sufficient to meet evapotranspiration demand from July to September but insufficient from October to June.
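The drought criterion can be sketched as follows, interpreting "departure" as a deficit below the long-term mean (an assumption, since a drought implies a shortfall). The first check uses the mean and the 1987 total quoted in the text; the small series after it is hypothetical.

```python
def is_drought_year(annual_mm, mean_mm, threshold=0.25):
    """True if rainfall falls short of the mean by at least `threshold` (as a fraction)."""
    return (mean_mm - annual_mm) / mean_mm >= threshold


def drought_years(annual, threshold=0.25):
    """Years in {year: rainfall_mm} that meet the deficit criterion."""
    mean_mm = sum(annual.values()) / len(annual)
    return [y for y in sorted(annual) if is_drought_year(annual[y], mean_mm, threshold)]


# Values quoted in the text: mean 1069 mm, lowest total 559.98 mm (1987).
deficit_1987 = (1069.0 - 559.98) / 1069.0
print(f"1987 deficit = {100 * deficit_1987:.1f}%", is_drought_year(559.98, 1069.0))

# Hypothetical four-year series (mm), labelled 1 to 4; NOT the study data.
toy = {1: 1000.0, 2: 700.0, 3: 1300.0, 4: 1000.0}
print(drought_years(toy))
```

With a mean of 1069 mm, the 25% criterion corresponds to an annual total of about 802 mm or less.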
To identify the seasonal rainfall distribution, the year was divided into three periods, namely monsoon (June–September), post-monsoon (October–February), and pre-monsoon (March–May). Figure 4 shows the seasonal variation of the rainfall. The area receives about 82% of the total annual rainfall during the monsoon season, 10% during the post-monsoon season, and 8% during the pre-monsoon season. Because about 82% of the rainfall occurs in the monsoon season, crops suffer from moisture stress in the remaining 8 months. Therefore, it is necessary to predict the expected rainfall in order to design a water conservation system. Based on the drought criterion, the years 1903, 1918, 1941, 1944, 1965, 1974, 1979, 1987, 1991, and 2001 can be characterized as drought years for Uttarakhand State.
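The seasonal split can be computed from monthly totals as sketched below, using the season definitions given in the text. The monthly values are hypothetical and were chosen only so that the shares mirror the reported 82/10/8 split; they are not the observed data.

```python
# Season definitions from the text (month numbers, 1 = January).
SEASONS = {
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11, 12, 1, 2),
    "pre-monsoon": (3, 4, 5),
}


def seasonal_shares(monthly_mm):
    """Percentage of the annual total falling in each season."""
    total = sum(monthly_mm.values())
    return {
        season: 100.0 * sum(monthly_mm[m] for m in months) / total
        for season, months in SEASONS.items()
    }


# Hypothetical mean monthly rainfall (mm); NOT the study data.
monthly_mm = {
    1: 30.0, 2: 30.0, 3: 20.0, 4: 25.0, 5: 35.0, 6: 150.0,
    7: 280.0, 8: 260.0, 9: 130.0, 10: 25.0, 11: 5.0, 12: 10.0,
}

shares = seasonal_shares(monthly_mm)
for season, pct in shares.items():
    print(f"{season}: {pct:.1f}%")
```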
Probability distribution
To identify the best distribution of monthly rainfall, data for all districts of Uttarakhand for 102 years (1901–2002) were analyzed, and a probability analysis of the monthly rainfall series was carried out. The candidate distributions were Chi-squared, Chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull, and Weibull (3P). Table 3 shows the parameters of the different distributions. To obtain the best-fit distribution for this rainfall series, the Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared goodness-of-fit tests were applied. The assessment of the best probability distribution was based on the total rank obtained from all the tests. Each distribution was given a score from zero to ten (0–10), and the distribution(s) with the highest total score was chosen as the best model for the data of a particular district. According to the goodness-of-fit tests, the Weibull distribution best fitted the rainfall of Almora, Bageshwar, Nainital, and Udham Singh Nagar districts; the Chi-squared (2P) distribution best fitted Chamoli, Champawat, and Haridwar; the gamma (3P) distribution best fitted Dehradun and Pauri Garhwal; the log-Pearson 3 distribution best fitted Pithoragarh and Tehri Garhwal; and the Weibull (3P) distribution best fitted Uttarkashi. The generalized extreme value distribution fitted best during the monsoon (June–September). The best distribution is used to define the risk and uncertainty associated with the modeling and planning of water resources. It also supports the development of valid models that could help avoid losses of time and money.
Conclusion
Uttarakhand State faces rapid conversion of rainfall into surface runoff because of its slopes, as well as problems with landslides. To help cope with these issues, the best-fit probability distribution was systematically selected for the annual rainfall series for the period 1901–2002 for the 13 districts of Uttarakhand. The choice of the best probability distribution could also inform decisions relating to local economics and hydrologic safety systems. The annual rainfall series of all districts of the state were fitted with the Chi-squared, Chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull, and Weibull (3P) distributions, and the distributions were compared using the Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared goodness-of-fit tests.
The goodness-of-fit analysis indicated that the Weibull, Weibull (3P), Chi-squared (2P), gamma, and log-Pearson (3P) distributions were suitable for 31, 15, 24, 15, and 15% of the stations, respectively. This study could provide a basis for choosing the best probability distribution and the corresponding distribution parameters for individual districts. Furthermore, the seasonal rainfall distribution reveals that the area receives about 82% of the total annual rainfall during the monsoon season, 10% during the post-monsoon season, and 8% during the pre-monsoon season. This preliminary result will help water resource planners in hydrological modeling and policy makers in framing general guidelines for the best use of rainfall in Uttarakhand.
References
Abdullah MA, Al-Mazroui MA (1998) Climatological study of the southwestern region of Saudi Arabia. I. Rainfall analysis. Clim Res 9(3):213–223
Alahmadi F, Abd Rahman N, Abdul Razzak M (2014) Evaluation of the best fit distribution for partial duration series of daily rainfall in Madinah, Western Saudi Arabia. In: Evolving water resources system: understanding, predicting and managing water society interactions proceedings of ICWRS, Bologna, Italy
Alghazali NO, Alawadi DA (2014) Fitting statistical distributions of monthly rainfall for some Iraqi stations. Civ Environ Res 6(6):40–46
Amin MT, Rizwan M, Alazba AA (2016) A best-fit probability distribution for the estimation of rainfall in northern regions of Pakistan. Open Life Sci 11(1):432–440
Benson MA (1968) Uniform flood frequency estimating methods for federal agencies. Water Resour Res 4(5):891–908
Bhakar SR, Bansal AN, Chhajed N, Purohit RC (2006) Frequency analysis of consecutive days maximum rainfall at Banswara. ARPN J Eng Appl Sci 1(3):64–67
Buishand TA (1978) Some remarks on the use of daily rainfall models. J Hydrol 36(3):295–308
Curmane C (1989) Statistical distributions for flood frequency analysis. Operational Hydrology Report no. 33, WMO no. 718, World Meteorological Organization, Geneva, Switzerland
Griffis VW, Stedinger JR (2007) Log-Pearson Type 3 distribution and its application in flood frequency analysis. I: Distribution characteristics. J Hydrol Eng 12(5):482–491
Ghosh S, Manindra KR, Soma CB (2016) Determination of the best fit probability distribution for monthly rainfall data in Bangladesh. Am J Math Stat 6(4):170–174
Haktanir T (1991) Statistical modeling of maximum flows in Turkish rivers. Hydrol Sci J 36(4):367–389
Husak GJ, Michaelsen J, Funk C (2006) Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. Int J Climatol 27(1):935–944
IWP (2015). http://www.indiawaterportal.org/met_data/. Accessed 22 March 2016
Kamal V, Saumitra M, Singh P, Sen R, Vishwakarma CA, Sajadi P, Asthana H, Rena V (2016) Flood frequency analysis of Ganga river at Haridwar and Garhmukteshwar. Appl Water Sci 7(4):1979–1986
Lairenjam C, Huidrom S, Bandyopadhyay A, Bhadra A (2016) Assessment of probability distribution of rainfall of North East Region (NER) of India. J Res Environ Earth Sci 2(9):12–18
Lee C (2005) Application of rainfall frequency analysis on studying rainfall distribution characteristics of Chia-Nan plain area in Southern Taiwan. J Crop Environ Bioinform 2:31–38
McMahon TA, Srikanthan R (1981) Log-Pearson type 3 distribution: is it applicable to flood frequency analysis of Australian streams? J Hydrol 52(1):139–147
Mohamed TM, Ibrahim AAA (2015) Fitting probability distributions of annual rainfall in Sudan. J Sci Technol 17(2):34–39
Naghavi B, Yu FX (1995) Regional frequency analysis of extreme precipitation in Louisiana. J Hydraul Eng 121(11):819–827
NERC (1975) Flood Studies Report. Natural Environment Research Council, London
Olumide BA, Saidu M, Oluwasesan A (2013) Evaluation of best fit probability distribution models for the prediction of rainfall and runoff volume (Case Study Tagwai Dam, Minna-Nigeria). Int J Eng Technol 3(2):94–98
Osati K, Mohammed M, Karimi B, Naghi S, Mobaraki J (2010) Determining suitable probability distribution models for annual precipitation data (A case study of Mazandaran and Golestan provinces). J Sustain Dev 3(1):159–168
Park JS, Jung HS (2002) Modeling Korean extreme rainfall using a Kappa distribution and maximum likelihood estimate. Theor Appl Climatol 72(1):55–64
Phien HN, Ajirajah TJ (1984) Applications of the log-Pearson type-3 distributions in hydrology. J Hydrol 73(3):359–372
Pilon PJ, Adamowski K, Alila Y (1991) Regional analysis of annual maxima precipitation using L-moments. Atmos Res 27(1):81–92
Sabarish RM, Narasimhan R, Chandhru AR, Suribabu CR, Sudharsan J, Nithiyanantham S (2017) Probability analysis for consecutive-day maximum rainfall for Tiruchirapalli City (south India, Asia). Appl Water Sci 7(2):1033–1042
Salami AW (2004) Prediction of the annual flow regime along Asa River using probability distribution models. AMSE Period 65(2):41–56
Sen Z, Eljadid AG (1999) Rainfall distribution function for Libya and rainfall prediction. Hydrol Sci J 44(5):665–680
Sharma MA, Singh JB (2010) Use of probability distribution in rainfall analysis. N Y Sci J 3(9):40–49
Subramanya K (2008) Engineering hydrology. Tata McGraw-Hill Publishing Company Limited, New Delhi, pp 155–160
Tao DQ, Nguyen VT, Bourque A (2002) On selection of probability distributions for representing extreme precipitations in Southern Quebec. Ann Conf Can Soc Civ Eng 1–8
Upadhaya A, Singh SR (1998) Estimation of consecutive days maximum rainfall by various methods and their comparison. Indian J Soil Conser 26(2):193–201
von Mises R (1936) La distribution de la plus grande de n valeurs. Rev Math de l’Union Interbalkanique 1:141–160
Vogel RM, Thomas WO, McMahon TA (1993) Flood-flow frequency model selection in the Southwestern United States. J Water Resour Plan Manag (ASCE) 119(3):353–366
Waylen PR, Qusesada ME, Caviedes CN (1996) Temporal and spatial variability of annual precipitation in Costa Rica and Southern Oscillation. Int J Climatol 14(2):173–193
Acknowledgements
The authors would like to thank the anonymous referees for contributing insightful remarks and constructive suggestions, which led to a substantially improved manuscript. The authors would like to thank Shaktibala for valuable inputs.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Kumar, V., Shanu & Jahangeer Statistical distribution of rainfall in Uttarakhand, India. Appl Water Sci 7, 4765–4776 (2017). https://doi.org/10.1007/s1320101705865
Keywords
 Best-fit distribution
 Anderson–Darling
 Chi square
 Kolmogorov–Smirnov