A simple regionalization approach as an alternative to obtain rainfall data in a tropical and ungauged catchment

The availability of rainfall information at high spatial resolution is of fundamental importance for many applications in the field of water resources. In developing countries, rainfall data are commonly obtained from rain gauge stations. However, many studies show that measurements based on rain gauge stations may not capture the spatial variation of rainfall effectively. Although satellite rainfall products are widely used around the world, their spatial resolution is too coarse when they are applied to local regions. This paper presents an approach to identify a downscaling method through geostatistical regionalization, in order to improve water resources models that operate at short spatial and temporal scales with limited rainfall data. Three methods were applied: cokriging, Inverse Distance Weighting (IDW) and kriging. Statistical parameters such as the mean absolute error (MAE) and the root mean square error (RMSE) were computed. A cross-validation procedure showed a better fit for most of the stations using cokriging. The regionalization results were quite comparable with the rain gauge data. Although the model outcomes did not improve remarkably, the approach has the potential to provide useful rainfall data at spatial scales finer than the present resolution.


Introduction
Rainfall is a major component of the hydrological cycle and one of the most important parameters for a range of natural and socioeconomic systems: water resources management, agriculture and forestry, tourism and flood protection, among others.
Precipitation estimation at the regional or local level is very important, especially when the results are used to characterize the hydrometeorological and rainfall-runoff processes that produce increased flows and turbidity levels in rivers, which significantly affect treatment systems and the water supply to towns and cities. Recent studies have shown that spatial resolution greatly affects the results of hydrologic and hydraulic models: models using distributed rainfall data show better results than those using point information from rain gauges (Guo et al., 2004; Smith et al., 2004; Arias-Hidalgo, 2013).
Interpreting the effects of climate variability on water resources management requires information at scales much smaller than the current resolution of regional or global climate models. Sub-grid-scale precipitation variability is usually resolved with variable-resolution models or by statistical downscaling. Variable-resolution models are often unattractive because their implementation is computationally expensive at very fine resolutions. Moreover, because physical processes at small scales are often not easily parameterized, such models do not necessarily produce more accurate predictions. Statistical downscaling models, in contrast, are computationally efficient and, owing to their simplicity, particularly attractive for ensemble predictions (Wilby, 2007). Satellite sensors obtain rainfall information from the radiation reflected at the top of the cloud. Satellite rainfall estimates serve different uses, providing information in areas that are inaccessible to other observing systems such as rain gauges and radar (Ramos, 2013). Many algorithms are based on satellite information, such as those of the TRMM (Tropical Rainfall Measuring Mission), which provides important data to validate the ability to simulate and predict weather at the seasonal scale. TRMM is also an important option for regions with a sparse distribution of ground stations, and is of particular interest for early warning systems (Vernimmen et al., 2012). Such satellite data are useful for hydrological modelling studies when compared with information obtained from ground stations at fixed locations.
At the annual time scale TRMM produces very good results, whereas at finer temporal scales the error can increase (Huffman et al., 2010). Previous research using TRMM data includes Immerzeel et al. (2009), Nastos et al. (2013), Hunink et al. (2014) and Martinez-Cano et al. (2014), which demonstrates the utility of such information in areas lacking direct observations, not only for hydrological and meteorological systems but also for water treatment and distribution systems. According to Timbal et al. (2009), downscaling models are based on the premise that regional climate is conditioned by two factors: the state of the large-scale climate and local physiographic features.
From this point of view, rainfall information can be obtained by applying a model that relates large-scale climate variables to regional and local variables. The large-scale information obtained from satellite imagery is of great help when a downscaling model is used to estimate the characteristics of regional and local rainfall. Available approaches include dynamical downscaling, statistical downscaling and geostatistical regionalization models. The wide range of downscaling techniques has led to an increasing number of model comparisons using generic data sets and diagnostics (Wilby, 2007). Quantifying the goodness of fit of different downscaling methods determines how well rainfall information is represented for decision-making (Diaz et al., 2006). This process is of utmost importance because it makes it possible to assign a value to an unknown georeferenced point on a map from known values near that point. This research focuses on a framework to identify a downscaling method through geostatistical regionalization, in order to improve water resources models with short spatial and temporal scales and limited rainfall data. To this end, satellite rainfall data are applied in a small tropical urban catchment, aiming to reduce the bias errors that may occur when this information is compared with rainfall data obtained from meteorological stations.

Methods
An approach called geostatistical regionalization is presented here as an alternative to obtain rainfall data in a tropical, ungauged catchment. It consists of four parts: (1) analysis of the satellite rainfall information generated by the Tropical Rainfall Measuring Mission (TRMM); (2) downscaling through regionalization; (3) validation of the satellite data with ground stations; and (4) determination of the best downscaling method by comparing geostatistical regionalization models.

Satellite rainfall data
The data used to determine the best geostatistical model are the TRMM 3B42 satellite rainfall data, with a spatial resolution of 32 km and a temporal resolution of 3 hours. The satellite data were compared with rainfall data measured at seven ground stations. For data processing, an application called GetWeb was used to capture information from the web. GetWeb is a software platform developed by F.IMM (available from http://www.fimm.com, accessed in Feb 2015) for the complete management of all back-office support of meter photo-reading, census and meter replacement. No installation is needed, only a web browser. It is integrated with map providers such as Google Maps, not merely for map visualization but also for active interaction with the map. Visualizing the data as graphics and reports lets the operator keep the progress of all work under control immediately, and every single datum, statistic and report can be exported to the most commonly used office automation software. The data were transformed into ARC/INFO format using HDF2GRID.aml, an application program written in the AML language. AML is composed of modules for interaction with the system, manipulator control and real-time tasks, and incorporates features of the APL, Pascal and Lisp languages. A distinctive feature of AML is its ability to handle data aggregation, which facilitates the treatment of vectors and different reference systems (Ollero, 2001).
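The HDF-to-grid conversion step can be illustrated with a minimal Python sketch. It does not reproduce HDF2GRID.aml or any ARC/INFO tool. It assumes the 3B42 field has already been read from the HDF file into a NumPy array of rain rates in mm/h with rows ordered south to north, converts it to a 3-hour accumulation, and writes it as a plain ESRI ASCII grid that GIS software can import. The function names, the rate-to-accumulation step and the example coordinates are hypothetical.

```python
import io

import numpy as np


def rate_to_accumulation(rate_mm_per_h, hours=3.0):
    """Convert a rain-rate field (mm/h) to an accumulation (mm) over `hours`.

    Assumption: the 3B42 value is a rate valid for the whole 3-hour window.
    """
    return np.asarray(rate_mm_per_h, dtype=float) * hours


def write_esri_ascii(grid, xll, yll, cellsize, stream, nodata=-9999.0):
    """Write a 2-D field to `stream` in ESRI ASCII grid format.

    `grid` is assumed to be ordered south-to-north (row 0 = southernmost row);
    ASCII grids list the northernmost row first, so the rows are flipped.
    Non-finite values (NaN) are written as `nodata`.
    """
    data = np.where(np.isfinite(grid), grid, nodata)[::-1]
    nrows, ncols = data.shape
    stream.write(
        f"ncols {ncols}\n"
        f"nrows {nrows}\n"
        f"xllcorner {xll}\n"
        f"yllcorner {yll}\n"
        f"cellsize {cellsize}\n"
        f"NODATA_value {nodata}\n"
    )
    for row in data:
        stream.write(" ".join(f"{v:.3f}" for v in row) + "\n")


# Hypothetical 2 x 3 window of 0.25-degree cells (rates in mm/h).
rates = np.array([[0.5, 1.0, np.nan],
                  [2.0, 0.0, 1.5]])
buffer = io.StringIO()
write_esri_ascii(rate_to_accumulation(rates), -76.75, 3.25, 0.25, buffer)
print(buffer.getvalue())
```

Row orientation and units differ between product versions and readers, so both should be checked against the file metadata before using a sketch like this on real data.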

Downscaling through regionalization
The regionalization was performed by applying geostatistical models and cross-validation for three methods: cokriging, Inverse Distance Weighting (IDW) and kriging (Olivera, 2010). Through the cross-validation procedure, the most appropriate (best-fit) method was selected according to the statistical parameters obtained for each method. The cross-validation was carried out with the satellite rainfall data obtained for each of the ground stations located in the study area. The mean absolute error (MAE) and the root mean square error (RMSE) were computed; these measure the average differences between the satellite and rain gauge values over the pairs of data. Geostatistical models make it possible to introduce secondary variables into the regionalization process. It is well known that certain topographic and geographic variables influence rainfall; therefore, better results are expected when secondary variables are taken into account. Up to three secondary variables can be introduced in a geostatistical model such as cokriging. However, as the number of secondary variables increases, the model becomes more complex and requires more computational effort. For the regionalization process, elevation was used as the secondary variable, so that the geostatistical procedure was simplified while the model was enhanced (Portalés, 2008).
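The leave-one-out cross-validation used to rank the interpolators can be sketched in Python as follows. This is an illustrative sketch, not the Geostatistical Analyst workflow used in the study: cokriging is omitted because it needs fitted direct and cross variograms, and the ordinary kriging below uses an exponential variogram whose nugget, sill and range are assumed defaults rather than fitted values. Station coordinates and rainfall values in the usage lines are hypothetical.

```python
import numpy as np


def idw(xy_known, z_known, xy_target, power=2.0):
    """Inverse distance weighting estimate at a single target point."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):  # target coincides with a known point
        return float(z_known[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * z_known) / np.sum(w))


def exp_variogram(h, nugget, sill, practical_range):
    """Exponential semivariogram; gamma(0) is set to 0 by definition."""
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / practical_range))
    return np.where(h == 0, 0.0, g)


def ordinary_kriging(xy_known, z_known, xy_target,
                     nugget=0.0, sill=1.0, practical_range=10.0):
    """Ordinary kriging estimate with assumed (not fitted) variogram parameters."""
    n = len(z_known)
    dist = np.linalg.norm(xy_known[:, None, :] - xy_known[None, :, :], axis=2)
    # Kriging system: [Gamma 1; 1^T 0] [weights; mu] = [gamma_0; 1]
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exp_variogram(dist, nugget, sill, practical_range)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(xy_known - xy_target, axis=1),
                          nugget, sill, practical_range)
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ z_known)


def leave_one_out(xy, z, predictor):
    """Predict each station from the others; return (MAE, RMSE)."""
    idx = np.arange(len(z))
    errors = np.array([predictor(xy[idx != i], z[idx != i], xy[i]) - z[i]
                       for i in idx])
    return float(np.mean(np.abs(errors))), float(np.sqrt(np.mean(errors**2)))


# Hypothetical station coordinates (km) and mean rainfall (mm).
xy = np.array([[0, 0], [5, 1], [9, 3], [2, 7], [6, 8], [10, 10], [4, 4]], dtype=float)
z = np.array([3.1, 2.8, 2.2, 3.6, 3.0, 2.4, 2.9])
for name, method in [("IDW", idw), ("Kriging", ordinary_kriging)]:
    mae, rmse = leave_one_out(xy, z, method)
    print(f"{name}: MAE={mae:.3f} RMSE={rmse:.3f}")
```

The method with the lowest cross-validation errors at most stations would be preferred, which is the selection criterion described above.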
The regionalization process was performed by taking the centroid of each pixel. The satellite rainfall value was assigned to this point, and intermediate values were generated by interpolation. These values were used as input data to the geostatistical models. By applying geostatistical analysis tools together with mapping information and a digital elevation model, rainfall images of the study area with a new resolution of 460 m and 30 pixels were obtained. Distributed data for the two variables (rainfall and elevation) were thus acquired at the local level. The process made it possible to regionalize all grid points within the catchment automatically by means of map-based macros. Elevation above sea level was used as the secondary variable because of its influence on rainfall occurrence through orographic effects. An illustration of the regionalization framework is given in Figure 1.
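Cokriging itself is too involved for a short example, so the sketch below uses a simpler, different technique only to show how a secondary variable such as elevation can carry information from coarse pixel centroids to a fine 460 m grid: a linear rainfall-elevation trend fitted at the centroids, plus IDW interpolation of the residuals. Projected coordinates in metres, the linear trend and all function names are assumptions for illustration, not the procedure used in the study.

```python
import numpy as np


def cell_centroids(x0, y0, nx, ny, size):
    """Centroids of an nx-by-ny grid of square cells with lower-left corner (x0, y0)."""
    xs = x0 + (np.arange(nx) + 0.5) * size
    ys = y0 + (np.arange(ny) + 0.5) * size
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def trend_plus_residual(xy_c, rain_c, elev_c, xy_f, elev_f, power=2.0):
    """Rain at fine cells = linear elevation trend + IDW-interpolated residual."""
    design = np.column_stack([np.ones_like(elev_c), elev_c])
    coef, *_ = np.linalg.lstsq(design, rain_c, rcond=None)
    resid = rain_c - design @ coef
    out = np.empty(len(xy_f))
    for k, p in enumerate(xy_f):
        d = np.linalg.norm(xy_c - p, axis=1)
        if np.any(d == 0):
            r = resid[np.argmin(d)]
        else:
            w = 1.0 / d**power
            r = np.sum(w * resid) / np.sum(w)
        out[k] = coef[0] + coef[1] * elev_f[k] + r
    return out, coef


# Hypothetical coarse pixels (3 x 3 cells of 32 km) and a 6 x 5 grid of 460 m cells.
xy_coarse = cell_centroids(0.0, 0.0, 3, 3, 32_000.0)
elev_coarse = np.array([900, 1100, 1500, 1000, 1300, 1800, 1200, 1600, 2100], dtype=float)
rain_coarse = np.array([2.1, 2.4, 3.0, 2.2, 2.8, 3.5, 2.6, 3.1, 3.9])
xy_fine = cell_centroids(46_000.0, 46_000.0, 6, 5, 460.0)  # 30 fine cells
elev_fine = np.linspace(1200.0, 1400.0, len(xy_fine))
rain_fine, coef = trend_plus_residual(xy_coarse, rain_coarse, elev_coarse,
                                      xy_fine, elev_fine)
```

In cokriging the elevation enters through a cross-variogram instead of an explicit regression, but the role of the secondary variable is similar: fine-scale elevation detail adds structure that the coarse rainfall field lacks.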

Satellite data validation with ground data
The regionalization procedure explained in the previous section makes it possible to assess whether satellite data can be used in different hydrological processes in ungauged catchments, provided that the values obtained show a skilful correlation between satellite and ground data. However, depending on the method applied, satellite data may differ from ground data, so a correction or adjustment is needed. An hourly bias correction of the 3B42 research-product satellite rainfall has been adopted previously by other researchers (Bell & Kundu, 2003; Vernimmen et al., 2012). The analysis of the ground-station and satellite rainfall data sets for the selected days revealed potential probabilistic similarities between the data sets, which facilitates the use of statistical evidence. However, the locations of the gauges and the centres of the TRMM grid cells often do not match. As a result, the TRMM data must be corrected with respect to the rain gauge information beyond the monthly resolution, in general at rain gauge locations using daily timescales. Researchers have found low correlations between rain gauge and satellite rainfall data (Arias-Hidalgo, 2013). Therefore, the average monthly rainfall measured by each rain gauge over the study period is compared with the interpolated TRMM data. The approach aims to determine the correlation between pairs of data and to adjust the satellite information to the ground-station information. The bias correction factor between the data pairs was determined in order to define the relationship between both sources of information, applying Equations 1 and 2, where Ø_i is the predicted value of cell i, Ø_obs,i is the observed value of cell i, and N is the number of analysed values.

Bias correction:
Root Mean Square Error:
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\text{Ø}_i-\text{Ø}_{obs,i}\right)^{2}}\qquad(2)$$
The satellite information includes gridded data at 0.25° × 0.25° resolution covering 50° N to 50° S latitude and 180° E to 180° W longitude, in HDF format, with 3B42 rainfall every 3 hours. The rainfall data used in this experiment correspond to the period from 15 February to 11 April 2009. This period was chosen for model integration purposes (Martinez-Cano et al., 2014), owing to the need to evaluate an existing Early Warning System (EWS) prototype that estimates the risk level of a critical pollution event that may exceed the purification capacity of the Water Treatment Plant (WTP) located in the SDS. The operation of the WTP was stopped during this rainfall period, and on nearly half of these occasions the intake was closed because of oxygen demand (OD) concentrations of about 2 mg L-1, an indicator of high organic matter pollution in the Cauca River (Velez et al., 2014). Between February and April 2009, satellite data, ground station information and records of WTP closures due to high pollution load were simultaneously available.

Models performance comparison
Rainfall images with a spatial resolution of 32 km and a temporal resolution of 3 hours were obtained. Monthly rainfall from the rain gauge stations and the 3B42 satellite rainfall, averaged over the time span, were compared at their respective measurement points. The results of the inverse distance weighting (IDW), cokriging and kriging models were compared quantitatively using the mean absolute error (MAE) and root mean square error (RMSE) in a cross-validation procedure (Table 1). The correlation coefficient (r) and the coefficient of determination (R2) were also used.
Adopting the cokriging model with elevation as a secondary variable, which gave better results owing to the influence of elevation on rainfall, the average spatial distribution of 3-hour rainfall resulting from the regionalization, with a new cell size of 460 m and 30 pixels, is shown in Figure 3. The TRMM-based map captured a decreasing pattern moving mainly from west to north-east from 00:00 to 03:00 hrs. At 06:00 hrs the decreasing rainfall pattern starts moving from east to north-west, with rainfall values between 3.80 mm and 2.70 mm. At 09:00 hrs, with the pattern moving in the same direction (east to north-west), rainfall values start to decrease to between 2.43 mm and 1.16 mm. At 21:00 hrs the rainfall pattern returns to moving from west to north-east, as it did initially. The right part of the catchment shows low rainfall values at 00:00 and 03:00 hrs and high rainfall values from 06:00 to 18:00 hrs. This corroborates concerns about the possibly high uncertainties associated with rainfall estimation across the lower part of the catchment (Paiva et al., 2011; Arias-Hidalgo, 2013).
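The correlation statistics mentioned above can be computed for paired satellite and gauge values as in the following Python sketch. It assumes R2 is the square of Pearson's r, which holds for a least-squares linear fit with an intercept; the paper does not state which definition was used, and the function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float(np.sum(xd * yd) / np.sqrt(np.sum(xd**2) * np.sum(yd**2)))


def r_and_r2(satellite, gauge):
    """Return (r, R2), taking R2 as r squared (simple linear regression)."""
    r = pearson_r(satellite, gauge)
    return r, r**2


# Hypothetical monthly means (mm) at seven stations.
sat = [110.0, 95.0, 140.0, 80.0, 120.0, 100.0, 130.0]
gauge = [118.0, 90.0, 150.0, 70.0, 115.0, 108.0, 141.0]
print(r_and_r2(sat, gauge))
```

Unlike MAE and RMSE, r and R2 are insensitive to a constant offset or scale between the two series, which is why they are reported alongside the error measures.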
Figure 4 compares the distributed satellite rainfall (mm) with the rainfall obtained using the cokriging model for 16 pixels of the image for 29 March 2009 at 00:00 hrs. The results indicate that cokriging is appropriate for the regionalization process, since using elevation as a secondary variable not only gives better results but also simplifies and enhances the modelling process. This study follows the established evaluation method (see Chung, 2013), in which, for downscaling TRMM rainfall data using fine-scale models and interpolation procedures, auxiliary environmental variables such as topography, the Normalized Difference Vegetation Index (NDVI) and elevation can be used, improving the downscaling process and making it more dynamic.
With the cokriging model, gridded rainfall data distributed over the upper reaches of the SDS were obtained, with rainfall values at each pixel centroid as a result of the regionalization. This product can be used to obtain areal rainfall values for water balance calculations and hydrological modelling, among other applications. [...] relative humidity and air circulation, resulting in wet conditions. Therefore, rainfall increases along mountain edges. The size and effect of this mechanism depend on the orientation of a ridge and its ability to slow the movement of a storm, causing the air mass to rise. Furthermore, rainfall also tends to increase with elevation above sea level. These considerations support the results of the cokriging geostatistical interpolation model, in which using elevation as an auxiliary variable improves and simplifies the modelling.
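Turning the gridded product into an areal rainfall value for a water balance can be sketched in Python as below. The catchment mask, the optional per-cell areas and the function names are assumptions for illustration; on a regular 460 m grid all cells have the same area, so the weighted mean reduces to a simple mean of the cells inside the catchment.

```python
import numpy as np


def areal_rainfall(grid_mm, inside, cell_area=None):
    """Area-weighted mean rainfall depth (mm) over cells inside the catchment.

    NaN cells are ignored. With `cell_area=None` all cells get equal weight.
    """
    grid = np.asarray(grid_mm, dtype=float)
    mask = np.asarray(inside, dtype=bool) & np.isfinite(grid)
    weights = np.where(mask, 1.0 if cell_area is None else cell_area, 0.0)
    if weights.sum() == 0:
        raise ValueError("no valid cells inside the catchment")
    return float(np.sum(np.where(mask, grid, 0.0) * weights) / weights.sum())


def rain_volume_m3(depth_mm, area_m2):
    """Convert an areal rainfall depth (mm) over an area (m^2) to a volume (m^3)."""
    return depth_mm / 1000.0 * area_m2


# Hypothetical 2 x 2 patch of 460 m cells; one cell lies outside the catchment
# and one has no value.
depth = areal_rainfall([[1.0, 2.0], [3.0, np.nan]],
                       [[True, True], [False, True]])
print(depth, rain_volume_m3(depth, 2 * 460.0**2))
```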

Validation of satellite rainfall data with ground stations
Table 2 shows the comparison between the uncorrected TRMM values obtained with the cokriging model and the TRMM values corrected with the bias method for the study period, February to April 2009. The results in Table 2 indicate a good correlation between pairs of data, with R2 = 0.77 on average, and Figure 5 illustrates a graphical bias correction at the hourly timescale, with R2 = 0.63 at this location. As expected from the differences, the bias correction factors fell within a representative interval for most of the catchment domain, except for the stations located in the uppermost portions of the catchment.
The locations of the rain gauges and the TRMM grid centres are often offset, so the data pairs do not coincide (see Arias, 2012). In this case, the TRMM data (given for grid cells) must be estimated at the gauge locations, that is, the two sources must be matched using a correction procedure. Therefore, the average rainfall for the study period is computed at each gauge and compared with the TRMM data through a bias correction process.
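The exact form of the bias equation used in this study is not reproduced in the text, so the Python sketch below shows two common forms side by side, under that assumption: a multiplicative factor (ratio of gauge total to TRMM total at a station), which can be applied by scaling the TRMM series, and an additive mean bias. Neither is claimed to be the authors' Equation 1, and the function names are hypothetical.

```python
import numpy as np


def multiplicative_bias_factor(gauge, trmm):
    """Ratio of total gauge rainfall to total TRMM rainfall at one station."""
    gauge = np.asarray(gauge, dtype=float)
    trmm = np.asarray(trmm, dtype=float)
    if trmm.sum() <= 0:
        raise ValueError("TRMM total must be positive to form a ratio")
    return float(gauge.sum() / trmm.sum())


def mean_bias(gauge, trmm):
    """Additive bias: mean of (TRMM - gauge); positive means TRMM overestimates."""
    return float(np.mean(np.asarray(trmm, dtype=float) - np.asarray(gauge, dtype=float)))


def apply_factor(trmm, factor):
    """Scale a TRMM series by a multiplicative bias factor."""
    return np.asarray(trmm, dtype=float) * factor


# Hypothetical hourly values (mm) at one station.
gauge_mm = [0.0, 1.2, 3.4, 2.0, 0.5]
trmm_mm = [0.3, 0.9, 2.1, 1.6, 0.7]
factor = multiplicative_bias_factor(gauge_mm, trmm_mm)
corrected = apply_factor(trmm_mm, factor)
```

A ratio preserves zero rainfall and cannot produce negative values, whereas subtracting an additive bias can; that is one reason multiplicative factors are often preferred for rainfall.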

Discussion
The geostatistical interpolation with three regionalization models, evaluated with basic statistics such as the mean absolute error (MAE) and the root mean square error (RMSE), shows that cokriging performs better than IDW and kriging. Cokriging reduces the RMSE values for most of the stations. However, as stated in Section 3.1, relatively high MAE and RMSE values are observed; this is attributable to the regionalization procedure assigning a small rainfall amount to cells with zero rainfall, which increases the probability of obtaining high error values. A similar effect has been found in performance comparisons between radar rainfall data and one-hour rainfall forecasts from data-driven models, where comparatively high values arise because normalization procedures assign a small rainfall amount to cells with zero rainfall, leading to increased false alarms (see Hong Li et al., 2013).
Bias correction factors were computed and adopted as a straightforward procedure. This scheme proved a basic yet effective way of correcting the bias of TRMM data at the catchment scale, using local calibration points from rainfall ground stations instead of the global validation sites used by NASA. Bias correction makes it possible to determine whether satellite information can be used for hydrological modelling in catchments, considering that the values obtained at all stations showed a skilful correlation between pairs of data. In a comparable study, bias-corrected monthly TMPA-3B42R data were disaggregated to daily resolution using rain gauge hyetographs; these synthetic time series were fed into a hydrological model to complement the available rain gauge data, and the resulting model performance was quite comparable with that obtained using only the rain gauge data (Vernimmen et al., 2012).
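The disaggregation idea attributed above to Vernimmen et al. (2012) can be sketched as a proportional split: each day of the month receives the same fraction of the bias-corrected satellite total as the gauge hyetograph recorded on that day. This is an assumed, simplified reading, not their implementation; the uniform split for a month with no gauge rain is an additional assumption, and the function name is hypothetical.

```python
import numpy as np


def disaggregate_monthly(monthly_total, daily_gauge):
    """Split a monthly total into daily values using gauge-day fractions.

    If the gauge recorded no rain in the month, the total is spread uniformly
    (an assumption; other choices are possible).
    """
    daily = np.asarray(daily_gauge, dtype=float)
    total = daily.sum()
    if total <= 0:
        return np.full(daily.shape, monthly_total / daily.size)
    return monthly_total * daily / total


# Hypothetical: a bias-corrected monthly total of 60 mm and a 4-day gauge record.
print(disaggregate_monthly(60.0, [0.0, 10.0, 20.0, 0.0]))
```

The daily series keeps the gauge's timing of wet and dry days while its monthly sum matches the corrected satellite total.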
Because the low spatial resolution of satellite rainfall data still loses information about the actual characteristics at the regional scale, methodologies for climate regionalization or downscaling still need to be developed and applied to understand local climate behaviour. Longer periods also need to be analysed: only two months were studied here, because during this period satellite information and records of high pollution load were simultaneously available to evaluate an Early Warning System (EWS) prototype. Based on the cross-validation results, with longer data periods it would be possible to apply the correction spatially over an entire region, not only where the stations are located but also in neighbouring areas lacking gauge data. See, for instance, Heidinger et al. (2012), where a longer period made it feasible to transfer the detail from a gauged station to the TRMM tendencies, because multi-resolution analysis provides a mechanism to separate the trend from the noise and allows a signal to be viewed at different time resolutions or levels of decomposition, while maintaining the same distributional properties as the measured events (... et al., 2007).
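The multi-resolution idea described above (keeping the coarse TRMM tendency while borrowing finer-scale detail from a gauge) can be illustrated with an unnormalized Haar decomposition in Python. This is only an illustrative sketch under stated assumptions: the Haar wavelet, the fixed number of levels and the function names are choices made here, not the wavelet or procedure of Heidinger et al. (2012), and the merged series is not constrained to be non-negative.

```python
import numpy as np


def haar_decompose(x, levels):
    """Return (approximation, details) with details[0] the finest level."""
    approx = np.asarray(x, dtype=float)
    if approx.size % (2**levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / 2.0)   # half-differences
        approx = (even + odd) / 2.0          # pairwise means
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose exactly."""
    x = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * x.size)
        out[0::2] = x + d
        out[1::2] = x - d
        x = out
    return x


def merge_haar(trmm, gauge, levels):
    """Keep TRMM's coarse-scale means and take the finer detail from the gauge."""
    a_trmm, _ = haar_decompose(trmm, levels)
    _, d_gauge = haar_decompose(gauge, levels)
    return haar_reconstruct(a_trmm, d_gauge)


# Hypothetical 3-hourly series (mm) over one day.
trmm = [1.0, 1.2, 0.8, 0.6, 2.0, 2.4, 1.0, 0.4]
gauge = [0.0, 2.5, 0.5, 0.0, 3.0, 4.0, 0.0, 0.0]
print(merge_haar(trmm, gauge, 2))
```

Because the averages over each block of 2**levels steps come from TRMM, the merged series keeps the satellite totals at that scale, while the within-block pattern follows the gauge.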
Segond et al. (2006) assume spatial dependence between sites. Although it is beyond the scope of this work, the evaluation of the approach could be extended to a larger and better-instrumented catchment. The differences between pairs of data may be related to different environments, such as altitude, and to differences between the information obtained during the dry season and that processed during the wet season. According to Vernimmen et al. (2012), studies in different regions of the world have shown an underestimation of rainfall in the dry season and an overestimation in the wet season.
Better performance in the regionalization process may be achieved with dynamical models or other statistical models. For instance, improvements in several metrics, such as an entropy difference of less than 5% between corrected TRMM and gauged rainfall, have been obtained by applying a multi-resolution analysis (see Heidinger et al., 2012).
However, dynamical models do not provide information at the local level, since most of them have resolutions of around 150 to 300 km (Huffman et al., 2010). The sophistication and capability of both kinds of methods are very similar. Nevertheless, the procedures used still have limitations in obtaining reliable downscaling results, which limits the use of any particular technique. Comparatively, models based on nonlinear artificial neural networks have performed best at modelling the interannual variability of the indices; nonetheless, they still underestimate extremes (Haylock et al., 2006).

Conclusions
Satellite data, specifically TRMM, turn out to be a good complement for obtaining rainfall in a rapid-response urban catchment. The proposed framework incorporates current approaches such as comparison with ground data. A method for correcting the bias of TRMM data at the catchment scale was also applied. Regionalization of climate data is an important step in obtaining the high-resolution data at regional and local levels needed for most hydrological modelling applications, and in gaining a better understanding of local climate variability in tropical regions.
Satellite rainfall estimates are not restricted by administrative boundaries; for this reason, they provide access to rainfall data for cross-border catchments and can be obtained freely. Among the different sources of satellite rainfall data, TRMM, which provides near-real-time 3-hourly rainfall data (known as 3B42) for a large part of the globe, is a popular and trusted source. In general, combining multiple sources of information is known as 'data merging' and is believed to provide valuable advantages over using a single data source. The uncertainty related to the regionalization and bias correction of climate simulations must be taken into account to better estimate the impact of climate change.

Figure 1. Illustration of the geostatistical regionalization framework.

Figure 4. Distributed rainfall comparison between satellite and cokriging model data for 29 March 2009 at all 7 stations.

Figure 3. Distributed TRMM rainfall in the SDS for 29 March 2009.

Table 1. Performance comparison between geostatistical models (cross-validation procedure).

Table 2. Bias correction: TRMM vs. ground rainfall data, based on hourly data.