Reservoir Project
Last year I worked for a company that I cannot name because of a confidentiality policy.
For the same reason, I cannot share any results or visualizations, and the code for the project is not available on my GitHub page.
The company owns a reservoir in Italy and wanted to obtain useful information to manage it as effectively as possible.
The dataset I received had only a few columns: the date, the rainfall recorded in five different zones, the temperature recorded in a single zone, and two target variables, namely the water level in the reservoir and the outgoing water flow.
All variables were indexed by date, so this was a time series problem.
They wanted 7-day-ahead predictions of the water level and the outgoing flow, since knowing them in advance would greatly help in planning the management of the reservoir.
They could save water for dry seasons and provide it to farmers in times of need.
This application shows how much a data scientist can help a company.
Knowing in advance what will happen can save resources and lead to better planning.
I followed two approaches. On the one hand, I tackled the problem with ARIMA, a statistical model widely used in time series forecasting.
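For reference, the standard ARIMA(p, d, q) model can be written with the backshift operator B (where B y_t = y_{t-1}) and white-noise errors ε_t. The orders p, d and q used in the project are not reported here; the formula only shows the general form:

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

Forecasts 1 to 7 steps ahead are obtained by applying the fitted equation recursively, replacing unknown future values with their own forecasts and future errors with zero.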
On the other hand, I preprocessed the dataset to make it suitable for machine learning algorithms.
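A minimal sketch of this kind of preprocessing, assuming a direct 7-day-ahead setup in which each row contains the last few days of every variable (including the target itself) and the label is the target value seven days later. The helper `make_supervised` and the choice of `N_LAGS = 3` are hypothetical, not the project's actual pipeline:

```python
import numpy as np

HORIZON = 7  # predict the target 7 days ahead
N_LAGS = 3   # hypothetical: number of past days used as inputs


def make_supervised(X, y, n_lags=N_LAGS, horizon=HORIZON):
    """Turn daily series into (features, target) pairs for a direct
    h-step-ahead forecast.

    X: array of shape (T, n_features), e.g. rainfall and temperature.
    y: array of shape (T,), e.g. the water level.
    Row t uses days t-n_lags+1 .. t of every column (and of y) as inputs
    and y[t + horizon] as the label, so no future information leaks in.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    data = np.column_stack([X, y])  # past target values are predictors too
    rows, targets = [], []
    for t in range(n_lags - 1, len(y) - horizon):
        window = data[t - n_lags + 1 : t + 1]  # shape (n_lags, n_cols)
        rows.append(window.ravel())            # flatten day by day
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)


# Usage with synthetic data: 20 days, 2 exogenous columns.
features, labels = make_supervised(np.ones((20, 2)), np.arange(20.0))
print(features.shape)  # (11, 9)
```

The resulting matrix can be fed to any regressor (SVM, Random Forest, ...); training a separate model per target keeps the water level and the outgoing flow independent.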
Among the classical methods, I experimented with Support Vector Machines and Random Forests.
I also tested an LSTM (Long Short-Term Memory) network, a recurrent neural network that uses gates to control the flow of information through time.
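For reference, the gates of a standard LSTM cell (forget gate f_t, input gate i_t, output gate o_t) update the cell state c_t and the hidden state h_t as follows, where σ is the logistic sigmoid and ⊙ the element-wise product. The specific architecture used in the project is not described here:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate decides how much of the previous cell state to keep, which is what lets the network retain information (for example past rainfall) over many time steps.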
The same procedure was followed for both the water level and the outgoing flow.
The models were compared by RMSE using rolling cross-validation, which preserves the temporal order of the data, a crucial requirement when working with time series.
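A minimal sketch of rolling-origin cross-validation scored by RMSE, assuming an expanding training window (a fixed-size sliding window is a common variant). The function names and the `fit_predict` callback are hypothetical; in practice the callback would wrap ARIMA, SVM, Random Forest or LSTM:

```python
import numpy as np


def rolling_origin_splits(n, initial_train, test_size, step=None):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test block lies strictly after its training data, so the
    temporal order of the series is never violated.
    """
    step = step or test_size
    start = initial_train
    while start + test_size <= n:
        yield np.arange(0, start), np.arange(start, start + test_size)
        start += step


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rolling_cv_rmse(fit_predict, X, y, initial_train, test_size):
    """Return one RMSE per fold.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    fits a model on the training block and returns test predictions.
    """
    scores = []
    for tr, te in rolling_origin_splits(len(y), initial_train, test_size):
        preds = fit_predict(X[tr], y[tr], X[te])
        scores.append(rmse(y[te], preds))
    return np.array(scores)


def mean_baseline(X_train, y_train, X_test):
    """Trivial benchmark: always predict the training mean."""
    return np.full(len(X_test), y_train.mean())


# Usage on a toy series of 10 days: 3 folds of 2 test days each.
X = np.arange(10.0).reshape(-1, 1)
y = np.arange(10.0)
print(rolling_cv_rmse(mean_baseline, X, y, initial_train=4, test_size=2))
```

Unlike shuffled k-fold, no fold ever trains on data that comes after its test period, so the scores mimic how the model would be used in production.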
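Cross-validation also made it possible to apply the “One Standard Error Rule” to select the best models: among all models whose mean error lies within one standard error of the lowest mean error, the simplest one is preferred.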
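A minimal sketch of the one-standard-error rule, assuming per-fold RMSEs from the rolling cross-validation and a user-supplied complexity ranking. The function name, the ranking and the numbers in the example are purely illustrative, not project results:

```python
import math
import statistics


def one_se_rule(cv_scores, complexity):
    """Pick a model with the one-standard-error rule.

    cv_scores:  model name -> list of per-fold RMSEs (at least 2 folds).
    complexity: model name -> number, lower means simpler (user-supplied).
    Returns the simplest model whose mean RMSE is within one standard
    error of the best (lowest) mean RMSE.
    """
    means = {m: statistics.mean(s) for m, s in cv_scores.items()}
    best = min(means, key=means.get)
    best_scores = cv_scores[best]
    # standard error of the best model's mean across folds
    se = statistics.stdev(best_scores) / math.sqrt(len(best_scores))
    threshold = means[best] + se
    candidates = [m for m in means if means[m] <= threshold]
    return min(candidates, key=complexity.get)


# Illustrative numbers only.
scores = {
    "svm": [1.2, 1.3, 1.4],
    "random_forest": [1.0, 1.1, 1.2],
    "lstm": [0.8, 1.2, 1.0],
}
ranking = {"svm": 1, "random_forest": 2, "lstm": 3}
print(one_se_rule(scores, ranking))  # random_forest
```

Here the LSTM has the lowest mean RMSE, but the Random Forest is within one standard error of it and is ranked simpler, so it is selected.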
The most important step of the analysis was feature engineering.
Since the dataset contained only eight variables, and the relationships among them appeared weak, I transformed and combined the existing features to create new variables that better describe the behavior of the two targets.
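The actual engineered features are not disclosed. As a purely hypothetical illustration of transformations that often help in hydrological problems, the sketch below derives total rainfall across the zones, cumulative rainfall over several past windows, a weekly mean temperature and a cyclical day-of-year encoding. The function names and window lengths are assumptions:

```python
import numpy as np


def rolling_sum(x, window):
    """Trailing sum over the last `window` days; NaN until enough history."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    csum = np.cumsum(x)
    out[window - 1:] = csum[window - 1:] - np.concatenate(([0.0], csum[:-window]))
    return out


def engineer_features(rain, temp, day_of_year, windows=(3, 7, 30)):
    """Hypothetical derived features.

    rain: (T, n_zones) daily rainfall per zone.
    temp: (T,) daily temperature.
    day_of_year: (T,) integers in 1..366.
    """
    rain = np.asarray(rain, dtype=float)
    total_rain = rain.sum(axis=1)  # rainfall summed over all zones
    feats = {"total_rain": total_rain}
    for w in windows:
        # rain accumulated over the last w days, a possible proxy for delayed inflow
        feats[f"rain_sum_{w}d"] = rolling_sum(total_rain, w)
    feats["temp_mean_7d"] = rolling_sum(temp, 7) / 7
    # cyclical encoding so that 31 December and 1 January end up close
    angle = 2 * np.pi * np.asarray(day_of_year, dtype=float) / 365.25
    feats["season_sin"] = np.sin(angle)
    feats["season_cos"] = np.cos(angle)
    return feats


# Usage with synthetic data: 40 days, 5 rain zones.
f = engineer_features(np.ones((40, 5)), np.full(40, 15.0), np.arange(1, 41))
print(sorted(f))
```

The leading NaN values of the rolling features would need to be dropped or imputed before training.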