Machine learning techniques used for rainfall prediction

May 28, 2021

Rainfall is predicted by analyzing a location's weather features, such as humidity and wind properties. Data mining finds patterns in large, cluttered and unorganized data sets. Today, rainfall prediction relies on several machine learning techniques, whose performance varies between daily and monthly forecasts. Five of these techniques are:

C4.5 algorithm
Naïve Bayes
Support-vector machine
Neural network
Random forest

Although these are all separate techniques, they are often combined to improve the precision and accuracy of rainfall predictions.

Rainfall prediction model.

C4.5 algorithm

The C4.5 algorithm generates decision trees for classification. Each node splits the data on one attribute, and the attribute with the highest normalized information gain (the gain ratio) is chosen as the splitting criterion. For rainfall prediction, properties such as temperature, humidity, rainfall, river flow and water level are analyzed, and the one that carries the most information is used to split the data at each node.
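The splitting rule can be sketched in a few lines of Python. This is only a minimal illustration for categorical attributes, not a full C4.5 implementation (which also handles numeric thresholds and pruning). The function names and the four-row weather table are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute`, normalized by its split information."""
    total = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))

# Hypothetical toy data, invented for illustration only.
weather = [
    {"humidity": "high", "wind": "strong", "rain": "yes"},
    {"humidity": "high", "wind": "weak", "rain": "yes"},
    {"humidity": "low", "wind": "strong", "rain": "no"},
    {"humidity": "low", "wind": "weak", "rain": "no"},
]

print(best_attribute(weather, ["humidity", "wind"], "rain"))
```

In this toy table humidity separates the rainy days perfectly while wind tells us nothing, so humidity would become the root split.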

Naïve Bayes

Naïve Bayes is a technique used to construct classifiers. It is a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. For each class value, the prior probability of the class and the conditional probability of each attribute value given that class are calculated. Once these have been combined for every class value, the class with the highest probability is reported.
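Here is a rough sketch of that calculation for categorical attributes. The function names and the toy data are invented for illustration. The add-one (Laplace) smoothing is an assumption added here so that an unseen attribute value does not force a probability of zero.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count classes and attribute values per class."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, row[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict(model, sample):
    """Return the most probable class and the unnormalized class scores."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior probability of the class
        for attr, value in sample.items():
            # Conditional probability with add-one smoothing (an assumption).
            count = value_counts[(attr, cls)][value]
            score *= (count + 1) / (n + len(values_seen[attr]))
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data, invented for illustration only.
observations = [
    {"humidity": "high", "wind": "strong", "rain": "yes"},
    {"humidity": "high", "wind": "weak", "rain": "yes"},
    {"humidity": "low", "wind": "strong", "rain": "no"},
    {"humidity": "low", "wind": "weak", "rain": "no"},
]

model = train(observations, "rain")
label, scores = predict(model, {"humidity": "high", "wind": "strong"})
print(label, scores)
```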

Support-Vector Machine

Support-vector machines are supervised learning models, with associated learning algorithms, that analyze data for classification and regression. For classification, a support-vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space that separates the classes. It looks for the optimal hyperplane: the one with the largest margin, that is, the greatest distance to the nearest training points of each class (the support vectors). Once this hyperplane is found, each data point is assigned a class according to which side of the hyperplane it falls on.
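To make the idea of a margin concrete, the sketch below does not train a support-vector machine. It only compares two hand-picked candidate hyperplanes on invented two-dimensional data and shows why the one with the larger margin would be preferred. All names, weights and points are assumptions for illustration.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Label +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any training point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))

# Hypothetical scaled features; +1 = rain, -1 = no rain.
points = [(3, 3), (4, 3), (1, 1), (0, 1)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes that both separate the classes.
margin_a = geometric_margin((1, 1), -4, points, labels)
margin_b = geometric_margin((1, 0), -2.5, points, labels)
print(margin_a, margin_b)

# A new point is labelled by the side of the wider-margin hyperplane it falls on.
print(classify((1, 1), -4, (3.5, 3.5)))
```

Both candidates classify the training points correctly, but the first leaves more room between the classes, which is the property a support-vector machine optimizes for.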

Support-vector machine

Neural Network

Neural networks are computing systems loosely inspired by the biological neural networks in the brains of animals and humans. A neural network is built from a collection of connected nodes called artificial neurons, which loosely model the neurons in a biological brain. Each artificial neuron receives signals, processes them and transmits a signal to the neurons connected to it. Rather than being programmed with a specific set of rules for a task, these systems learn to perform tasks by considering examples, so they excel in areas where the features to detect are difficult to specify in a computer program. Neural networks are used in classification, regression, prediction and clustering. Each connection starts with an initial weight, which is adjusted during training. The network propagates an input pattern through every layer until the output layer generates an output pattern.
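The layer-by-layer propagation can be sketched as a simple forward pass. This is a minimal illustration only: the weights are fixed, made-up numbers, the sigmoid activation is an assumption, and no training step is shown.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of the inputs, adds a bias, then activates."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input pattern through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]

# Inputs could be, for example, scaled humidity and wind speed.
output = forward([0.9, 0.2], layers)
print(output)
```

The single output value lies between 0 and 1 and could be read as a rain score once the weights have been learned from examples.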

An example of a neural network.

Random Forest

Random forests are an ensemble learning method used for classification, regression and other tasks. The algorithm constructs a multitude of decision trees at training time and outputs the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). A random forest is therefore closely related to a single decision tree; the difference is that it combines many trees. The underlying idea is that an ensemble of trees, each a weak learner, combines into a random forest that is a stronger learner.
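The combining step is easy to sketch. In the example below, the "trees" are hand-written one-rule stand-ins with invented thresholds. A real random forest would instead train each tree on a random sample of the data and a random subset of the features.

```python
from collections import Counter
from statistics import mean

def forest_classify(trees, sample):
    """Classification: return the mode (majority vote) of the trees' outputs."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

def forest_regress(trees, sample):
    """Regression: return the mean of the trees' predictions."""
    return mean(tree(sample) for tree in trees)

# Hypothetical stand-in trees; thresholds are invented for illustration.
class_trees = [
    lambda s: "rain" if s["humidity"] > 70 else "no rain",
    lambda s: "rain" if s["wind"] > 20 else "no rain",
    lambda s: "rain" if s["temperature"] < 15 else "no rain",
]
amount_trees = [lambda s: 10.0, lambda s: 20.0, lambda s: 30.0]

today = {"humidity": 80, "wind": 10, "temperature": 10}
print(forest_classify(class_trees, today))
print(forest_regress(amount_trees, today))
```

Two of the three stand-in trees vote for rain, so the forest reports rain even though one tree disagrees.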

So which is the best method?

The techniques can be compared on precision, recall and F-measure. According to research, the neural network outperforms the other techniques, with higher precision, recall and F-measure, while Naïve Bayes produces the weakest, least accurate results. In practice, however, the techniques are often used together to predict rainfall accurately and precisely.
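For reference, the three measures can be computed as in the sketch below. The function name and the six sample predictions are made up for illustration and are not results from any of the techniques above.

```python
def precision_recall_f_measure(actual, predicted, positive="rain"):
    """Compute precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    false_pos = sum(a != positive and p == positive for a, p in pairs)
    false_neg = sum(a == positive and p != positive for a, p in pairs)
    # Precision: share of predicted rain days that really had rain.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of real rain days that were predicted.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure

# Hypothetical labels, invented for illustration only.
actual = ["rain", "rain", "rain", "dry", "dry", "dry"]
predicted = ["rain", "rain", "dry", "rain", "dry", "dry"]

precision, recall, f_measure = precision_recall_f_measure(actual, predicted)
print(precision, recall, f_measure)
```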

IFT 1064 Blog Assessment