Analysis of Earthquake Time Series Data to Predict Future Earthquake Events Using Different Approaches

Bellario, M., Elashri, M., Fuad, N.
Department of Mathematics
Department of Physics and Astronomy
University of Minnesota, Duluth

Introduction

In this project, we have two goals. First, we investigate a special type of dataset called time series data. This type of data is ordered in time, so the order of the data points matters and affects the information content. Time series measurements are usually taken at regular intervals. One example is stock price data, which reflects the fluctuating nature of the stock market. We can define time series data as an ordered sequence of values of a variable at equally spaced time intervals. Our area of interest in this part is the classification problem for time series data. Applying machine learning techniques to this type of data gives us a powerful tool for handling the exponential growth in data with better time efficiency. Time series classification differs from regular classification because the data attributes form an ordered sequence. We are going to verify a published result on the Earthquakes dataset, which claimed that the rotation forest algorithm gives the best accuracy. We are going to implement the algorithm and compare our result with the published one.

Also, we are going to use neural networks on our dataset. Neural networks are a branch of machine learning that has gained importance in the last two decades, and algorithms for building them are under continuous development. Last year a paper was published that replaces the standard residual neural network method with an ordinary differential equation method, called the neural ordinary differential equation.
[4] This new algorithm is not yet well tested on classification problems, and our second goal is to test how well it performs in terms of prediction accuracy, running time, and memory usage, which can be crucial factors when building models for big datasets.

Our hypothesis is that neural ordinary differential equations are a better algorithm for classification problems and should improve the accuracy of our predictions on test datasets. We also expect the method to be faster in computing time and lighter in memory usage.

Related Work

In a 2016 paper [1], A. Bagnall et al. compared different machine learning algorithms to find out which one is best. They found that there is no single best algorithm for every dataset; rather, the efficiency of an algorithm depends on the dataset. They used Rotation Forest (RotF), Support Vector Machine (SVM), Neural Network (NN), and other algorithms to reach their conclusions. One of the datasets they used was the 'Earthquake dataset' [3]. They showed that RotF is the best algorithm for this dataset, with an accuracy of 75.92%. In another paper [2] in 2018, they claimed that RotF is probably the best algorithm for classification problems with continuous features. In a 2018 paper [4], however, a group from Toronto came up with a new idea called the 'Neural Differential Equation Method' for solving machine learning problems. Their main idea was that the basic equation of a residual network is similar to the discrete update used to solve a differential equation with the Euler method, so a neural network can be replaced with a differential equation. They showed that, in general, this Neural ODE method works around 6 times better than the other algorithms. A. Bagnall et al. did not apply this method to the datasets they worked on. We intend to apply it to the Earthquake dataset to verify these claims.

Proposed Work

In this project, we propose to implement machine learning algorithms to predict future earthquakes using the Earthquake dataset mentioned earlier.
We intend to do this using three different algorithms:

- Rotation Forest (RotF)
- Neural Network
- Neural Differential Equation

We want to implement the Rotation Forest and Neural Differential Equation methods from scratch, and implement the neural network method using the built-in functions of the Keras module in Python. Our first goal is to reproduce the results in the paper using the RotF and neural network methods. Then we are going to apply the Neural Differential Equation method, which has not previously been implemented for this data, to the same dataset. Using three different methods will give us confidence in our result, help us compare the methods, and subsequently verify, in a small way, the work of the previous papers mentioned above.

While comparing, we will focus mainly on three parameters:

- Accuracy: which model gives the best accuracy on test data.
- Time Complexity: which model takes less time to train on the training dataset.
- Memory Complexity: which model takes less space in memory while training.

So, compactly, the objectives of this project are:

- Learning how different machine learning algorithms (in this case RotF, Neural Network and the Neural Differential Equation method) work.
- Reproducing the result from the authors of the dataset that we have already found.
- Comparing different ML algorithms in terms of accuracy, time and space complexity.

Experimental Evaluation

Dataset

The dataset we are working on, the Earthquake data, is taken from the Northern California Earthquake Data Center and was donated by A. Bagnall. It is a time-series dataset in which each data point is a Richter scale reading taken every hour from December 3, 1967 to 2003 at the Northern California Earthquake Data Center. To transform this dataset into a classification problem, A. Bagnall et al. first define a major event as any reading of over 5 on the Richter scale. Major events are often followed by aftershocks. The physics of these is well understood. Hence they considered a positive case to be one where a major event is not preceded by another major event for at least 512 hours.
To construct a negative case, they considered instances where there is a reading below 4 (to avoid blurring the boundary between major and non-major events) that is preceded by at least 20 non-zero readings in the previous 512 hours (to avoid trivial negative cases). None of the cases overlap in time (i.e. the series is segmented rather than scanned with a sliding window). From the 86,066 hourly readings, they produce 368 negative cases and 93 positive cases.

To give an idea of the dataset, the training data look like this:

Row   Class   Attr. 1   Attr. 2   ...   Attr. 512
1     1       -0.518    -0.518    ...   -0.518
2     0        1.354    -0.353    ...   -0.353
3     0        2.639    -0.316    ...   -0.316
...   ...      ...       ...      ...    ...
322   0       -0.484    -0.484    ...   -0.484

So each row has 513 columns. The first column is the class label, with 1 for positive cases and 0 for negative cases. The next 512 columns are the attributes that lead to that classification. There are 322 such rows for training, and 139 further rows for testing the model that we build. We first plotted the dataset to find out whether there is any visible trend. In Fig 1 we show the first few data rows, using red triangles for negative cases and green solid circles for positive cases. As it turns out, there is no visible trend, which suggests that machine learning is the way to proceed with this data.

Fig 1: Visualizing earthquake data

Algorithm

We are going to use multiple ML algorithms in this project, namely Rotation Forest (RotF), Neural Network and Neural Differential Equation methods. These are explained very briefly here:

1. Rotation Forest: Rotation forest is a very powerful algorithm for classification problems in ML. It is composed of two different algorithms: PCA and random forest. PCA is applied to the data first, and the transformed dataset is then classified using the random forest algorithm.
- PCA: PCA stands for Principal Component Analysis. It reduces the dimension of the data. Its purpose is to prevent over-fitting and to improve the accuracy of the classification algorithm that is applied afterwards.

- Random Forest: Random forest is an ensemble classifier. A specific number of features and data points are chosen at random and used to train a decision tree; this is repeated to build many trees. The final classification is obtained by taking a 'vote' among those trees.

2. Neural Network: A neural network is a very powerful tool in machine learning. The basic idea is that the algorithm assigns weights to the attributes by minimizing a loss function and finally uses an activation function (sigmoid, ReLU, tanh, etc.) to make the final decision. It may also contain hidden layers before that final decision.

3. Neural Differential Equation Method: The neural differential equation method is a brand new approach to ML problems. It has been observed that the residual network equation $h_{t+1} = h_t + F(h_t)$ is a special case of the discrete Euler update for ordinary differential equations, $h_{t+1} = h_t + e\,F(h_t)$, with step size $e = 1$. So, if we regard each layer of our neural network as one step of Euler's method, we can model the system by the differential equation $dh(t)/dt = F(h(t), x, t)$, where $x$ is our training parameter and $t$ is the time dependence.

Implementation of algorithms

1. Rotation Forest

We begin investigating the earthquake data using rotation forest. The rotation forest algorithm was implemented in Python. The built-in decision tree from the sklearn package was used, but everything else was implemented from scratch.
In brief, the algorithm was:

PCA on training data → Bootstrapping → Making a forest of random decision trees → Making the final decision from the 'votes' of the trees → Making predictions on test data.

As Fig 2 shows, the accuracy of the algorithm did not increase linearly with the number of trees. Accuracy dropped noticeably for some numbers of trees, but the general trend was increasing. After around 20 trees, the accuracy reached its highest value and stayed there. It did not change at all with or without PCA. At this point, the confusion matrix was:

              Predicted YES   Predicted NO
Actual YES    104             35
Actual NO     0               0

So, the highest accuracy achieved by this rotation forest was 74.82%.

The aforementioned paper reports 75.92% prediction accuracy for this algorithm on the earthquake dataset. To check our implementation, we also applied the built-in random forest function from sklearn to our dataset. Fig 3 shows the relation between accuracy and the number of trees. We obtained a maximum accuracy of 76.28% with it, and the average accuracy was 74.32%, which is slightly less than the paper claimed.

Fig 2: Accuracy of rotation forest algorithm on earthquake data
Fig 3: Accuracy of rotation forest using built in function in sklearn
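The bootstrap-and-vote part of the pipeline described above can be illustrated with a small sketch. It is not the implementation used in this project: to stay self-contained it omits the PCA step and uses one-split decision stumps as a stand-in for sklearn decision trees. The toy data, the function names and the parameters are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, rng):
    """Fit a one-split 'tree' on a single randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not right:
            continue  # splitting at the maximum value separates nothing
        l_label, r_label = majority(left), majority(right)
        errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, l_label, r_label)
    if best is None:  # constant feature in this sample: always predict the majority
        label = majority(labels)
        return feature, 0.0, label, label
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, threshold, l_label, r_label = stump
    return l_label if row[feature] <= threshold else r_label


def fit_forest(rows, labels, n_trees, seed=0):
    """Train each stump on its own bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Final decision is the majority 'vote' of the individual trees."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data (invented for illustration): class 0 is negative on every attribute.
train_rows = [[-1.0, -0.9, -1.1], [-0.8, -1.2, -0.7], [-1.1, -1.0, -0.9],
              [0.9, 1.1, 1.0], [1.2, 0.8, 0.7], [1.0, 1.0, 1.1]]
train_labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(train_rows, train_labels, n_trees=25)
print(predict_forest(forest, [-2.0, -2.0, -2.0]), predict_forest(forest, [2.0, 2.0, 2.0]))
```

An odd number of trees is used so that a two-class vote cannot tie. In a rotation forest, the rows would be transformed with PCA before the stumps (or full trees) are fitted.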
Fig 4 shows the relation between the time needed and the number of trees. With and without PCA alike, the elapsed time fluctuates, but the general trend is that more trees need more time, as expected. Interestingly, for relatively large numbers of trees it takes less time if the data is first transformed with PCA. Fig 3 showed that at least 20 trees are needed to reach maximum accuracy with this method, and Fig 4 shows that with 20 trees the time needed is 0.028 seconds with PCA and 0.032 seconds without. This means that although random forest and rotation forest are almost the same in terms of accuracy, rotation forest is slightly (~12.5%) faster than random forest.

2. Neural Network

The neural network is one of the most widely used ML algorithms. In our case, we did not implement it from scratch; instead we used built-in functions from Keras. We used three layers with 12, 20 and 1 neurons, in that order. We then trained on the earthquake data for 40 epochs with a batch size of 10. We found that the accuracy and loss saturate after around 25 epochs, as shown in Fig 5. The average training accuracy and loss were 99.69% and 1.27%. We then used the network to make predictions on the test data and got the confusion matrix:

              Predicted YES   Predicted NO
Actual YES    92              12
Actual NO     25              10

This confusion matrix gives a network accuracy of 73.38%, which is lower than the accuracy of the RotF algorithm. ReLU was used as the activation function for the first two layers and sigmoid for the last layer. The loss function was 'binary cross-entropy'.

Fig 6 shows the time needed to reach that accuracy (without plotting anything) for different numbers of epochs. Elapsed time increases smoothly, as expected.
One thing worth mentioning, however, is that the elapsed time is not exactly linear in the number of epochs but 'slightly quadratic'. From the previous figure, we start to get maximum accuracy after training for 19 epochs, and Fig 6 shows that this takes 113.69 seconds. For comparison, the neural network was also applied to two other well-known datasets, MNIST and CIFAR10. The results are shown in Appendix A.

Fig 4: Time elapsed for RotF on earthquake data
Fig 5: Loss/accuracy in training using Neural network on earthquake data
Fig 6: Time needed for different no of epochs in Neural Network implementation on earthquake data
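As a quick arithmetic check, the two accuracies quoted above follow directly from the confusion matrices. The helper name `accuracy` and the matrix layout (rows are actual classes, columns are predicted classes, as in the tables) are assumptions made for this sketch.

```python
def accuracy(confusion):
    """Fraction of correct predictions from [[TP, FN], [FP, TN]]."""
    (tp, fn), (fp, tn) = confusion
    return (tp + tn) / (tp + fn + fp + tn)


rotation_forest = [[104, 35], [0, 0]]   # confusion matrix reported for RotF
neural_network = [[92, 12], [25, 10]]   # confusion matrix reported for the neural network

print(f"RotF: {accuracy(rotation_forest):.2%}")            # RotF: 74.82%
print(f"Neural network: {accuracy(neural_network):.2%}")   # Neural network: 73.38%
```

Both matrices sum to 139, the number of test rows.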
3. Neural Differential Equation

We tried to implement the Neural Differential Equation method on our dataset. We used the Python Anaconda framework with the Euler method as our differential equation solver. Unfortunately, we were not able to apply it to our data, although we were able to apply the same method to two well-known datasets, MNIST and CIFAR10. The results from these two are shown in Appendix B.

Comparison of Algorithms

This is mostly a comparison of code running on a specific machine, rather than a rigorous theoretical comparison of the algorithms. We compare the algorithms in terms of accuracy and runtime.

1. Comparison among random forest, rotation forest and neural network on earthquake data

Random forest, rotation forest and the neural network were applied to the earthquake dataset. Fig 7 compares their accuracies and runtimes on a specific machine. Random and rotation forest took 19 trees to reach their highest accuracy, so the time to build 19 trees was used. The neural network took around 20 epochs of training, so the time for 20 epochs was used. From Fig 7, it is clear that rotation forest is slightly better than the other algorithms on the earthquake dataset in terms of accuracy. In terms of runtime, the neural network is far worse than the other two, and rotation forest is again the best. In fact, random/rotation forest is so much faster than the neural network (0.032 s/0.028 s compared to ~113 s) that the columns for those two cannot even be seen in Fig 7.

2. Comparison between Neural network and Neural Differential Equation on MNIST and CIFAR10 Data

We could not get to the point of implementing the neural differential equation method on the earthquake data. So for now we compare that method with the neural network using MNIST and CIFAR10 data.
The comparisons between the neural network and the neural differential equation method on MNIST and CIFAR10 data, for accuracy and time, are shown in Fig 8 and Fig 9.

Fig 8 and Fig 9 show that the neural differential equation method did not work as well as expected in terms of either accuracy or runtime. For example, it took around 9 times longer than the neural network on MNIST and around 5 times longer on CIFAR10.

Fig 7: Comparison of random forest, rotation forest and neural network on earthquake data
Fig 8: Comparison between Neural network and Neural differential equation for MNIST data
Fig 9: Comparison between Neural network and Neural differential equation for CIFAR10 data
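One possible way to record runtime and peak memory for such a comparison is sketched below using only the Python standard library. This is not necessarily how the numbers above were measured. The helper names `measure` and `relative_saving` are hypothetical, and `tracemalloc` only sees memory allocated through Python's allocator, so memory that native libraries allocate outside it may not be counted.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def relative_saving(baseline, candidate):
    """Fraction of the baseline time saved by the candidate."""
    return (baseline - candidate) / baseline


# Placeholder workload; in practice fn would be a model's training routine.
result, seconds, peak = measure(sorted, range(100_000, 0, -1))
print(f"elapsed: {seconds:.4f} s, peak traced memory: {peak / 1e6:.2f} MB")

# Timings quoted in the text for 20 trees: 0.032 s without PCA, 0.028 s with PCA.
print(f"saving: {relative_saving(0.032, 0.028):.1%}")
```

Since single runs fluctuate, as Fig 4 suggests, repeating each measurement and averaging would give steadier numbers.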
Conclusion

The main goal of the project has been to treat earthquake data as a machine learning classification problem, predicting future earthquake events from recent Richter scale readings. Since this is done using three different algorithms (RotF, neural network, and the neural differential equation method), the methods are also compared in terms of accuracy and time. We implemented the RotF algorithm from scratch, and it worked about as well as in the paper we tried to reproduce, reaching around 75% accuracy against the published 75.92%. The neural network was implemented using built-in functions in Keras and was found to be 73.38% accurate. So it is still debatable how reliably the existing machine learning models can predict future earthquakes. The brand new approach in this area, the neural differential equation method, has not been implemented on the earthquake dataset yet, but for the sake of comparison it was implemented on two well-known datasets, MNIST and CIFAR10. In these two cases, surprisingly, the neural network works better than the neural differential equation method in terms of both accuracy and time. A more rigorous comparison of time and accuracy between the neural network and RotF on the earthquake dataset has not been done yet; this is one of the future goals. The neural differential equation method also still needs to be implemented on the earthquake dataset, which will make it easier to compare all three algorithms in a more structured way. From the existing analysis, RotF is the best algorithm among RotF, random forest and neural network on the earthquake data, and on both MNIST and CIFAR10 data the neural network is better than the neural differential equation method. So it can probably be assumed that rotation forest is the best algorithm in terms of both accuracy and runtime for the earthquake data.

References

[1] Bagnall, A., Lines, J., Bostrom, A., Large, J.
& Keogh, E. (2016). The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3), 606-660.
[2] Bagnall, A., et al. (2018). Is rotation forest the best classifier for problems with continuous features? arxiv.org/abs/1809.06705
[3] Time Series Classification Website. (2019). Retrieved from http://www.timeseriesclassification.com/description.php?Dataset=Earthquakes
[4] Chen, T.Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D.K. (2018). Neural Ordinary Differential Equations. NeurIPS.

Appendix

A. Implementation of Neural network on MNIST and CIFAR10 dataset

We also used a neural network on two other datasets for later comparison. The first is MNIST (a handwriting recognition dataset), one of the most basic datasets used in machine learning. The other is CIFAR10, a dataset of images that need to be classified into 10 different categories. Fig 10 and Fig 11 show the accuracy and loss versus the number of epochs using residual neural networks on the MNIST dataset. Fig 12 and Fig 13 give the same information for the CIFAR10 dataset. On average, the MNIST model takes about 2348 seconds to train and the CIFAR10 model about 3748 seconds.

Fig 10: Training accuracy for MNIST handwriting data
Fig 11: Training loss for MNIST handwriting data
We can observe that after a small number of epochs the loss and accuracy of our models do not improve much, which is good in terms of time and computing power. This behavior is the same for both datasets and agrees with the recorded training times. The accuracy of the neural network on MNIST and CIFAR10 data was 98.6% and 85.9% respectively. (The number of graphs in this section could probably be reduced by combining a few. This will be taken care of in the final submission.)

Fig 12: Training accuracy for CIFAR10 data
Fig 13: Training loss of CIFAR10 data

B. Implementation of Neural ordinary differential equation on MNIST and CIFAR10 dataset

We implemented neural ordinary differential equations using the Python Anaconda framework, with the Euler method as our differential equation solver. We have still not been able to apply this method to our dataset, but we tested our implementation on two standard datasets that are suitable for such comparison tests, as mentioned in the previous section. Loss and accuracy plots were also extracted for the neural ordinary differential equation method; Fig 14 and Fig 15 give the loss versus the number of epochs for MNIST and CIFAR10 data respectively.

This neural differential equation method gave accuracies of 97.2% and 79.4% for MNIST and CIFAR10 data respectively, which is, quite surprisingly, lower than the accuracy of the neural network method. Moreover, the method appeared to be quite slow: 10 epochs on MNIST and CIFAR10 took 6 hours and 5 hours to train, respectively, on our machine.
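The relation between a residual block and an Euler step, which this method relies on, can be checked on a toy example. The function `F` below is a hypothetical stand-in for a learned layer (here simply dh/dt = -0.5 h), not the network trained in this project, and the helper names are assumptions made for illustration.

```python
import math


def F(h):
    """Toy dynamics standing in for a learned layer: dh/dt = -0.5 * h."""
    return -0.5 * h


def residual_block(h):
    """Residual network update: h_{t+1} = h_t + F(h_t)."""
    return h + F(h)


def euler_step(h, e):
    """One Euler step of size e: h_{t+1} = h_t + e * F(h_t)."""
    return h + e * F(h)


def euler_integrate(h0, t_end, n_steps):
    """Integrate dh/dt = F(h) from t = 0 to t_end with n_steps Euler steps."""
    h, e = h0, t_end / n_steps
    for _ in range(n_steps):
        h = euler_step(h, e)
    return h


h0 = 1.0
# A residual block is exactly one Euler step with step size e = 1.
print(residual_block(h0) == euler_step(h0, 1.0))

# With smaller steps the Euler solution approaches the exact one, h0 * exp(-0.5 * t).
for n in (1, 10, 1000):
    print(n, euler_integrate(h0, 1.0, n), math.exp(-0.5))
```

The trade-off visible here is the one reported above: more solver steps give a closer approximation of the continuous dynamics, but each step costs another evaluation of `F`, which increases runtime.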