Rain Prediction Clustering in Australia Using the K-Means Algorithm in the WEKA and RStudio Application

Dinar Ajeng Kristiyanti, Irwansyah Saputra, Rina Rina


Purpose: This study aims to identify the ideal number of clusters for predicting rain in Australia, judged by the percentage of the sum of squared errors (SSE), using the K-Means algorithm in the WEKA and RStudio applications.
Design/methodology/approach: Rain prediction proceeds in five stages: data collection; data pre-processing, including missing-value handling; data mining modeling with the K-Means clustering algorithm in WEKA and RStudio; validation of the results using SSE; and data visualization using plots.
Findings/result: The results show that two clusters, with an SSE of 28.0%, form the ideal clustering for predicting rain in Australia. In WEKA, the rain cluster is represented by blue nodes and the non-rain cluster by red nodes; in RStudio, the rain cluster is represented by black nodes and the non-rain cluster by red nodes.
Originality/value/state of the art: This study obtains the ideal cluster for predicting rain in Australia by comparing the results produced by the WEKA and RStudio applications.
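
The stages summarized above (missing-value handling, K-Means clustering, SSE validation) can be illustrated with a minimal Python sketch. This is not the authors' WEKA or RStudio workflow: the function names, the mean-imputation strategy, and the toy values are all hypothetical, and reading the "percentage of SSE" as within-cluster SSE divided by the total sum of squares is an assumption, since the abstract does not define it.

```python
import random


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column.
    (Mean imputation is an assumption; the abstract does not name the method.)"""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-Means. Returns (labels, centers, sse)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = centroid(members)
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, sse


def total_ss(points):
    """Total sum of squares around the overall mean."""
    center = centroid(points)
    return sum(sq_dist(p, center) for p in points)


def sse_percent(points, sse):
    """Within-cluster SSE as a percentage of the total sum of squares
    (one possible reading of 'percentage of SSE'; an assumption)."""
    return 100.0 * sse / total_ss(points)


# Made-up toy rows with two numeric features and one missing value.
toy_rows = [[1.0, 1.0], [1.2, 0.8], [0.8, None],
            [9.0, 9.0], [9.2, 8.8], [8.8, 9.2]]
toy_points = impute_mean(toy_rows)
toy_labels, toy_centers, toy_sse = kmeans(toy_points, k=2)
print(toy_labels, round(sse_percent(toy_points, toy_sse), 1))
```

Under this reading, a lower percentage means the clusters account for more of the variation in the data. Comparing the percentage across several values of k is one common way to choose the number of clusters.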


Clustering; K-Means; WEKA; RStudio; Rain Australia


