A COMPARATIVE REVIEW OF CLUSTERING AND CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYTICS
DOI:
https://doi.org/10.70429/sjis.v3i1.179
Keywords:
Big Data, Data Mining, Clustering, Classification, Machine Learning
Abstract
Data are now generated continuously and at a volume that is difficult to analyze manually. Data
mining addresses this problem by extracting useful patterns from large datasets, and two of its main
approaches are clustering and classification. Many algorithms exist for each: K-Means, DBSCAN, and
Hierarchical Clustering for clustering, and Decision Tree, Naïve Bayes, SVM, and Random Forest for
classification. Each has its own strengths and weaknesses depending on the data at hand. This paper
compares how these algorithms perform and offers guidance on which one may work best in a given
situation. The review finds that no single algorithm is best in every case, so choosing among them
depends on understanding the characteristics of the data and the goal of the analysis.