Implementation of Term Frequency-Inverse Document Frequency (TF-IDF) and K-Means Clustering for User Experience Research Startup

Muhamad Farid Kharismawan; Hidayatulah Himawan; Mangaras Yanu Florestianto

Abstract

A startup is an organization formed to search for a repeatable and scalable business model. Startups have grown significantly in recent years, yet most of them fail. User-related factors are among the most influential causes of startup failure, so a solution that addresses these user-related problems is needed. One such solution is user experience research. In this study, the data come from application reviews on the Google Play Store. To process these data, the system implements the TF-IDF algorithm and K-Means Clustering. The research is expected to produce a system that carries out user experience research automatically, so that it can help solve user-related startup problems and ultimately reduce the percentage of startups that fail. From the review data for the Oy! app in Indonesia, 2,865 reviews were processed by the system with the K-Means Clustering algorithm, which identified four topics that users often complain about: credit exchange (1,554 reviews), the redeem point feature (172 reviews), the pulse redeem feature (183 reviews), and the bank transfer feature (541 reviews).
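The pipeline named in the abstract (TF-IDF weighting followed by K-Means clustering) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the four reviews are hypothetical, the smoothed IDF formula is one common variant chosen here as an assumption, and the centroids are seeded with the first k vectors only to keep the run reproducible.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (assumed variant; the abstract does not state the exact formula).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty reviews
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means; centroids start at the first k vectors (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical reviews, not taken from the study's dataset.
reviews = [
    "transfer bank failed money not received",
    "redeem point failed point missing",
    "bank transfer pending money not received",
    "cannot redeem point reward point missing",
]

vocab, vectors = tfidf_vectors(reviews)
labels = kmeans(vectors, k=2)
for review, label in zip(reviews, labels):
    print(label, review)
```

In this toy run the two bank transfer reviews share one cluster and the two redeem point reviews share the other. A real system would also need preprocessing (for example stop-word removal and stemming) and a method for choosing k, neither of which is described in the abstract.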

Published

2026-01-22

Issue

Vol. 3 No. 1: December 2023

Section

Articles

License

License and Copyright Agreement

In submitting the manuscript to the journal, the authors certify that:

They are authorized by their co-authors to enter into these arrangements.
The work described has not been formally published before, except in the form of an abstract or as part of a published lecture, review, thesis, or overlay journal. Please also carefully read SITech's Posting Your Article Policy.
The work is not under consideration for publication elsewhere.
Its publication has been approved by all the author(s) and by the responsible authorities – tacitly or explicitly – of the institutes where the work has been carried out.
They secure the right to reproduce any material that has already been published or copyrighted elsewhere.
They agree to the following license and copyright agreement.

Copyright

Authors who publish with Computing and Information Processing Letters agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

Licensing for Data Publication

Science in Information Technology Letters uses a variety of waivers and licenses that are specifically designed for, and appropriate to, the treatment of data:

Open Data Commons Attribution License, http://www.opendatacommons.org/licenses/by/1.0/ (default)
Creative Commons CC-Zero Waiver, http://creativecommons.org/publicdomain/zero/1.0/
Open Data Commons Public Domain Dedication and Licence, http://www.opendatacommons.org/licenses/pddl/1-0/

Other data publishing licenses may be allowed as exceptions (subject to approval by the editor on a case-by-case basis) and should be justified with a written statement from the author, which will be published with the article.

Open Data and Software Publishing and Sharing

The journal strives to maximize the replicability of the research published in it. Authors are thus required to share all data, code, or protocols underlying the research reported in their articles. Exceptions are permitted but have to be justified in a written public statement accompanying the article.

Datasets and software should be deposited and permanently archived in appropriate, trusted, general, or domain-specific repositories (please consult http://service.re3data.org and/or software repositories such as GitHub, GitLab, Bioinformatics.org, or equivalent). The associated persistent identifiers (e.g., DOI or others) of the dataset(s) must be included in the data or software resources section of the article. Reference(s) to datasets and software should also be included in the article's reference list with DOIs (where available). Where no domain-specific data repository exists, authors should deposit their datasets in a general repository such as ZENODO, Dryad, Dataverse, or others.

Small data may also be published as data files or packages supplementary to a research article; however, the authors should prefer a deposition in data repositories in all cases.
