A Cebuano Parts-of-Speech (POS) Tagger Using Hidden Markov Model (HMM) Applied to News Text Genre

a Computer Science Department, FEU Institute of Technology
Abstract: Part-of-speech (POS) tagging, which identifies the grammatical category of each word in a sentence, is a crucial step in natural language processing. POS tagging for Asian languages, particularly Cebuano, has received little research attention, and this gap has hindered linguistic documentation and the understanding of Cebuano grammar and vocabulary. This study introduces a Cebuano POS tagger based on the Hidden Markov Model (HMM) to improve Cebuano text processing, and proposes a method for handling unfamiliar words. On a news text corpus of 25,000 datasets, the algorithm achieves an accuracy of 84%, a precision of 80%, a recall of 81.52%, and an F1-score of 82%. These results demonstrate the algorithm's effectiveness in addressing language challenges in a specific genre. The research also contributes to the Sustainable Development Goals (SDGs) by promoting linguistic diversity and fostering inclusive language technologies. The study provides insights into Cebuano's linguistic traits and grammatical structures, offering a foundation for further research in natural language processing.
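To make the approach concrete, the following is a minimal sketch of how an HMM tagger of this general kind can work: transition and emission counts are estimated from tagged sentences, and Viterbi decoding selects the most probable tag sequence. It is an illustration under stated assumptions, not the implementation evaluated in this study. The toy sentences, the four-tag tagset, and the function names `train_hmm` and `viterbi` are hypothetical, and add-one smoothing stands in for the unfamiliar-word method proposed here, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data with an illustrative tagset
# (not the corpus or tagset used in the study).
TRAIN = [
    [("nagkaon", "VERB"), ("ang", "DET"), ("bata", "NOUN")],
    [("nagdagan", "VERB"), ("ang", "DET"), ("iro", "NOUN")],
    [("nagkaon", "VERB"), ("ang", "DET"), ("iro", "NOUN"),
     ("sa", "PREP"), ("balay", "NOUN")],
]

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra bucket for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


MODEL = train_hmm(TRAIN)
print(viterbi(["nagdagan", "ang", "bata"], *MODEL))
print(viterbi(["nagkaon", "ang", "libro"], *MODEL))  # "libro" is unseen
```

In this sketch an unseen word such as "libro" receives a small smoothed emission probability under every tag, so the surrounding tag context decides its label. A practical tagger would typically draw on richer evidence for unfamiliar words, such as affix patterns.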