Afaan Oromo Document Text classification using Single layer Multisize Filters Convolutional Neural Network
DOI:
https://doi.org/10.20372/hjet.v1i1.34
Keywords:
Text classification, Document text classification, Afaan Oromo, Convolutional neural network
Abstract
Text classification is one of the most widely used natural language processing technologies. It assigns unstructured text to meaningful predefined categories. With the continuously increasing amount of online information, there is a pressing need to classify text in order to extract valuable information. Many researchers have previously performed Afaan Oromo text classification using machine learning methods. However, most of these traditional methods rely on TF-IDF or bag-of-words representations to map the input data to a predefined set of outputs, and such representations ignore the context and internal hierarchy of the text. In addition, they treat labels as independent of one another and ignore the relationships between them, which leads to a significant loss of semantic information. Deep learning approaches can address these limitations. In this study, we therefore use a Single-layer Multi-Size Filters Convolutional Neural Network (SMF-CNN) for document text classification and collect a dataset of 6450 documents organized into ten classes. We also examine how preprocessing approaches affect the performance of the model. After hyperparameter tuning, the performance of SMF-CNN was evaluated with FastText pre-trained word embeddings, with Word2vec pre-trained word embeddings, and without pre-trained word embeddings. The experimental results show that SMF-CNN is effective and scales well, achieving an accuracy of 96.81% with FastText pre-trained word embeddings.
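To make the architecture named in the abstract concrete, the following is a minimal sketch of the forward pass of a single convolutional layer with filters of several sizes, followed by max-over-time pooling and a softmax over ten classes. It is not the authors' implementation. The abstract does not state the filter sizes, the number of filters, or the embedding dimension, so the values used here (`filter_sizes=(3, 4, 5)`, `n_filters=4`, `embed_dim=8`) are assumptions chosen only for illustration, as are the function names. The weights are random and untrained; in practice the embedding table would be initialized from pre-trained vectors and all weights learned from the labeled documents.

```python
import math
import random


def conv1d_max_pool(embedded, filters, bias):
    """Slide each filter over the token sequence, apply ReLU, keep the max."""
    width = len(filters[0])  # each filter spans `width` consecutive tokens
    pooled = []
    for f, b in zip(filters, bias):
        best = 0.0  # ReLU floor; also the result if the text is shorter than the filter
        for start in range(len(embedded) - width + 1):
            window = embedded[start:start + width]
            act = b + sum(w * x
                          for f_row, tok in zip(f, window)
                          for w, x in zip(f_row, tok))
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_model(vocab_size, embed_dim=8, filter_sizes=(3, 4, 5),
               n_filters=4, n_classes=10, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.1, 0.1)

    return {
        "embedding": [[rand() for _ in range(embed_dim)]
                      for _ in range(vocab_size)],
        "convs": [
            {"filters": [[[rand() for _ in range(embed_dim)]
                          for _ in range(size)]
                         for _ in range(n_filters)],
             "bias": [0.0] * n_filters}
            for size in filter_sizes
        ],
        "dense": [[rand() for _ in range(n_filters * len(filter_sizes))]
                  for _ in range(n_classes)],
        "dense_bias": [0.0] * n_classes,
    }


def predict_proba(model, token_ids):
    """Embed tokens, run every filter size in parallel, concatenate, classify."""
    embedded = [model["embedding"][t] for t in token_ids]
    features = []
    for conv in model["convs"]:  # one branch per filter size, same single layer
        features.extend(conv1d_max_pool(embedded, conv["filters"], conv["bias"]))
    logits = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(model["dense"], model["dense_bias"])]
    return softmax(logits)


model = init_model(vocab_size=50)
probs = predict_proba(model, [3, 17, 8, 42, 5, 9, 21])  # hypothetical token ids
```

Because each filter size looks at a different number of adjacent words, the concatenated feature vector captures short phrases of several lengths at once, which is the kind of local context that TF-IDF and bag-of-words representations discard.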
License
Copyright (c) 2022 Harla Journals and Author
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.