A Comparative Analysis of Classification Algorithms in Authorship Attribution

Bommideni Revathi, Srinivasu Badugu

Published: Aug 18, 2021

Abstract

Authorship attribution, the task of identifying the author of a text, was long limited to works of historical importance, but it remains of great significance today. The primary objective of this paper is to set out feature extraction strategies and implementation techniques with various classifiers in a simple way, so that they can be readily applied to authorship attribution. Using count vectors and term frequency-inverse document frequency (TF-IDF) features, we evaluate three supervised machine learning algorithms: support vector machine, multinomial naive Bayes, and logistic regression. We used the Sklearn library for implementation. The dataset covers 3 authors and consists of 19579 instances. We randomly split it into a training set of 70 percent (13705 instances) and a test set of 30 percent (5874 instances). The highest accuracy, 82.09 percent, was obtained with the naive Bayes classifier using a vector (vocabulary) size of 24823.
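
The setup described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the toy corpus, the author labels, the linear-kernel `SVC`, the `max_iter` and `random_state` values, and the stratified split are all assumptions made so that the example runs on a handful of sentences. It only shows how the two feature types and the three classifiers can be combined and scored on a 70/30 split.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the 3-author dataset (not the real data).
texts = [
    "the raven sat upon the chamber door at midnight",
    "a dreary midnight in the silent chamber",
    "the raven spoke and the chamber fell silent",
    "midnight came dreary upon the raven",
    "the ancient city slept beneath the black sea",
    "strange stars rose above the ancient city",
    "beneath the sea the ancient thing was dreaming",
    "the black stars watched the sleeping sea",
    "the creature fled across the frozen mountain",
    "my creature wandered the frozen glacier alone",
    "across the mountain the glacier was shining",
    "alone on the glacier the creature wept",
]
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

# 70/30 split as in the abstract. Stratifying is an assumption for the toy
# data, so that every author appears in the small training set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
classifiers = {
    "naive_bayes": lambda: MultinomialNB(),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    # The abstract does not say which SVM variant was used; a linear kernel
    # is assumed here.
    "svm": lambda: SVC(kernel="linear"),
}

# Fit every (feature type, classifier) pair and record test accuracy.
results = {}
for feat_name, make_vec in vectorizers.items():
    for clf_name, make_clf in classifiers.items():
        model = make_pipeline(make_vec(), make_clf())
        model.fit(X_train, y_train)
        results[(feat_name, clf_name)] = model.score(X_test, y_test)

for (feat_name, clf_name), acc in sorted(results.items()):
    print(f"{feat_name:6s} {clf_name:20s} accuracy={acc:.2f}")
```

Accuracies on such a tiny corpus are not meaningful and say nothing about the 82.09 percent reported above. After fitting, the vocabulary size of a pipeline can be inspected with `len(model[0].vocabulary_)`, which corresponds to the "vector (vocabulary) size" mentioned in the abstract.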

Issue

Vol. 12 No. 7 (2021)

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
