Open Research Newcastle

An information theoretic clustering approach for unveiling authorship affinities in Shakespearean era plays and poems

journal contribution
posted on 2025-05-10, 10:05 authored by Ahmed Shamsul Arefin, Renato Vimieiro, Ricardo Riveros, David Craig, Pablo Moscato
In this paper we analyse the word frequency profiles of a set of works from the Shakespearean era to uncover patterns of relationship between them, highlighting the connections within authorial canons. We used a text corpus comprising 256 plays and poems from the 16th and 17th centuries, with 17 works of uncertain authorship. Our clustering approach is based on the Jensen-Shannon divergence and a graph partitioning algorithm, and our results show that authors’ characteristic styles are very powerful factors in explaining the variation of word use, frequently transcending cross-cutting factors like the differences between tragedy and comedy, early and late works, and plays and poems. Our method also provides an empirical guide to the authorship of plays and poems where this is unknown or disputed.
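As an illustration of the distance measure named in the abstract, the following is a minimal sketch of the Jensen-Shannon divergence between two word frequency profiles. It is not the authors' implementation: the function names, the whitespace tokenisation, the fixed vocabulary, and the base-2 logarithm are all assumptions made for this example, and the graph partitioning step is not shown.

```python
import math
from collections import Counter


def word_profile(text, vocabulary):
    """Relative frequencies of the vocabulary words in a text.

    Assumption: naive lowercase whitespace tokenisation; the paper's
    actual preprocessing is not described in this record.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        raise ValueError("text contains none of the vocabulary words")
    return [counts[w] / total for w in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two distributions, in [0, 1] with log base 2."""
    # The midpoint m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy input, for illustration only.
vocabulary = ["the", "and", "of", "to"]
profile_a = word_profile("the king and the queen of the realm", vocabulary)
profile_b = word_profile("to be or not to be and so to bed", vocabulary)
divergence = jensen_shannon_divergence(profile_a, profile_b)
```

A divergence of 0 means the two profiles are identical, and 1 (in bits) means they share no words with non-zero frequency. In an approach like the one described, pairwise values of this kind would form the distances on which a clustering or graph partitioning step operates.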

Funding

ARC

DP120102576

DP140104183

History

Journal title

PLoS One

Volume

9

Issue

10

Publisher

Public Library of Science (PLoS)

Place published

San Francisco, CA

Language

  • en, English

College/Research Centre

Faculty of Engineering and Built Environment

School

School of Electrical Engineering and Computer Science

Rights statement

© 2014 Arefin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
