Published August 15, 2022 | public
Journal Article

Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays

Abstract

The creation of new computational methods to provide fresh insights into literary styles is an active topic of research. There are particular challenges when the number of samples is small in comparison with the number of variables. One problem of interest to literary historians is the date of the first performance of a play of Shakespeare's time. Currently this must usually be guessed with reference to multiple indirect external sources, or to some aspect of the content or style of the play. This paper highlights a dating technique with wider potential, using this particular problem as a case study. We introduce a novel dataset of Shakespeare-era plays (181 plays from the period 1585–1610), annotated with best-guess dates from a standard reference work as metadata. We introduce a memetic algorithm-based Continued Fraction Regression (CFR), applied here for the first time to a problem in computational stylistics, which delivered models using a small number of variables, leading to an interpretable model and reduced dimensionality. Our independent variables are the probabilities of occurrence of individual words in each of the plays. We studied the performance of 11 widely used regression methods in predicting the dates of the plays with an 80/20 training/test split. An in-depth analysis of the 20 words occurring most commonly in the CFR models across 100 independent runs helps explain the trends in linguistic and stylistic terms. The CFR revealed a mathematical model that links variation in word use through time, which helps to provide estimates of the dates of Shakespeare-era plays. We check for genre effects as a possible confounding variable.
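
As a rough illustration of the ingredients named in the abstract, the Python sketch below computes per-play word probabilities, makes an 80/20 split, and evaluates a truncated continued fraction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tokenization rule, and the affine form of each continued-fraction term are hypothetical choices made for illustration, and the memetic algorithm that searches for the coefficients is not shown.

```python
from collections import Counter
import random
import re


def word_probabilities(text, vocabulary):
    """Relative frequency of each vocabulary word among all word tokens in text.

    Assumption: tokens are lowercase runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) in an 80/20 split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def continued_fraction(x, terms):
    """Evaluate g0(x) + h0(x) / (g1(x) + h1(x) / (... + g_d(x))).

    Assumption: every g_i and h_i is affine, given as (intercept, weights).
    terms is a list of (g, h) pairs; the h of the last pair is ignored.
    """
    def affine(coeffs):
        intercept, weights = coeffs
        return intercept + sum(w * xi for w, xi in zip(weights, x))

    # Start from the innermost term and fold outwards.
    value = affine(terms[-1][0])
    for g, h in reversed(terms[:-1]):
        value = affine(g) + affine(h) / value
    return value
```

In a setup like this, a play's feature vector would be the output of `word_probabilities`, and a fitted model would map it to a predicted year through `continued_fraction`. Keeping most weights at zero is one way such a model could end up using only a small number of words.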

Additional Information

© 2022 Elsevier. Received 18 April 2021, Revised 9 December 2021, Accepted 12 March 2022, Available online 22 March 2022.

Funding: This work was supported by the Australian Government through the Australian Research Council's Discovery Projects funding scheme (projects DP160101527, DP200102364). P.M. acknowledges a generous donation from the Maitland Cancer Appeal. This work has been supported by the University of Newcastle and Caltech Summer Undergraduate Research Fellowships (SURF) program. In particular, SURF Fellows J. Sloan and K. Huang acknowledge the support of Samuel P. and Frances Krown and Arthur R. Adams, respectively, for their generous donor support to their activities.

CRediT authorship contribution statement: Pablo Moscato: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision, Project administration, Funding acquisition. Hugh Craig: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing, Funding acquisition. Gabriel Egan: Data curation, Writing – review & editing. Mohammad Nazmul Haque: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Visualization. Kevin Huang: Methodology, Software, Validation, Formal analysis, Investigation, Writing – review & editing. Julia Sloan: Methodology, Software, Validation, Formal analysis, Investigation, Writing – review & editing. Jonathon Corrales de Oliveira: Methodology, Software, Validation, Formal analysis, Investigation, Writing – review & editing.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional details

Created: August 22, 2023
Modified: October 23, 2023