This repository hosts the code for a school project on authorship attribution, by Guillaume Dalle, Jasmine Gamblin, Maxime Godin, Clément Mantoux, Wang Sun and Lucile Vigué. We designed software to answer the question: "Given a set of texts with known authors, can we infer the author of a new document?" The original motivation was to apply our method to a literary controversy over the authorship of Alexandre Dumas' Les Trois Mousquetaires.
We compared a wide variety of statistical methods combined with a preliminary feature extraction step. We proposed a standardized pipeline to evaluate, interpret and visually represent the output of the classification algorithms. We applied our method to several problems, such as distinguishing novels for children from novels for adults, attributing authorship within a corpus of French naturalist writers, and distinguishing truth from lies in a data set we gathered.
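To give a rough idea of what "feature extraction followed by a classifier" can look like, here is a minimal sketch. It is not the code from this repository: the function names (`char_ngrams`, `cosine`, `train`, `predict`), the choice of character trigram frequencies as features, the nearest-profile decision rule and the toy corpus are all assumptions made for illustration, and the methods compared in the project may differ.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Feature extraction (assumed): relative frequencies of character n-grams."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(corpus):
    """Build one averaged feature profile per author from (author, text) pairs."""
    sums, sizes = {}, Counter()
    for author, text in corpus:
        profile = sums.setdefault(author, Counter())
        for gram, w in char_ngrams(text).items():
            profile[gram] += w
        sizes[author] += 1
    return {a: {g: w / sizes[a] for g, w in p.items()} for a, p in sums.items()}


def predict(profiles, text):
    """Attribute the text to the author whose profile is most similar."""
    feats = char_ngrams(text)
    return max(profiles, key=lambda a: cosine(profiles[a], feats))


# Toy corpus, invented purely for illustration (hypothetical authors).
toy_corpus = [
    ("author_a", "the quick brown fox jumps over the lazy dog"),
    ("author_b", "zzz yyy xxx www vvv uuu"),
]
toy_profiles = train(toy_corpus)
print(predict(toy_profiles, "the quick brown fox"))
```

A nearest-profile rule is used here only because it fits in a few lines. In a real comparison, the same extracted features could be passed to any of several classifiers and scored on held-out texts.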
The final report for this PSC (Collective Scientific Project, in French) can be found here.