Summary
Readers and publishers are understandably fascinated with trying to figure out what makes certain novels popular and thus, more likely to be read and sold. This project is meant to figure out if there’s a correlation between the popularity of a book and the number of independently occurring words in it via examining the number of words that occur only once in “classics” which have been read and rated highly by users from multiple demographics. A key part of this project is using data-sets of similar word to avoid counting words as independently occurring if they are used in a different tense or in their adverbial or adjectival forms.