This last post collects a few closing thoughts and looks at how the work done in this project could be expanded upon, refined, and made into a deeper and more thorough analysis. Here is a list of ideas for how that could be done going forward:
1. Cataloging ALL Repetitions
The obvious first way to better characterize The Sonnets through this approach would be to thoroughly catalog every single repetition in the book, and then run the same analysis on that complete data set. It is certainly doable; I just was not able to take my analysis this far due to time constraints on the project.
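As a rough idea of what an automated catalog could look like, here is a minimal sketch. It assumes that a "repetition" means a word n-gram that shows up in two or more sonnets, which is narrower than what a human reader would catch. The function names and the toy texts are made up for illustration and are not the code or data used in this project.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def catalog_repetitions(sonnets, n=2):
    """Map each word n-gram to the set of sonnet ids that contain it.

    `sonnets` is a dict of {sonnet_id: text}. Only n-grams found in at
    least two different sonnets are kept.
    """
    where = defaultdict(set)
    for sonnet_id, text in sonnets.items():
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            where[tuple(words[i:i + n])].add(sonnet_id)
    return {gram: ids for gram, ids in where.items() if len(ids) > 1}


# Toy placeholder texts, not real sonnets.
toy = {
    1: "the summer day is bright",
    2: "no summer day will last",
    3: "winter nights are long",
}
repeated = catalog_repetitions(toy, n=2)
```

Running this over all 154 sonnets with a few values of `n` would give a starting catalog that could then be cleaned up by hand.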
2. Utilizing Proper Data Analysis Techniques
Like I've said before, I am by no means a statistical expert or data analyst, so there could well be more efficient dimensionality reduction and analysis algorithms that I do not know about that could be applied here to The Sonnets. One example that I was thinking about implementing but wasn't able to is to use Natural Language Processing and something such as word2vec, a model that produces word embeddings and creates numerical relationships between words, to allow the analysis to go one level deeper in comparing sonnets. That is, not only comparing which words and phrases are repeated, but also comparing the level of semantic similarity between those words. This would give an analysis of both the syntax and the semantics of The Sonnets, and it would be interesting to see what new results come out of that.
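To make the embedding idea a little more concrete, here is a small sketch of how two groups of words could be compared once each word has a vector. Everything in it is an assumption for illustration: the tiny hand-written vectors stand in for what a trained model such as word2vec would provide, and averaging word vectors is just one simple way to compare groups of words.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding, or None."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def semantic_similarity(words_a, words_b, embeddings):
    """Cosine similarity between the averaged vectors of two word lists."""
    a = mean_vector(words_a, embeddings)
    b = mean_vector(words_b, embeddings)
    if a is None or b is None:
        return 0.0
    return cosine(a, b)


# Hand-made 2-d vectors, purely hypothetical; a real model would supply these.
toy_embeddings = {
    "love": [1.0, 0.0],
    "heart": [0.9, 0.1],
    "time": [0.0, 1.0],
}
close = semantic_similarity(["love"], ["heart"], toy_embeddings)
far = semantic_similarity(["love"], ["time"], toy_embeddings)
```

With real embeddings, a score like this could be used to weight the connection between two sonnets by how related their repeated words are, instead of only counting exact matches.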
There are also most likely better ways to represent the data than I did. There were some relationships in my data that I'm not sure belonged either due to how I represented the data or how I chose to process it. Regardless, someone who is more experienced with these techniques could perhaps provide an improvement to either of these aspects.
3. Code Optimization
The code written for this project was by no means optimal, and there are most likely more time-efficient ways to extract higher quality information from the data. A good example of this is the algorithm I used to find cycles, which needs fixing from both an efficiency and a correctness standpoint. It becomes very costly to run when looking for larger cycles, and it can also return cycles that visit the same sonnet more than once, which is not wanted.
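One possible direction for the repeated-sonnet problem is sketched below. It is not the algorithm used in this project. It assumes the sonnets are nodes in an undirected graph stored as a dictionary of neighbor sets, with comparable ids such as sonnet numbers, and it uses a depth-first search that never revisits a node on the current path. Each cycle is reported once by always starting at its smallest node and fixing a direction. The worst-case cost still grows quickly with cycle length, so this addresses correctness more than speed.

```python
def simple_cycles_of_length(graph, k):
    """Return every simple cycle with exactly k nodes, each reported once.

    `graph` maps a node to the set of its neighbors (undirected). A cycle
    is returned as a tuple that starts at its smallest node.
    """
    if k < 3:
        raise ValueError("a simple cycle needs at least 3 nodes")
    cycles = []

    def extend(path, visited):
        current = path[-1]
        if len(path) == k:
            # Close the cycle back to the start; path[1] < path[-1] keeps
            # only one of the two directions of the same cycle.
            if path[0] in graph[current] and path[1] < path[-1]:
                cycles.append(tuple(path))
            return
        for nxt in graph[current]:
            # Only walk to nodes larger than the start, and never revisit.
            if nxt > path[0] and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sorted(graph):
        extend([start], {start})
    return cycles


# Toy graph: a triangle 1-2-3 with an extra node 4 hanging off node 3.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
triangles = simple_cycles_of_length(toy_graph, 3)
```

Because a node is only added to the path when it is not already in `visited`, no sonnet can appear twice in a reported cycle.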
4. Deeper Literary Analysis
While this one may be obvious, and is partially the purpose of this project, perhaps the greatest way this project can be expanded is for someone to apply it to a full literary analysis of The Sonnets. The main goal of the project was to provide tools that could be used to explore the structure of The Sonnets, and perhaps to start on that exploration. It would be amazing to see someone take these tools and apply them to their full potential, picking apart The Sonnets and squeezing out every last detail that this approach can help uncover.