This project will contain all of the code required for my data analytics project.
It will also contain any extra work that was required, such as presentation materials and the final report.
The data required to run this project is not included at this time because of its size.
There are a number of special values throughout the project. These are marked with the phrase ##special_key_value## for ease of searching.
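As a rough illustration of how the marker can be searched for, the sketch below walks a folder and lists every line that contains it. The function name find_marker and the "*.py" file pattern are assumptions made for the example, not part of the project.

```python
from pathlib import Path

MARKER = "##special_key_value##"


def find_marker(root, pattern="*.py"):
    """Return (file path, line number, line text) for every line containing MARKER.

    Assumption: the special values live in files matching `pattern`.
    """
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so one odd file does not stop the search.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling find_marker(".") from the project root would list each marked value with its file and line number.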
A number of scripts use a different format to lay out their work. This normally results in much of the speech being marked as description.
More refinement is needed in how the sections are created for analysis.
One future project should look at identifying the level of indentation for each line in the script. This can be used as a marker for different sections.
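A minimal sketch of the indentation idea, assuming the script is available as plain text lines. The function names and the tab width of 8 are assumptions for illustration only.

```python
from collections import Counter


def indentation_level(line, tab_width=8):
    """Count the leading whitespace columns of a line, with tabs expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


def indentation_profile(lines):
    """Count how many non-blank lines sit at each indentation level."""
    return Counter(indentation_level(line) for line in lines if line.strip())
```

The most common levels in the profile could then be treated as candidate markers for character names, speech and description, although which level maps to which section would need to be checked for each script.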
A secondary issue occurs where the script format has lines between the character name and the speech section. This results in orphaned speech sections, which are then marked as descriptions. The solution to this problem is to join these sections together. The issue is also present at page breaks, because of the way they are presented in a document of continuous text.
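One possible way to join the sections is sketched below. It assumes each section is stored as a (label, text) pair with the labels "character", "speech" and "description"; these labels and the function name are hypothetical. It also assumes that a description directly after a character name is really orphaned speech, which will not hold for every script. The page break case is not covered.

```python
def join_orphaned_speech(sections):
    """Relabel a description that directly follows a character name as speech.

    Assumption: a character name is always followed by that character's speech,
    so a description in that position is an orphaned speech section.
    """
    joined = []
    for label, text in sections:
        previous_label = joined[-1][0] if joined else None
        if label == "description" and previous_label == "character":
            joined.append(("speech", text))
        else:
            joined.append((label, text))
    return joined
```

Only the first description after a character name is relabelled, so a genuine description that follows the speech is left alone.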