TextCrimes - An analytical database of malicious communications
Current state of development

Although TextCrimes.com is still in development, the following functions are available to users as part of our testing of the software:

  • Within TextCrimes.com users can browse the public collections.
  • Current collections are provided with tag sets which include document tags (e.g. marking the mode of production and primary function of each text) and recipient and sender details where known.
  • Users may also be given further access to view anonymised collections such as CFL case files where appropriate.
Next developments

As TextCrimes.com is being developed part-time by student interns, our development cycle is slow, but we are actively working on the following areas.

  • Within TextCrimes.com users will be able to contribute their own collections of malicious communications which they can tag using provided or user-defined tag sets.
  • The process for adding these new data collections will include a quality assurance step to ensure that, where texts are digitised, the transcription accurately reflects the original image file.
  • Collection owners will be able to choose to publish their data collections to other registered users while retaining full control over who can view the open or anonymised versions of texts.
  • Collection owners will also be able to allow their data to contribute to background analyses of base-rate information without giving other users permission to read the texts at all.
Filters, searching and analysis
  • Users will be able to create their own query sets of documents by filtering data within or between collections using the tags. For example, a query set might be all documents from available collections that contain conditional threats and were written by writers known to be men.
  • Once a query set has been defined, users will be able to search the texts of its documents either for specified word strings or with regular expressions (regex). Results of these searches will initially be displayed as Key Word in Context (KWIC) lines.
  • We already have the capacity to tag text strings within texts (e.g. as threats or abuse) and we are looking to develop this "highlight tags" facility further to support more sophisticated searches.
  • A further, more distant analysis goal is to allow users to write and share their own R scripts to run analyses on the query sets.
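
To make the planned workflow above more concrete, the following is a minimal Python sketch of how a query set might be built from tags and then searched with a regular expression to produce KWIC lines. It is an illustration only and not the TextCrimes.com implementation: the `Document` class, the tag names `threat_type` and `writer_gender`, the functions `build_query_set` and `kwic`, and the sample texts are all hypothetical and invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record; field and tag names are illustrative only."""
    doc_id: str
    text: str
    tags: dict = field(default_factory=dict)


def build_query_set(documents, **required_tags):
    """Keep only the documents whose tags match every required tag value."""
    return [
        doc for doc in documents
        if all(doc.tags.get(key) == value for key, value in required_tags.items())
    ]


def kwic(query_set, pattern, width=30):
    """Return (doc_id, left context, match, right context) for each regex hit."""
    regex = re.compile(pattern, re.IGNORECASE)
    lines = []
    for doc in query_set:
        for m in regex.finditer(doc.text):
            left = doc.text[max(0, m.start() - width):m.start()]
            right = doc.text[m.end():m.end() + width]
            lines.append((doc.doc_id, left, m.group(0), right))
    return lines


# Invented sample data, standing in for documents drawn from several collections.
documents = [
    Document("d1", "If you go to the police you will regret it.",
             {"threat_type": "conditional", "writer_gender": "male"}),
    Document("d2", "You will regret what you did.",
             {"threat_type": "direct", "writer_gender": "male"}),
    Document("d3", "Unless you pay, you will regret it.",
             {"threat_type": "conditional", "writer_gender": "unknown"}),
]

# Query set: conditional threats by writers known to be men.
query_set = build_query_set(documents, threat_type="conditional", writer_gender="male")

# Regex search within the query set, shown as aligned KWIC lines.
for doc_id, left, match, right in kwic(query_set, r"\bregret\w*", width=20):
    print(f"{doc_id}: {left:>20} [{match}] {right}")
```

In this sketch the tag filter and the text search are kept as separate steps, mirroring the description above in which a query set is defined first and searched afterwards.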