Word Frequency Analysis: Forensic Editing - Yes, This Editor is a Computer Geek.

Mar 23, 2013 22:15

Was Shakespeare really Francis Bacon? Or perhaps William Shakespeare was the name Christopher Marlowe used to publish his plays after secretly escaping death and fleeing into exile. These are both serious possibilities being studied by literature scholars(*) using computers and stylometric techniques like word frequency analysis.

Now I'm thinking that you're thinking: 'Yes, that's all very interesting, but what has it got to do with editing?'

When editing, I much prefer working with electronic versions of manuscripts to working with hardcopy, because they allow me to use one of my favourite computer utilities, the word frequency analysis tool.

'What's a word frequency analysis tool?' I hear you ask.

The quick answer is that it is a tool that lets one see how often specific words are used in a text or corpus. However, that really doesn't indicate the full power of this text analysis technique.
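For the geeks who want to see what that means in practice, here is a minimal sketch in Python. It is not how Scrivener or any particular tool does it; the function name word_frequencies and the sample sentence are my own inventions for illustration, and the rule for what counts as a 'word' (runs of letters, with internal apostrophes allowed) is a simplifying assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each word (lower-cased) to its number of uses.

    Assumption: a 'word' is a run of letters, optionally with internal
    apostrophes (so "don't" stays in one piece). Real tools may differ.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical sample text, purely for illustration.
sample = "The cat sat on the mat. The mat was flat."
freq = word_frequencies(sample)

for word, count in sorted(freq.items()):
    print(f"{word:10} {count}")
```

The result is simply a table of words and counts; everything else in this article comes from looking at that table in different sort orders.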

In fact, word frequency analysis is used quite often when dealing with electronic texts. It is one of the metrics used in the rating system for your Internet search engine of choice: Google, Bing, whatever. It can be used to estimate the relative writing age and education of writers. And, as mentioned above, stylometry (of which word frequency analysis is an integral part) can be used to identify an author's writing style; word preference and phraseology can be as distinctive as a fingerprint. So, as well as being used to establish the true authorship of texts, it has also been used as supporting evidence in criminal cases.

Part of an editor's (and proofreader's) job is to spot the simple mistakes that authors can make through overfamiliarity with their text; they read what they believe should be on the page rather than what is actually in the text. However, being fallible humans, editors can sometimes miss the obvious as well, especially when they are rereading a text (that overfamiliarity problem again). This is where the power of the computer, and the difference of perspective that a word frequency table gives, becomes of real use.

The image below is of some word frequency tables from the text analysis package built into my writing package of choice, Scrivener. The image has been specially constructed to show the two main writing problems that I use word frequency analysis tables to help me with while editing and proofing larger texts.



Hopefully, simple typos have been picked up by the writer with their spellchecker, but what spellcheckers can sometimes miss are variant spellings of words, such as US versus British spelling. Or, more likely, variant spellings of proper nouns such as character names. Many a time I've read manuscripts where the author has changed the spelling of a character's name and not updated all occurrences of that name in the text. The word frequency analysis table, when sorted alphabetically, shows all these variant spellings close together, allowing for the easy identification of this problem.
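The alphabetical trick can be taken one step further by letting the computer point at suspicious neighbours in the sorted list. The Python sketch below is only an illustration of that idea, not a feature of Scrivener: the function name variant_candidates, the sample sentence and the similarity threshold of 0.7 are all assumptions of mine, and the threshold in particular would need tuning on a real manuscript.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def variant_candidates(text, threshold=0.7):
    """Flag alphabetically adjacent words that look like variant spellings.

    Words are sorted as they would be in an alphabetical frequency table,
    and each word is compared only with its immediate neighbour.
    The threshold is an assumed cut-off, not an established standard.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    counts = Counter(words)                      # case kept, so names stand out
    ordered = sorted(counts, key=str.lower)      # alphabetical, ignoring case
    flagged = []
    for first, second in zip(ordered, ordered[1:]):
        if first.lower() == second.lower():
            continue                             # same word, different capitals
        ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if ratio >= threshold:
            flagged.append((first, counts[first], second, counts[second]))
    return flagged


# Hypothetical manuscript snippet with a renamed character.
sample = "Katherine met Jon. Later Kathryn left, and Katherine waved to Jon."
suspects = variant_candidates(sample)

for first, n_first, second, n_second in suspects:
    print(f"{first} ({n_first}) / {second} ({n_second})")
```

A list like this will throw up false alarms on a full-length text (plurals and verb forms sit next to each other too), so it is a prompt for the editor's eye rather than a replacement for it.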

The other main benefit of a word frequency analysis table for me, both as a writer and editor, is brought to light when the list is sorted by the number of occurrences. Possible overuse of particular words becomes readily apparent in this view and the editor (or an author concerned about their own style) can then attempt to nip this lack of imagination in the bud by judicious rewriting and use of synonyms.
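As a rough illustration of the 'sorted by occurrences' view, here is another short Python sketch. The function name most_used, the tiny stop-word list and the sample sentence are all made up for this example; a real tool would use a far longer list of common words to ignore (or none at all), so treat the details as assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny list of common words to ignore.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "he", "she"}


def most_used(text, top=5, stop_words=STOP_WORDS):
    """Return the `top` most frequent words as (word, count) pairs,
    most frequent first, skipping anything in stop_words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(top)


# Hypothetical sample with an overused adverb.
sample = ("Suddenly the door opened. Suddenly she turned, "
          "and suddenly the lights went out.")
overused = most_used(sample, top=1)

for word, count in overused:
    print(f"{word}: {count}")
```

Filtering out words like 'the' and 'and' is a design choice: they will always top the list, and removing them lets the author's pet words float to the surface.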

Now, I am not saying that a good editor wouldn't pick these problems up anyway, but overfamiliarity with a text can make anyone blind to certain types of writing problems. Word frequency analysis tables change the perspective of the text so greatly that this familiarity problem goes away and the unerring pattern-matching power of the computer becomes useful in highlighting aspects of a text which may be hard to spot when one is considering just the flow of words or the semantics of the writing.

Anyway, for those who have enough of the inner geek to have read this far, there are online text analysis sites which you can use to generate word frequency analysis tables of your own work (e.g. http://www.csgnetwork.com/documentanalystcalc.html) as well as quite a few stand-alone applications for desktop computers that can be found online with a bit of judicious googling.

Release your inner geek and have a look. Remember, your computer is your friend (and a powerful tool).

* Go to http://cs.brown.edu/research/pubs/theses/masters/2012/ehmoda.pdf to see a representative academic paper on this stylometric-based research.

Phill Berrie

This article originally appeared in the March 2013 issue of ACTWrite, the monthly magazine of the ACT Writers Centre. Reprinted with permission.

editing, word frequency analysis, stylometrics
