Computational linguistics? No! Computational composition

Sep 05, 2006 13:47

A part of my graduate coursework has been on computational linguistics. Loosely defined, this covers information extraction, language translation, speech recognition, language generation, etc. Pretty much anything involving language and a computer. One research method useful for text-based problems is to extract n-grams, where n is a variable for any integer and a gram is a string of n consecutive words, and then look at the statistics and probabilities of those n-grams. As an example, let's pretend that you're using your personal journal as the text, and you're looking for 3-grams. The algorithm breaks your journal down into every run of three consecutive words and tallies how often each 3-gram occurs. For your journal, you'll probably find that the 3-gram "I am going" is quite frequent, whereas the 3-gram "Costco yesterday, really" might have only one occurrence.
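The counting step is simple enough to sketch in a few lines of Python. The "journal" here is a made-up toy string, just for illustration:

```python
from collections import Counter

def ngrams(words, n):
    """Slide a window of size n across the word list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# A toy stand-in for a personal journal.
journal = "I am going to the store I am going to work I am going home".split()

counts = Counter(ngrams(journal, 3))

print(counts[("I", "am", "going")])       # 3 -- a frequent 3-gram
print(counts[("going", "to", "the")])     # 1 -- a rare one
```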

You can create novel sentences by chaining together overlapping, highly probable n-grams:

I went to the store yesterday. = (begin sentence) I went + I went to + went to the + to the store + the store yesterday + store yesterday (end sentence)

Using n-grams to create novel sentences is like teaching a computer to communicate by statistically mimicking human speech. Some linguists claim that this research methodology is ridiculous and of little value, because the computer is creating language in a manner that (those linguists claim) isn't anything like how humans produce it. Their reasoning: if language is uniquely human, then there must be one unique mechanism by which it is produced.

That said, I'm quite pleased that it's not just language that computers can strip down into patterns and then mimic, convincingly. They can also mimic classical composers.

linguistics, news
