Tuesday, October 6, 2009

Natural Language Generation

While going over some nbc's, I stumbled across the AI area of NLP. But, while observing where the areas of nbc and NLP meet, I've found a new obsession: NLG. NLG is an acronym for natural language generation. Natural language generation is text created by a computer program that appears to be human-like in readability.

I first heard about this topic in detail in my AI class in grad school at University of San Francisco. My professor, Dr. Brooks, had mentioned that researchers had been trying for years to create programs that could generate narratives for computer games. I even recall seeing on some news aggregator that someone had successfully won a writing contest with a story written by a NLG system.

At the time I was taking my AI course, I remember working for a horrible boss. Who made all of us who he saw everyday and interacted with on a continuous basis, write weekly reports. I remember wanting to write a Perl or Python script that would do this for me. I made some attempts but it was hard to get any realistic variance. It was essentially an overglorified mad lib, where the program only filled in the blanks.

I was looking for something more natural and human like.

In NLG, one takes data and has generation rules that result in text that feels as if a human wrote it. Surprisingly, if one does a search on NLG, it is a relatively new area of research. Perhaps the best introduction to this topic is on Wikipedia. From the Wikipedia area, you will find yourself on the Bateman and Zock list of Natural Language Generators(http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm)

At the moment, the state of the art appears to be based upon Java and Lisp languages. Since I work in embedded systems where speed and small footprint are key, I'm intersted in implementations that are in C and can scale. I've noticed that most of the NLP and NLG systems I found do not have a database backend. This surprises me since use of a database would allow for scaling and more consistant performacne as the dataset grows.

I think I'll be experimenting with NLG to see if I can make a program that will generate an email that asks a user for info based upon an email inquiry.

No comments: