Wednesday, October 14, 2009

New Machine Learning APIs to Explore

Today on reddit, someone asked about freely available machine learning APIs.

Before the list gets buried, I'm duplicating the contents of that thread here for future exploration:

Weka - Java based ML API
http://www.cs.waikato.ac.nz/ml/weka/

Toolkit for Advanced Discriminative Modeling
http://tadm.sf.net/

Mallet - Java based ML API
http://mallet.cs.umass.edu/

WekaUT - An extension of Weka that adds clustering
http://www.cs.utexas.edu/users/ml/risc/code/

LibSVM - SVM API
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

SVMlight - A C API for SVM
http://svmlight.joachims.org/

C++ API for Neural Networks
http://github.com/bayerj/arac

Torch5 - A Matlab-like ML environment
http://torch5.sourceforge.net/

R - Open source statistical package that can be used for ML
http://cran.r-project.org/web/views/MachineLearning.html

pyML - Python API for ML
http://pyml.sourceforge.net/

Rapidminer - Open Source Data Mining Tool
http://rapid-i.com/wiki/index.php?title=Main_Page

Orange - An Open Source Data Mining Tool (Python and GUI based)
http://www.ailab.si/orange/

Glue - Open Source API for reinforcement learning (Can be used with multiple languages simultaneously)
http://glue.rl-community.org/wiki/Main_Page

Vowpal Wabbit - Learning API from Yahoo Research
http://hunch.net/~vw/

Tuesday, October 13, 2009

FC9 64-bit and VMware Workstation 6.5.3 issue with VMware Tools

I was working with a VMware appliance configured with Fedora Core 9 64-bit as the guest OS.

I was using this guest OS with the latest version of VMware Workstation, v6.5.3. I downloaded v6.5.3 today and upgraded the VMware Tools of the FC9 64-bit guest.

The tools installed, but something went wrong. After installing the VMware Tools that came with the latest version of VMware Workstation 6.5.3, yum and the 'software updater' were unable to upgrade, remove, or download RPMs.

Saturday, October 10, 2009

Unable to find a C or C++ NLG open source tool this week

So, I've been exploring the area of NLG, natural language generation. My personal goal was to develop an application that would read a corpus and respond with either a summary of the corpus or a response to the categories found. In either case, I wanted the summary or response to be more than a template where the nouns, verbs, adjectives, and predicates were merely filled in. That's no better than using grep.

As of this week, I can only find APIs written in Java, Python, Lisp, and Prolog. Many of the listed NLG APIs or applications haven't been touched in years, or are no longer available. Much to my displeasure, there is nothing in C or C++. I want something that runs lean and mean and can scale to datasets over a terabyte in size.

Tuesday, October 6, 2009

Natural Language Generation

While going over some nbc's, I stumbled across the AI area of NLP. But while observing where the areas of nbc and NLP meet, I've found a new obsession: NLG. NLG is an acronym for natural language generation. Natural language generation is the creation of text by a computer program such that the text reads as if a human wrote it.

I first heard about this topic in detail in my AI class in grad school at the University of San Francisco. My professor, Dr. Brooks, had mentioned that researchers had been trying for years to create programs that could generate narratives for computer games. I even recall seeing on some news aggregator that someone had won a writing contest with a story written by an NLG system.

At the time I was taking my AI course, I remember working for a horrible boss who made all of us write weekly reports, even though he saw us every day and interacted with us on a continuous basis. I remember wanting to write a Perl or Python script that would do this for me. I made some attempts, but it was hard to get any realistic variance. It was essentially an overglorified mad lib, where the program only filled in the blanks.
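To make the mad-lib problem concrete, here is a minimal sketch of that kind of script in Python. The template text and field names are made up for illustration; this is not the script I actually wrote back then.

```python
from string import Template

# Hypothetical report template; the wording and field names are assumptions.
REPORT = Template("This week I $verb the $component. Next week I will $plan.")


def weekly_report(fields):
    """Fill the fixed template. The sentence structure never changes,
    which is exactly why the output reads like a mad lib."""
    return REPORT.substitute(fields)


if __name__ == "__main__":
    print(weekly_report({
        "verb": "debugged",
        "component": "boot loader",
        "plan": "write tests",
    }))
```

No matter what goes into the blanks, every report has the same shape, so a reader spots the pattern after two or three weeks.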

I was looking for something more natural and human-like.

In NLG, one takes data and applies generation rules that result in text that feels as if a human wrote it. Surprisingly, a search on NLG suggests that it is a relatively new area of research. Perhaps the best introduction to this topic is on Wikipedia. From the Wikipedia article, you will find yourself on the Bateman and Zock list of Natural Language Generators (http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm).
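As a toy illustration of "data plus generation rules", the Python sketch below lets the data decide the wording instead of filling one fixed sentence. The function name, the thresholds, and the word choices are all my own assumptions; a real NLG system does far more (content selection, sentence planning, and so on).

```python
def describe_change(name, old, new):
    """Choose verb and adverb from the data, a very small generation rule set."""
    delta = new - old
    if delta == 0:
        return f"{name} held steady at {new}."

    # Relative size of the change; treat a change from zero as large (assumption).
    pct = abs(delta) / abs(old) * 100 if old else 100.0

    direction = "rose" if delta > 0 else "fell"
    if pct >= 50:
        adverb = "sharply"
    elif pct >= 10:
        adverb = "noticeably"
    else:
        adverb = "slightly"

    return f"{name} {direction} {adverb}, from {old} to {new}."


if __name__ == "__main__":
    print(describe_change("Open bugs", 40, 10))
    print(describe_change("Build time", 100, 105))
    print(describe_change("Test count", 12, 12))
```

Even rules this small produce different sentence shapes for different inputs, which is the piece a plain fill-in-the-blank template cannot do.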

At the moment, the state of the art appears to be based upon the Java and Lisp languages. Since I work in embedded systems, where speed and a small footprint are key, I'm interested in implementations that are written in C and can scale. I've noticed that most of the NLP and NLG systems I found do not have a database backend. This surprises me, since use of a database would allow for scaling and more consistent performance as the dataset grows.

I think I'll be experimenting with NLG to see if I can make a program that will generate an email that asks a user for info based upon an email inquiry.