Saturday, December 26, 2009

How to get longitude and latitude from Google Maps

So, I have a new obsession. I've been toying around with Android, creating GPS-aware applications. I don't own an Android phone yet; I'm having a hard time deciding between a Droid and an Eris. In the meantime, I've been hanging around the Android sites, learning as much as I can about writing Android apps. One of the areas I'm interested in is GPS-aware apps, which change their behavior based upon the current GPS position.

The Android 2.0 SDK comes with its own device emulator, which can simulate an Android device changing its position in the real world. One way to drive this behavior is to submit a longitude and latitude to the emulator. But how do you get longitude and latitude coordinates if you don't own a GPS, or the location you want to simulate isn't accessible? Google Maps can provide them.

First, go to Google Maps and enter the address of interest. The address you just entered will appear on the left-hand side as a link. When you hover your mouse over the link, its URL will appear at the bottom of your browser (at least in Firefox, which is what I use; your browser may not do this). In that URL, look for a string of the following form:

sll=X,Y

X will be the latitude. Y will be the longitude.
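
If you'd rather not type the coordinates into the emulator by hand, you can also push them in through the emulator's console. Below is a minimal sketch in Python, assuming the emulator is listening on its default console port, 5554, and using made-up example coordinates. One caution: the console's geo fix command takes longitude first, then latitude, the reverse of the sll order above.

# Push a simulated GPS fix to a running Android emulator via its console.
# Assumes the default console port (5554); the coordinates are examples only.
import telnetlib

longitude, latitude = -122.084, 37.422   # example values

tn = telnetlib.Telnet("localhost", 5554)
tn.read_until(b"OK", timeout=5)          # wait for the console banner
# 'geo fix' expects longitude first, then latitude
tn.write(("geo fix %f %f\n" % (longitude, latitude)).encode("ascii"))
tn.close()

The same command can be typed by hand after running telnet localhost 5554.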

Tuesday, November 10, 2009

1-liner for creating symbolic links in /usr/local/bin

If you have to create more symbolic links than you'd want to create by hand, this is a useful 1-liner. In this case, it was used to install the binaries from Java 1.6 into /usr/local/bin:

find /usr/local/jdk1.6.0_17/bin|grep -vE "bin$|ControlPanel$"|perl -ne 'chomp; system "ln -s $_ /usr/local/bin";'

This sort of 1-liner is useful whenever there's more than one symbolic link to create, and it scales easily to 100's and even 1000's of symbolic links. With grep, one can also filter out items by name. Very useful!
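
For those who would rather have a small script than a shell pipeline, here is a rough Python equivalent of the same idea, with the same assumptions about the JDK path and the ControlPanel filter:

# Symlink every JDK binary into /usr/local/bin, skipping ControlPanel.
# Assumes the same paths as the 1-liner above.
import os

src_dir = "/usr/local/jdk1.6.0_17/bin"
dst_dir = "/usr/local/bin"

for name in os.listdir(src_dir):
    if name == "ControlPanel":
        continue
    target = os.path.join(src_dir, name)
    link = os.path.join(dst_dir, name)
    if not os.path.exists(link):         # don't clobber existing entries
        os.symlink(target, link)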

Wednesday, October 14, 2009

New Machine Learning API's to Explore

Today on reddit, someone asked about freely available machine learning API's.

Before the list gets buried, I'm duplicating the contents of that thread here for future exploration:

Weka - Java based ML API
http://www.cs.waikato.ac.nz/ml/weka/

Toolkit for Advanced Discriminative Modeling
http://tadm.sf.net/

Mallet - Java based ML API
http://mallet.cs.umass.edu/

WekaUT - An extension of Weka that adds clustering
http://www.cs.utexas.edu/users/ml/risc/code/

LibSVM - SVM API
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

SVMlight - A C API for svm
http://svmlight.joachims.org/

C++ API for Neural Networks
http://github.com/bayerj/arac

Torch5 - A Matlab-like ML environment
http://torch5.sourceforge.net/

R - Open source statistical package that can be used for ML
http://cran.r-project.org/web/views/MachineLearning.html

pyML - Python API for ML
http://pyml.sourceforge.net/

Rapidminer - Open Source Data Mining Tool
http://rapid-i.com/wiki/index.php?title=Main_Page

Orange - An Open Source Data Mining Tool (Python and GUI based)
http://www.ailab.si/orange/

RL-Glue - Open source API for reinforcement learning (can be used with multiple languages simultaneously)
http://glue.rl-community.org/wiki/Main_Page

Vowpal Wabbit - Learning API from Yahoo! Research
http://hunch.net/~vw/
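
To give a flavor of how small some of these APIs are to use, here is a toy session with LibSVM through its bundled Python wrapper, svmutil. The data is invented and the option string just selects a linear kernel; treat it as a sketch rather than a reference:

# Toy linear classification with LibSVM's bundled Python wrapper (svmutil).
# The training data is invented; '-t 0' selects a linear kernel.
from svmutil import svm_train, svm_predict

y = [1, 1, -1, -1]                         # class labels
x = [[1, 1], [2, 1], [-1, -1], [-2, -1]]   # feature vectors

model = svm_train(y, x, "-t 0 -c 1")
labels, accuracy, values = svm_predict([1, -1], [[3, 2], [-3, -2]], model)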

Tuesday, October 13, 2009

FC9 64-bit and VMware Workstation 6.5.3 issue with VMware Tools

I was working with a VMware appliance configured with Fedora Core 9 64-bit as the guest OS.

I was using this guest OS with the latest version of VMware Workstation, v6.5.3. I downloaded it today and upgraded the VMware Tools in the FC9 64-bit guest.

The install completed, but something went wrong: after installing the VMware Tools that came with VMware Workstation 6.5.3, yum and the 'software updater' were unable to upgrade, remove, or download RPM's.

Saturday, October 10, 2009

Unable to find a C or C++ NLG open source tool this week

So, I've been exploring the area of NLG, natural language generation. My personal goal was to develop an application that would read a corpus and respond with either a summary of the corpus or a response to the categories found in it. In either case, I didn't want the summary or response to be just a template with the nouns/verbs/adjectives/predicates filled in. That's no better than using grep.

As of this week, I can only find API's written in Java, Python, Lisp, and Prolog. Many of the listed NLG API's and applications haven't been touched in years, or are no longer available. Much to my displeasure, there is nothing in C or C++. I want something that runs lean and mean, and can scale to datasets over a terabyte in size.

Tuesday, October 6, 2009

Natural Language Generation

While going over some naive Bayes classifiers (NBC's), I stumbled across the AI area of NLP. And while observing where NBC's and NLP meet, I found a new obsession: NLG, an acronym for natural language generation. Natural language generation is text created by a computer program that reads as if a human wrote it.

I first heard about this topic in detail in my AI class in grad school at the University of San Francisco. My professor, Dr. Brooks, mentioned that researchers had been trying for years to create programs that could generate narratives for computer games. I even recall seeing on some news aggregator that someone had won a writing contest with a story written by an NLG system.

At the time I was taking my AI course, I was working for a horrible boss, who made everyone he saw and interacted with every day write weekly reports. I remember wanting to write a Perl or Python script that would do this for me. I made some attempts, but it was hard to get any realistic variance. The result was essentially a glorified mad lib, where the program only filled in the blanks.

I was looking for something more natural and human like.

In NLG, one takes data and applies generation rules to produce text that feels as if a human wrote it. Surprisingly, searching on NLG shows it is a relatively new area of research. Perhaps the best introduction to the topic is on Wikipedia, which will lead you to the Bateman and Zock list of Natural Language Generators (http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm).
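
To make the template-versus-rules distinction concrete, here is a toy sketch of the rule-based idea. It goes a little beyond a fill-in-the-blank mad lib by letting the data choose the sentence structure; the rules and data are invented for illustration:

# Toy generation rules: the shape of the output depends on the data,
# rather than filling a single fixed template. Everything here is invented.
import random

def weekly_report(done, blocked):
    parts = []
    if not done and not blocked:
        return "Nothing to report this week."
    if len(done) == 1:
        parts.append("This week I finished %s." % done[0])
    elif done:
        parts.append("This week I finished %s and %s."
                     % (", ".join(done[:-1]), done[-1]))
    if blocked:
        opener = random.choice(["Unfortunately,", "On the downside,"])
        parts.append("%s %s is still blocked." % (opener, blocked[0]))
    return " ".join(parts)

print(weekly_report(["the build script", "the unit tests"], ["the driver port"]))

Even this tiny example shows where the complexity comes from: joining the finished items, choosing a sentence form, and varying word choice are all generation rules, and real NLG systems have many more of them.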

At the moment, the state of the art appears to be based on Java and Lisp. Since I work in embedded systems, where speed and a small footprint are key, I'm interested in implementations that are written in C and can scale. I've noticed that most of the NLP and NLG systems I found do not have a database backend. This surprises me, since a database would allow for scaling and more consistent performance as the dataset grows.

I think I'll be experimenting with NLG to see if I can make a program that will generate an email that asks a user for info based upon an email inquiry.

Friday, September 25, 2009

Grammars

The purpose of this entry is to describe the four types of grammars that can be used to classify a language, and the means used to decide which of the four types a language belongs to.

From a linguistics and NLP standpoint, languages can be classified by four possible grammar types. From the most expressive to the least expressive, a language can be described by a grammar known as type 0, type 1, type 2, or type 3. Each of the grammar types has a common name. A given grammar type can sometimes describe the others: a type 0 grammar can describe type 1, type 2, and type 3 grammars; a type 1 grammar can describe type 2 and type 3; a type 2 grammar can describe a type 3 grammar; a type 3 grammar cannot describe any other type.

Each of the four types of grammars is composed of rules known as productions, which have the general form w1 -> w2. Applying productions generates a sequence of terminal tokens and non-terminals. Non-terminals are symbols that can still be expanded by a production whose lhs, w1, matches them.

A recursively enumerable grammar is also known as a type 0 grammar; it has no restrictions on its production rules. A context-sensitive grammar is a type 1 grammar, restricted to productions where the number of symbols on the rhs is equal to or greater than the number of symbols on the lhs. A context-free grammar is a type 2 grammar. In a type 2 grammar, a non-terminal can be replaced by the rhs of one of its productions wherever it appears; in comparison, a non-terminal in a type 1 grammar can only be replaced when the symbols around it match the context given on the production's lhs. A regular grammar is a type 3 grammar. Regular grammars describe the same class of languages as the regular expressions used by Perl, Python, and grep when searching strings. A production of a regular grammar has a restricted form: the lhs is a single non-terminal, and the rhs is a terminal optionally followed by a non-terminal.

Grammars are also more formally known as phrase structure grammars.

Let G be a phrase structure grammar, defined as a 4-tuple:

G = (V,T,S,P)

V is the vocabulary, a set of terminal tokens and non-terminal symbols
T is a subset of V; T is the set of terminal tokens
S is the start symbol/token, a member of V
P is the set of production rules
N is V - T, the set of non-terminal symbols/tokens
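
For example, a small type 2 grammar with productions S -> aSb and S -> lambda can be written in this notation as G = ({S, a, b}, {a, b}, S, {S -> aSb, S -> lambda}), where T = {a, b} and N = V - T = {S}. It generates the strings lambda, ab, aabb, aaabbb, and so on.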

Types of grammars and the restrictions on their productions, w1 -> w2:
0 no restrictions
1 length(w1) <= length(w2), or w2 = lambda
2 w1 = A, where A is a non-terminal symbol
3 w1 = A and w2 = aB or w2 = a, where A is an element of N, B is an element of N, and a is an element of T; or S -> lambda
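
To see why the types differ in power, consider two textbook examples. The type 3 grammar S -> aS | b generates the language a*b, which a regular expression handles directly. The type 2 grammar S -> aSb | lambda generates a^n b^n, which no regular expression can recognize, since the recognizer has to count. A quick sketch in Python:

# Type 3 (regular): S -> aS | b generates a*b; a regex recognizes it.
# Type 2 (context-free): S -> aSb | lambda generates a^n b^n;
# no regex can recognize that language, but a simple counting check can.
import re

def in_type3_language(s):      # a*b
    return re.fullmatch(r"a*b", s) is not None

def in_type2_language(s):      # a^n b^n
    n = len(s) // 2
    return s == "a" * n + "b" * n

assert in_type3_language("aaab") and not in_type3_language("aabb")
assert in_type2_language("aabb") and not in_type2_language("aaab")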