Sunday, August 30, 2009
Configuring TAEB at launch to use a different AI (and other settings)
I gave myself a crash course review of Perl. I mainly did this so I could understand the code for Demo.pm, which is the reference agent. I wanted to try to change agents, but I mistakenly thought this could be done with taeb's --config option.
To change TAEB's AI, one must place a YAML file in $HOME/.taeb. The file must be named config.yml.
Saturday, August 29, 2009
Setup of TAEB on openSUSE 11.1
I intend to do these same types of experiments with TAEB.
The first step was to get nethack installed. nethack is available at www.nethack.org. OpenSUSE 11.1's current repositories also had nethack available, so I just installed it with the OpenSUSE software installer.
I downloaded the latest TAEB using git: git clone git://github.com/sartak/TAEB.git
I discovered that when I attempted to install TAEB using sudo (sudo perl Makefile.PL), the Perl function can_run couldn't find my installation of nethack. This was odd. It was necessary to create a symbolic link from /usr/bin/nethack to /usr/games/nethack.
After making the symbolic link to the nethack executable, there were a number of warnings about additional Perl modules not found on the system:
I went back to eliminate each of these warnings. Unless specified otherwise, I used the cpan executable to eliminate these warnings.
Note: I found that YAML::Syck had to be installed, but Makefile.PL did not detect and list it as needed. When I ran taeb the first time, it complained that it couldn't find the file YAML/Syck.pm.
After YAML::Syck was installed, I rebuilt and reinstalled. Seems to be working!
Wednesday, August 26, 2009
Naive Bayes Classifiers in SpamAssassin
As a start of this investigation, I've decided to begin with some OSS Naive Bayes Classifier based spam filters. I'm starting with SpamAssassin. For the purposes of this experiment, I will be using SpamAssassin as a command-line tool.
spamassassin is a Perl front-end that one uses to classify an email, which is in a text file. One email per file.
sa-learn is a tool in the SpamAssassin suite that trains the nbc.
sa-learn --ham /path/to/directory/containing/ham loads the nbc with ham.
sa-learn --spam /path/to/directory/containing/spam loads the nbc with spam.
I've only acquired a corpus of ham and spam of a few thousand emails. For what I need, I would like to have a corpus of up to a million documents which could be split into about 9 categories. I'm looking for a large corpus.
I've also noticed that with nbc's that process text, there appears to be no restriction on the email size. In comparison, nbc's used with images require that the images in the image corpus all be the same size. I wonder if this is really necessary. I will have to check that out with the face recognizer work currently in progress in my OpenCV project.
Sunday, August 23, 2009
Naive Bayes for text classification
Originally, I had learned about Naive Bayes in my AI class, which was taught by Dr. Christopher Brooks at University of San Francisco back in 2005. Since then, the computer that held my information died and I've been unable to retrieve my nbc program. After a long time, I finally found good info on Naive Bayes and how to use it for classifying text. First, I'll give the basic definitions of Bayes and Naive Bayes classification, which I'm simply summarizing from the textbook "Artificial Intelligence" by Russell and Norvig. Then, I'll talk specifically about applying Naive Bayes to text classification. This information I found in slides for a course called Comp221. The slides were written by the course's TA, Zhang Kai.
Naive Bayes
Like the regular Bayes algorithm, Naive Bayes simplifies the calculation of a conditional probability. It further simplifies the calculation of conditional probabilities by assuming that the effects are independent. Even though this may not be actually true, it has been found that this assumption yields acceptable behavior.
P(Class|Effects) ∝ P(Class) * P(Effect1|Class) * ... * P(Effectn|Class)
(The constant factor 1/P(Effects) is dropped because it is the same for every class.)
Supervised Naive Bayes for Text Classification
The definition of Naive Bayes is easy to understand, but it lacks the details that one needs to make a real application. I will fill in the details here. (Thanks, Dr. Brooks and Zhang Kai!)
(1) Start with a corpus and calculate P(Ci)
A corpus is a collection of data that will be used to train the Naive Bayes classifier. This should be a large number of items; the total number of items should be around 1000. The corpus should be split into different classes, where each class occurs in the proportion one thinks the actual documents occur in real life. Ci is a class in C. P(Ci) = nc / ni, where nc is the number of corpus documents that belong to Ci and ni is the total number of documents in the corpus.
(2) For each class, calculate P(word|class)
For each class, there will be a collection of words that are associated with that class. One must calculate the probability that a given word will occur in a particular class.
ni = number of total words in documents in Ci
wi = word associated with Ci
wij = number of times wi occurs in all Ci documents
P(wi|Ci) = wij / ni
If a word never occurs in the documents of class Ci, its conditional probability P(wi|Ci) would be zero, which would force the whole product for that class to zero. For a conditional probability that is zero, assign it the value eta/ni instead, where eta is some tunable value.
For each class, choose the top word frequencies as the words used to classify a document. Ideally, each word would occur in all Ci in C.
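Steps (1) and (2) can be sketched in Python roughly as follows. This is a minimal illustration, not my original nbc program: the function name train, the toy corpus, and the default eta of 0.01 are all assumptions made for the example, and it skips the "top word frequencies" selection and uses every word it sees.

```python
from collections import Counter, defaultdict

def train(corpus, eta=0.01):
    """corpus is a list of (class_label, text) pairs.
    Returns (priors, likelihoods): priors[c] = P(c), likelihoods[c][w] = P(w|c)."""
    # Step (1): P(Ci) = documents in Ci / total documents
    doc_counts = Counter(label for label, _ in corpus)
    priors = {c: n / len(corpus) for c, n in doc_counts.items()}

    # Step (2): count how often each word occurs in each class
    word_counts = defaultdict(Counter)
    for label, text in corpus:
        word_counts[label].update(text.lower().split())

    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)

    likelihoods = {}
    for c, counts in word_counts.items():
        n_i = sum(counts.values())  # total words in all Ci documents
        likelihoods[c] = {
            # wij / ni, or eta / ni when the word never occurs in Ci
            w: (counts[w] / n_i) if counts[w] > 0 else (eta / n_i)
            for w in vocabulary
        }
    return priors, likelihoods

# Hypothetical four-document corpus, far too small for real use
toy_corpus = [
    ("spam", "buy cheap pills"),
    ("spam", "cheap pills cheap"),
    ("ham", "lunch at noon"),
    ("ham", "see you at lunch"),
]
priors, likelihoods = train(toy_corpus)
```

With this toy corpus, "cheap" accounts for 3 of the 6 spam words, and because it never appears in ham it gets eta/7 there instead of zero.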
(3) After (1) and (2) have been performed, the nbc has been trained. d is a new document that is unclassified.
Take document d and find all the words that occur in the training corpus.
For each Ci, calculate P(Ci|Effects): for each word wi found in document d, look up P(wi|Ci), and multiply these values together with P(Ci).
The largest P(Ci|Effects) is the matched class Ci for document d.
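Step (3) might look like the sketch below. The probability tables are hand-made, hypothetical numbers standing in for a trained classifier, and the function name classify is my own. One design choice that is not part of the description above: the sketch adds logarithms instead of multiplying probabilities, which picks the same winning class but avoids underflow when a document has many words.

```python
import math

def classify(text, priors, likelihoods):
    """Return (best_class, scores), where each score is
    log P(Ci) + sum of log P(wi|Ci) over the known words in text."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for word in text.lower().split():
            # Only words that occurred in the training corpus are used
            if word in likelihoods[c]:
                score += math.log(likelihoods[c][word])
        scores[c] = score
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical tables; a real run would take these from training
priors = {"spam": 0.5, "ham": 0.5}
likelihoods = {
    "spam": {"cheap": 0.5, "pills": 0.4, "lunch": 0.001},
    "ham": {"cheap": 0.001, "pills": 0.001, "lunch": 0.4},
}
label, scores = classify("cheap pills today", priors, likelihoods)
```

Here "today" is ignored because it is not in the tables, and the class with the largest score is the match for the document.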
Training OpenCV for facial detection
I'm considering making my own tool which can output an input database in a format that OpenCV can use. I'm going to try to do this in GIMP. But, if I can't, then I'll do something with PIL or HighGui in OpenCV.
Sunday, August 16, 2009
Generating random numbers
od -An -N4 -l /dev/random
od -An -N4 -l /dev/urandom
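The same idea can be sketched in Python: read 4 bytes from the operating system's random source and interpret them as an integer. The function name random_uint32 is made up for this example, and treating the bytes as an unsigned little-endian value is an assumption; od -l prints a signed value, so the numbers will not look identical.

```python
import os

def random_uint32():
    """Read 4 random bytes from the OS and return them as an unsigned integer."""
    raw = os.urandom(4)  # backed by the kernel's random source on Linux
    return int.from_bytes(raw, byteorder="little", signed=False)

value = random_uint32()
print(value)
```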
Converting the Yale database to a format that OpenCV 1.1.1 can read
For example, the Yale Face Database files are all GIF's. OpenCV 1.1.1 does not read GIF's. They must be converted to some format that OpenCV can read. I chose to use PNG. But what is the easy way to convert these GIF's to PNG's?
I used the Python Image Library and some 1-liner bash scripting magic to convert all gif's to png's.
First, I renamed all the database files to have a suffix ".gif". The files in the Yale database don't have a suffix. I ran tests and perhaps this was not necessary, but I didn't want to run into a problem with one of the files. This is the conversion I used in the directory that contains all the Yale database files:
for i in `ls | xargs`; do pilconvert.py $i $i.png; done
pilconvert.py is a script that comes with the Python Image Library.
Now, if only the task of creating the rectangle for the ROI (region of interest) were just as easy.
Saturday, August 15, 2009
OpenCV and Yale database
The Yale database images are GIF's. OpenCV doesn't read GIF's. Time for some Python and ImageMagick to convert these into PNG or JPG.
My Future OpenCV Projects
OpenCV Build Machine for Ubuntu 8.04
Implement supervised and unsupervised recognizers using OpenCV and Yale face databases
Use machine learning API in OpenCV to create a spam filter
USB 3.0 highlights
full-duplex
5.0 Gbps max throughput
I am most excited about the full-duplex feature.
Why is my USB 2.x product so slow
Even with USB 2.x, which has a 480 Mb/s reported speed, implementers are often surprised that their data throughput is actually slower than RS-232 or even the parallel port. How could this happen?
The reason this can happen is poor utilization of the USB transfer protocol. USB defines four transfer types; the two used for moving large amounts of data are isochronous and bulk. Isochronous transfers reserve bus bandwidth, so data is delivered at a predictable rate, but this comes at a cost. The primary emphasis of isochronous transfers is timeliness. To achieve it, the protocol never retries a failed transaction, so there is no guarantee that data arrives without errors and no guarantee that it arrives at all.
USB bulk transfers are the exact opposite. It is important that data arrive as sent and in the order sent. To achieve this, USB bulk transfers add overhead (handshakes and retries), which reduces throughput. Let's run the numbers:
Theoretical maximum throughput of USB 2.x for bulk transfers:
480 Mbps = 480 Mb / s = 480 Megabits / second = 480 000 000 bits / second
Time in USB 2.x
1 frame = 1 ms
1 uframe = 125 us
8000 uframes / 1 second
Max data payload size of bulk usb 2.x transfer = 512 bytes
Max bytes per uframe (Table 5-10, USB 2.0 spec) = 13 transfers * 512 bytes = 6656
6656 bytes / uframe * 8000 uframes / s = 53248000 bytes / s = 425 984 000 bits / s, or about 426 Mb/s
426 Mb/s is fast, but this can only be approached if the USB 2.x bulk transfer is used efficiently.
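The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation: the constant and function names are my own, the figure of 13 bulk transactions of 512 bytes per microframe (6656 bytes) is my reading of Table 5-10 in the USB 2.0 spec, and the result is expressed in decimal megabits (10^6 bits), the same unit as the 480 Mb/s signaling rate.

```python
MICROFRAMES_PER_SECOND = 8000       # one microframe every 125 us
MAX_PAYLOAD_BYTES = 512             # max high-speed bulk data payload
MAX_TRANSFERS_PER_MICROFRAME = 13   # assumed from Table 5-10, USB 2.0 spec

def bulk_throughput():
    """Return (bytes per microframe, bytes per second, decimal Mb/s)."""
    bytes_per_microframe = MAX_PAYLOAD_BYTES * MAX_TRANSFERS_PER_MICROFRAME
    bytes_per_second = bytes_per_microframe * MICROFRAMES_PER_SECOND
    megabits_per_second = bytes_per_second * 8 / 1_000_000
    return bytes_per_microframe, bytes_per_second, megabits_per_second

per_uframe, per_second, mbps = bulk_throughput()
print(per_uframe, per_second, round(mbps, 1))
```

The result is roughly 89% of the raw 480 Mb/s; the remainder goes to protocol overhead.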
Tips on reaching the max limit of the USB 2.x bulk transfer protocol
- Minimize the amount of reading and writing of data. If possible, let data only transfer in one direction. USB 2.x is a half-duplex protocol.
- As much as possible, fill the entire transfer to its maximum data payload size of 512 bytes.
- Directly connect the USB 2.x peripheral to the host PC's hub.
Thursday, August 13, 2009
Midnight Engineer's resume
BSEE SDSU '91
BSCS NDNU '05, 4.0 GPA,
USF for 1 year in the MSCS program
Wind River 2007 to present
Sr. engineer specializing in using the Wind River compiler (Diab) with VxWorks, and build expert for VxWorks and Wind River Linux. Recognized expert in using VMware for recreating customer design environments. Creator of library of VMware machines for engineering, sales, and support use.
Nuvation 2005 to 2007
Firmware engineer who specialized in optimizing bulk transfer USB-based products for speed. Sustaining engineering for medical product for surgery. Integrated driver into Mac OS X of a cardbus memory card. USB architect for power control module. Introduced VMware as means of organizing and managing software tools for projects. Recognized expert in USB 1.x and 2.x protocols for hardware and software.
Internet Archive 2005
Conducted searches for archive websites. Data mined terabyte-sized binary and data using one-liner perl and bash shell commands.
Xilinx 1994 to 2005
5 years, Support Engineer specializing in using Verilog and VHDL for FPGA designs, and JTAG for configuration.
5 years, Sr. Systems Design Engineer who created PROM, CPLD, and FPGA programming algorithms. Created hardware configuration tools that used USB bulk transfers to configure PROM's, FPGA's, and CPLD's. MFC GUI programmer.
Languages
C, C++, Java, Python, bash shell, Perl, VHDL, Verilog, ANTLR, MFC
Tools
Eclipse, gdb, gcc, CATC USB protocol analyzer, oscilloscope, logic analyzer, Workbench, emacs
Interests
ANTLR, statistics-based algorithms for AI, programming languages, Linux, OpenCV
Monday, August 10, 2009
Installing OpenCV 1.1.1 on OpenSUSE 11.1
I recently discovered the open source computer vision API called OpenCV. Originally created by Intel, it is now an open source project. It allows you to create computer vision applications that do things like facial recognition and tracking of moving objects. Its most famous use was in "Stanley", a robot that competed in the DARPA challenge to navigate a road without human guidance.
The procedure for installing OpenCV on a VMware Linux appliance was well documented by Damine Steward. His procedure used Ubuntu 8.04 as a basis. This is a reimagining of that procedure using OpenSUSE 11.1.
(1) Install OpenSUSE 11.1 in a VMware appliance created by VMware Workstation 6.5.
(2) After installing OpenSUSE 11.1, no need to install the VMware Tools. OpenSUSE 11.1 installs it for you.
(3) You will need to enable networking. You will need to turn off DHCP6 for your network interfaces. You may need to reboot. Confirm that networking is working by opening Firefox and trying to surf to a URL.
Add the Packman and Videolan repositories. Run the "Install Software" application. Insert the OpenSUSE 11.1 DVD into the cdrom drive. In the "Install Software" application:
Packages Listing->Repositories
Edit
Add
Select the "Community Repositories" radio button
Next
In the list that appears, select the Packman and Videolan repositories.
(4) Install the packages: checkinstall, yasm, libfaac-devel, libfaad-devel, libmp3lame-devel, libtheora-devel, libxvidcore-devel, portaudio-devel, twolame, libtwolame-devel, libpng3, libjpeg-devel, libtiff-devel, libjasper-devel
If you get an error message that libtheora-devel cannot be installed, choose to install the suggested alternative.
(5) svn checkout svn://svn.ffmpeg.org/ffmpeg/trunk ffmpeg
cd ffmpeg
./configure --enable-gpl --enable-postproc --enable-pthreads \
--enable-libfaac --enable-libfaad --enable-libmp3lame \
--enable-libtheora --enable-libx264 --enable-libxvid \
--enable-shared --enable-nonfree
make
sudo make install
(6) svn checkout http://svn.berlios.de/svnroot/repos/guvcview/trunk/ guvcview
make
sudo make install
guvcview
Your USB camera should be working with the guvcview tool.
(7) Next, download the examples from the OpenCV book from O'Reilly. http://examples.oreilly.com/9780596516130/
Unzip these examples. You will use this to test the build and install of OpenCV.
(8) Download the latest OpenCV from the trunk. As of this writing 8/11/09, the version of OpenCV is v1.1.1.
svn co https://opencvlibrary.svn.sourceforge.net/svnroot/opencvlibrary/trunk/opencv
cd opencv
mkdir release
cd release
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D BUILD_EXAMPLES=ON ../
make
cd bin
(9) ./cxtest
Only 4 tests will fail.
(10) ./cxcoretest
all tests pass
cd ..
(11) sudo make install
(12) In .bashrc add: export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
(13) cd release/bin
(14) ./lkdemo
This is the sample tracking application. You should see it work with your webcam.
(15) Unzip the OpenCV examples in directory of choice
(16) g++ -o test ch2_ex2_1.cpp `pkg-config opencv --cflags --libs`
./test stuff.jpg
You should view a picture.
(17) g++ -o test ch2_ex2_2.cpp `pkg-config opencv --cflags --libs`
./test test.avi
./test tree.avi
Both avi movies should play
And that's it! Have fun! If you have feedback on this procedure, feel free to contact me.