Monday, September 28, 2009

New SQLite database

We finally have an updated SQLite database for ConceptNet. It's designed for use with ConceptNet 4.0b8 (just released). This will fix the long-standing "best_raw_id" bug.

This database now includes data that was imported from the online game Verbosity. It also includes the initial import of the Chinese ConceptNet. This comes to us thanks to our collaborators in Taiwan, Jane Hsu, Yen-Ling Kuo, Edward Shen, and the many people playing the online games they developed.

We've also cleaned up our documentation, and written tutorials for some key things you may want to be able to do with ConceptNet and Divisi, at http://csc.media.mit.edu/docs/ .

Thursday, August 20, 2009

Verbosity, and one meeeelion sentences

How did we just get nearly 200,000 new statements in Open Mind Common Sense?

We've just imported a whole lot of data from Verbosity, one of Luis von Ahn's Games with a Purpose. Verbosity collects common sense knowledge through a game: one person is given a word, and needs to get the other person to guess that word by listing common-sense facts about it.

The data is rather noisy in places, but after some filtering, we've got a list of new statements about as reliable as the other score-1 statements in OMCS. These include a number of useful "is not" statements, describing things that are different, which we've never prompted for on OMCS before, as well as many examples of a new relation, "SimilarSize", expressing the statement "X is about the same size as Y".

A side effect of this is that it's pushed our total sentence count for English over one million! Of those, we can parse about 542,600 so far (we've still got a lot left to try to parse from the original Open Mind), and those translate to about 504,700 unique assertions in ConceptNet.

Thank you to all our contributors (especially those who are patient enough to try to deal with our current web site), and to all the players of Verbosity!

Monday, August 10, 2009

Welcome back, Catherine Havasi!

Catherine Havasi co-created the Open Mind Common Sense project, as an undergraduate researcher working with Push Singh way back in 1999. For the last five years, she's been working on a doctorate in computational linguistics at Brandeis University. She's been doing a lot of cross-campus research with this group.

Last month, she finally earned her Ph.D. (congratulations!). Now, she's returned to the Media Lab as a post-doc, where she'll once again be able to work on Open Mind and its applications full time. It's great to have her as an official part of the group again!

Wednesday, June 24, 2009

How to make Fink work when it has the wrong URL

I hit a stumbling block today, and this is one of those things that really should be Googleable.

If you're installing libraries on a Mac, you might be doing it through Fink. And Fink has the unfortunate property that a lot of its download URLs give 404 errors, leaving you stuck. This was the case for the "cloog" library. I don't know what it does, but it's required by hdf5, which is needed by pytables, which we need to store Divisi results on the disk instead of in every instance of the Web server. All the URLs that Fink looks for when it tries to download "cloog" are broken.

The workaround is to Google for the file yourself, and download it into the /sw/src directory.
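If you'd rather script that step, here's a minimal sketch in Python. The function name is made up for illustration, and you have to supply the URL of whatever working mirror you find yourself. It assumes the file name at the end of the URL is the one Fink is looking for.

```python
import os
import shutil
import urllib.parse
import urllib.request


def fetch_into_source_dir(url, source_dir="/sw/src"):
    """Save the file at `url` into `source_dir` under its original name.

    Hypothetical helper, not part of Fink. `source_dir` defaults to the
    directory mentioned above.
    """
    # Take the file name from the last component of the URL path.
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    if not filename:
        raise ValueError("URL does not end in a file name: %r" % url)
    target = os.path.join(source_dir, filename)
    # Stream the download to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Writing into /sw/src will probably require administrator rights, so you may need to run this under sudo and then retry the Fink install.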

I hope this helps someone else who runs into the same problem.

Divisi for Windows

The theme of this week is "make it so that our underlying code can actually be run by other people". One recent accomplishment: I finally figured out how to make a Windows installer of Divisi, our machine learning library. (The hard work to make Divisi compile on Windows at all was done by contributor Akshay Bhat. Thanks, Akshay.)

Monday, June 1, 2009

Bugfixes and improvements

I know we're a bit quiet on the PR front, but as usual a lot is happening under the hood. You can see what we're up to by watching Launchpad, e.g., Divisi trunk and ConceptNet trunk. I'll highlight a few recent examples:
  1. Divisi works on Windows, according to Akshay; we haven't tried it ourselves. Properly supporting Windows, though, means a double-click .exe installer -- which setuptools can apparently make, but we haven't figured out how yet.
  2. We finally renamed u_distances_to to u_dotproducts_with in SVD results; the name has been wrong ever since I wrote that code maybe a year and a half ago.
  3. I wrote csc.divisi.util.PickleDir, which I've found really helpful for hanging onto temporary data that's a little longer-lived than an ipython session.
  4. I improved how Divisi summarizes SVD results (in two commits).
  5. I refactored how AnalogySpace is built, making it easier to try out different combinations of things. Object-oriented code definitely improves things, but I still don't think I hit the sweet spot; certain customizations are still too hard. Any input from software architects? It's a set of mostly composable operations, though certain things only make sense in certain cases...
  6. top_items had been effectively ignoring its new key parameter. Fixed.
  7. We had been returning squared magnitudes for tensors. Oops. Fixed. Fortunately, I don't think this was used.
  8. Finally got around to implementing the (pretty trivial) decomposition of a vector into the parallel and perpendicular components to another vector. That required filling out some other routines, fixing tests, etc.; I think they call that 'yak shaving'; my real goal was to figure out why AnalogySpace was coming out differently than a few weeks ago.
  9. All this stuff is begging for a new release. In time... for now, you can use the bzr head; we try not to break the trunk too often.
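
For anyone curious about the decomposition in item 8 (and the magnitude slip in item 7), here is a minimal sketch in plain Python. This is not Divisi's actual code or API; the function names are made up for illustration, and vectors are just lists of floats.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def magnitude(v):
    """Euclidean length: the square root of v . v, not v . v itself."""
    return math.sqrt(dot(v, v))


def decompose(v, onto):
    """Split v into components parallel and perpendicular to `onto`."""
    denom = dot(onto, onto)
    if denom == 0:
        raise ValueError("cannot project onto the zero vector")
    # The parallel part is the projection of v onto `onto`.
    scale = dot(v, onto) / denom
    parallel = [scale * x for x in onto]
    # Whatever is left over is perpendicular to `onto`.
    perpendicular = [a - b for a, b in zip(v, parallel)]
    return parallel, perpendicular
```

For example, decompose([3.0, 4.0], [1.0, 0.0]) returns ([3.0, 0.0], [0.0, 4.0]), and magnitude([3.0, 4.0]) is 5.0 (returning the squared magnitude would have given 25.0).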
We also have a bunch of awesome documentation that hasn't gotten linked to in the main pages (e.g., conceptnet.media.mit.edu). More yak shaving: I went to edit the page and realized that we hadn't committed our local website changes into our web svn, so I committed some things, running into a svn bug and writing about it. But before I linked to the docs, I really wanted to move them to csc.media/docs, and that required mucking with the Apache configuration. I remembered that we had wanted to try out nginx, so I got that set up on a backup port on the server and got a dynamic site (csc.media) configured for it. I actually should have just stuck to getting the static config working and forwarding to Apache for the dynamic config, because that's what each is good at! Anyway, that required futzing with a fastcgi socket permissions issue (I tried using a Unix domain socket -- cool things, but as documented, they have permissions issues), etc., etc.

I had planned to work on my thesis, but... speaking of thesis, Jayant just graduated. Thankfully he's staying a little while longer to wrap things up, so maybe he'll post something about his thesis.

Tuesday, May 5, 2009

Speed issues

The Open Mind Common Sense website is currently really, really slow, and I'm sorry about that.

As we acquire more users and try to do more complicated reasoning behind the scenes, clearly what we need to do is spend the piles of money that we have just sitting around on a huge fancy server

Sorry, I meant to say: clearly what we need to do is keep finding ways to cache lots of stuff and using whatever computing power we can find. Anyway, I'm working on it.

Wednesday, April 22, 2009

New site.

We've got a new version of the Open Mind Common Sense site: openmind.media.mit.edu

The big changes:
  • It's based on the Pinax web framework. This should make it easier to add features to the site.
  • It's running on ConceptNet 3.5 instead of 3.0. (So was the old site, kinda, but it was a hack that wasn't sustainable.)
  • It distinguishes between "assertions", the normalized connections between concepts that Open Mind learns from, and "statements", the roughly parsed text that people have typed in. You can vote on both of them. This is a key step toward putting back the free text box.
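
To make the assertion/statement distinction in the last bullet concrete, here is a rough sketch in Python. The class and field names are hypothetical and chosen for illustration; this is not the site's actual schema. The point is only that each statement keeps the text as typed and points at a normalized assertion, and that the two collect votes separately.

```python
class Votable:
    """Anything users can vote on; each user gets one vote of +1 or -1."""

    def __init__(self):
        self.votes = {}

    def vote(self, user, value):
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        # Voting again replaces the user's earlier vote.
        self.votes[user] = value

    @property
    def score(self):
        return sum(self.votes.values())


class Assertion(Votable):
    """A normalized connection between two concepts."""

    def __init__(self, relation, concept1, concept2):
        super().__init__()
        self.relation = relation
        self.concept1 = concept1
        self.concept2 = concept2


class Statement(Votable):
    """The text a person typed, linked to the assertion it expresses."""

    def __init__(self, text, assertion):
        super().__init__()
        self.text = text
        self.assertion = assertion


# Two differently worded statements can share one assertion.
is_a = Assertion("IsA", "dog", "animal")
s1 = Statement("A dog is a kind of animal.", is_a)
s2 = Statement("dogs are animals", is_a)
s1.vote("alice", 1)
is_a.vote("bob", 1)
is_a.vote("carol", 1)
```

Here s1 ends up with a score of 1, s2 with 0, and the shared assertion with 2, since voting on a statement's wording says nothing about the assertion behind it.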

Thursday, February 26, 2009

New Mailing List

Also in the realm of new and exciting stuff is our announcement mailing list. We'll use it for any big news we have and to announce workshops, symposiums, and software releases.

Subscribe yourself here!

Launchpad and Bazaar

We're on Launchpad now. We can host our version control there, track bugs, and answer questions from users.

For people who work on Open Mind within the Media Lab (and possibly even others), here's a guide to hacking on the code using Bazaar.

Thursday, February 19, 2009

IUI

Hi everyone!

We've been quite busy lately, and I'm going to make a point of updating this more often. I really intend to, and since all I have to do this semester is graduate and transition to being a post-doc on top of Open Mind, I should have lots of free time. What have we been up to? Well...

Most recently, Henry, Erik Mueller, and I ran a workshop on story understanding at IUI last week. Roger Schank gave an interesting keynote on designing user interfaces using stories. He focused on how people interact and convey information naturally using stories rather than using the constructs which are common in user interfaces today. He talked about how "cavemen" communicated (i.e., what are the modes of interaction we've been using all along), just-in-time information, and making interfaces more goal-directed. We had a lot of good discussions, saw papers presented, and had a demo session, and many of us are still communicating by email. I'll try to post a more comprehensive summary later.

Also at IUI, Jayant and I gave a main conference talk on a paper, written by pretty much all of us, about using mixture models to play a game of twenty questions with the user. The system asks questions which are selected to help AnalogySpace infer information about a new concept.

There's lots of stuff in the pipeline:
  • There are thoughts of an AAAI symposium in a year for the entire common sense community.
  • I'm working on a new technique for infusing normal data and reasoning techniques with common sense. It's working really well.
  • We're planning on putting up a lot of documentation soon and possibly some videos. Keep watching.