Monday, August 11, 2008

Assertions, sentences, and ratings

Ken mentioned that we're in the middle of reorganizing the database. I'll fill in some more details about what we're doing.

Currently, our users give their ratings to assertions, the things that make up the links of ConceptNet. Many sentences can yield the same assertion: for example, "dogs are mammals" and "a dog is a kind of mammal" both turn into an assertion that can be expressed as IsA(dog, mammal). The ratings on these assertions are useful to representations such as AnalogySpace.
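To make the sentence-versus-assertion distinction concrete, here is a minimal sketch in Python. The `Assertion` tuple, the hand-labeled `parsed` list, and both helper functions are made up for illustration; they are not the ConceptNet library's API, and the real mapping comes from parsing the sentences rather than from a hard-coded list.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an assertion: a relation plus two concepts.
Assertion = namedtuple("Assertion", ["relation", "concept1", "concept2"])

# Hand-labeled (sentence, assertion) pairs, for illustration only.
parsed = [
    ("dogs are mammals", Assertion("IsA", "dog", "mammal")),
    ("a dog is a kind of mammal", Assertion("IsA", "dog", "mammal")),
]


def group_by_assertion(pairs):
    """Collect every sentence that yields the same assertion."""
    groups = defaultdict(list)
    for sentence, assertion in pairs:
        groups[assertion].append(sentence)
    return dict(groups)


def format_assertion(assertion):
    """Render an assertion in the Relation(concept1, concept2) notation."""
    return "%s(%s, %s)" % (
        assertion.relation, assertion.concept1, assertion.concept2)
```

Calling `group_by_assertion(parsed)` gives a single key with both sentences listed under it, which is the many-to-one relationship described above.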

The problem is that the OMCS web site doesn't want to show you IsA(dog, mammal), it wants to show you something in natural language. And some of the natural language we've collected is of

What should matter to OMCS isn't just how good the abstracted assertions are, it's how good the sentences are.

So we're reorganizing the database. After this, your ratings will apply to the sentences you see, and the scores on assertions will come from aggregating those ratings. We'll display each assertion using its highest-rated sentence. This puts our users in charge of which sentences show up, instead of arbitrary decisions by the computer.
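Roughly, the new scheme could look like the sketch below. It rests on assumptions that are not settled above: ratings are taken to be +1 or -1, an assertion's score is taken to be the plain sum of its sentences' scores, and the function and argument names are invented for the example. The real aggregation may well be different.

```python
from collections import defaultdict


def aggregate(sentence_to_assertion, ratings):
    """Roll sentence ratings up into assertion scores.

    sentence_to_assertion: dict mapping each sentence to its assertion.
    ratings: list of (sentence, value) pairs; value is +1 or -1 (assumed).
    Returns (assertion_scores, best_sentence).
    """
    # Total the ratings given to each sentence.
    sentence_scores = defaultdict(int)
    for sentence, value in ratings:
        sentence_scores[sentence] += value

    assertion_scores = defaultdict(int)
    best_sentence = {}
    for sentence, assertion in sentence_to_assertion.items():
        score = sentence_scores[sentence]
        # Assumption: an assertion's score is the sum over its sentences.
        assertion_scores[assertion] += score
        # Remember the highest-rated sentence for display (first one wins ties).
        current = best_sentence.get(assertion)
        if current is None or score > sentence_scores[current]:
            best_sentence[assertion] = sentence
    return dict(assertion_scores), best_sentence
```

With two sentences for IsA(dog, mammal), the one that users rate higher becomes the sentence shown for that assertion, so the choice is driven by ratings rather than by whichever sentence the computer happened to pick.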

The hard part of the reorganization is that we have to take all of the existing ratings and find out where they came from. If they came from a user on the new site, for example, we need to know what sentence they were looking at when they gave the rating. The database generally has this information, but not necessarily recorded in a smart way.

So what we've really been doing is cleaning up messes in the database while we track down where the ratings should go. Most of those messes were created by me a couple of days before giving a demo. Sorry about that.

Facebook

Hi everyone, I'm Catherine.

I will post something more substantive soon, but first off I'd like to say we have a Facebook group now. I'll hopefully update it with pictures and such as things become available.

In other news, Rob, Henry, and I went to AAAI in Chicago to present AnalogySpace, which went rather well. I had a great conference and really enjoyed the city of Chicago.

Friday, August 8, 2008

A Brief Note on Status

First, thanks to everybody who has been contributing! We're running all sorts of cool analysis stuff with our data, and most everything you put in makes the analysis a bit better. And that's just the beginning...

We've gotten some feedback recently that basically suggests that our interaction with our user community has been lacking. That has definitely ratcheted up in our priorities, and we have a few things in the works. But for the moment, I thought I'd hit on a few items that people have asked about recently.

  • Usability: Obviously the site has usability flaws -- some larger than others. Specific feedback is most helpful. We've been doing a lot of grimy back-end work, but with your help we won't neglect the front end.
  • Speed: will improve a lot when we move to a server that doesn't have a game port in the back (really). There are certainly some optimizations to do also, but we're prioritizing increased functionality. Stay tuned...
  • Acquiring common sense from elsewhere. Though we haven't shared any of the work yet, we've actually done a lot of work in combining our dataset with other data to get interesting new results. Soon we'll be leveraging those tools to pull in large amounts of (hopefully useful) information from several user-contributed large databases of knowledge. One is, yes you guessed it, Wikipedia.
  • The "fix" flag is just a temporary flag until we implement the UI to edit stuff. It's not quite as simple as it seems because of the interaction with ratings, etc. We're reorganizing the database right now so that such things will become possible.
  • Stats are actually available on http://commons.media.mit.edu/en/stats/ which we haven't promoted yet because it's not complete or well-explained. We do have the raw data to do graphs and other spiffy stuff, but just haven't gotten around to it. Suggestions for neat graphing libraries, or just straight-up code contributions, are welcome.
  • Community involvement is very important to us, though we haven't been showing it yet. We're trying to run this as an open-source project. We haven't officially released the main website code, but we can send a tarball on request, and we're considering ways of doing better. If you can code, you'll be welcome to help out, or help recruit others. The site is written in Python, using Django and the ConceptNet and Divisi libraries that are already available (see the links on the home page).

If there are any other things you'd suggest we do to improve interaction with the user community, please share your views in the comments.

Also, feel free to ask anything -- about the site, the project, us, etc.; we'll try to respond.

-Ken

Tuesday, August 5, 2008

Introduction

Hi, I'm Rob.

I've been working on stuff related to Open Mind since 2005, but I've had a bit of a low profile recently, because I thought I was going to go work on a different research project after I finished my master's thesis last year. After several months of dabbling in other projects and getting nowhere interesting, I found myself pulled back into OMCS. Moral of the story: if your grad school career ain't broke, don't fix it.

Probably the most visible things I've done are: I put OMCS back on the Web, replacing the 2000-era web site with a Rails site called Open Mind Commons. (When I left, I passed the torch to Ken, and he re-did it in Django, making the code much cleaner in the process.) Also, along with Catherine Havasi, I developed AnalogySpace, the reasoning tool that learns from patterns in ConceptNet and comes up with the "Open Mind wants to know..." questions.

So now I'm back, and my focus is on the multilingual aspect of OMCS. We've got the infrastructure we need to build a ConceptNet in any language. Now our reasoning tools need to catch up.

I've seen the messages on the mailing list speculating that OMCS is stagnating. Well, it's not. It's changing in huge ways on our end, but it takes a while before we can put new features on the Web site. After all, we'd prefer the Web site to stay up.

But I understand that our users want to be in the loop. So I'm planning to write a few blog posts over the next few days about the new multilingual features I'm working on. Stay tuned.