Monday, August 11, 2008

Assertions, sentences, and ratings

Ken mentioned that we're in the middle of reorganizing the database. I'll fill in some more details about what we're doing.

Currently, our users give their ratings to assertions, the things that make up the links of ConceptNet. Many sentences can yield the same assertion: for example, "dogs are mammals" and "a dog is a kind of mammal" both turn into an assertion that can be expressed as IsA(dog, mammal). The ratings on these assertions are useful to representations such as AnalogySpace.

The problem is that the OMCS web site doesn't want to show you IsA(dog, mammal), it wants to show you something in natural language. And some of the natural language we've collected is of

What should matter to OMCS isn't just how good the abstracted assertions are, it's how good the sentences are.

So we're reorganizing the database. After this, your ratings will apply to the sentences you see, and the scores on assertions will come from aggregating those ratings. We'll display each assertion using its highest-rated sentence. This puts our users in charge of which sentences show up, instead of arbitrary decisions by the computer.
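To make that concrete, here is a minimal sketch of the idea in Python. It is not our actual database code: the `aggregate` function, the tuple layout, and the choice to score an assertion by simply summing its sentences' votes are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical ratings: (assertion, sentence shown to the user, vote),
# where vote is +1 for thumbs-up and -1 for thumbs-down.
ratings = [
    ("IsA(dog, mammal)", "dogs are mammals", 1),
    ("IsA(dog, mammal)", "dogs are mammals", 1),
    ("IsA(dog, mammal)", "a dog is a kind of mammal", 1),
    ("IsA(dog, mammal)", "a dog is a kind of mammal", -1),
]

def aggregate(ratings):
    """Roll sentence-level votes up into per-assertion summaries."""
    # assertion -> sentence -> summed votes for that sentence
    sentence_scores = defaultdict(lambda: defaultdict(int))
    for assertion, sentence, vote in ratings:
        sentence_scores[assertion][sentence] += vote

    summary = {}
    for assertion, scores in sentence_scores.items():
        summary[assertion] = {
            # Assumed aggregation rule: the assertion's score is the sum
            # of the scores of all its sentences.
            "score": sum(scores.values()),
            # The assertion is displayed using its highest-rated sentence.
            "display": max(scores, key=scores.get),
        }
    return summary

result = aggregate(ratings)
print(result["IsA(dog, mammal)"])
```

Here the two sentences share one assertion, but only the better-rated sentence would be shown on the site.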

The hard part of the reorganization is that we have to take all of the existing ratings and find out where they came from. If they came from a user on the new site, for example, we need to know what sentence they were looking at when they gave the rating. The database generally has this information, but not necessarily recorded in a smart way.

So what we've really been doing is cleaning up messes in the database while we track down where the ratings should go. Most of those messes were created by me a couple of days before giving a demo. Sorry about that.

Facebook

Hi everyone, I'm Catherine.

I will post something more substantive soon, but first off I'd like to say we have a Facebook group now. I'll hopefully update it with pictures and such as things become available.

In other news, Rob, Henry, and I went to AAAI in Chicago to present AnalogySpace, which went rather well. I had a great conference and really enjoyed the city of Chicago.

Friday, August 8, 2008

A Brief Note on Status

First, thanks to everybody who has been contributing! We're running all sorts of cool analysis stuff with our data, and most everything you put in makes the analysis a bit better. And that's just the beginning...

We've gotten some feedback recently that basically suggests that our interaction with our user community has been lacking. That has definitely ratcheted up in our priorities, and we have a few things in the works. But for the moment, I thought I'd hit on a few items that people have asked about recently.

  • Usability: Obviously the site has usability flaws -- some larger than others. Specific feedback is most helpful. We've been doing a lot of grimy back-end work, but with your help we won't neglect the front end.
  • Speed: will improve a lot when we move to a server that doesn't have a game port in the back (really). There are certainly some optimizations to do also, but we're prioritizing increased functionality. Stay tuned...
  • Acquiring common sense from elsewhere. Though we haven't shared any of the work yet, we've actually done a lot of work in combining our dataset with other data to get interesting new results. Soon we'll be leveraging those tools to pull in large amounts of (hopefully useful) information from several user-contributed large databases of knowledge. One is, yes you guessed it, Wikipedia.
  • The "fix" flag is just a temporary flag until we implement the UI to edit stuff. It's not quite as simple as it seems because of the interaction with ratings, etc. We're reorganizing the database right now so that such things will become possible.
  • Stats are actually available on http://commons.media.mit.edu/en/stats/ which we haven't promoted yet because it's not complete or well-explained. We do have the raw data to do graphs and other spiffy stuff, but just haven't gotten around to it. Suggestions for neat graphing libraries, or just straight-up code contributions, are welcome.
  • Community involvement is very important to us, though we haven't been showing it yet. We're trying to run this as an open-source project. We haven't officially released the main website code, but we can send a tarball on request, and we're considering ways of doing better. If you can code, you'll be welcome to help out, or help recruit others. The site is written in Python, using Django and the ConceptNet and Divisi libraries that are already available (see the links on the home page).
If there are any other things you'd suggest we do to improve interaction with the user community, please share your views in the comments.

Also, feel free to ask anything -- about the site, the project, us, etc.; we'll try to respond.

-Ken

Tuesday, August 5, 2008

Introduction

Hi, I'm Rob.

I've been working on stuff related to Open Mind since 2005, but I've had a bit of a low profile recently, because I thought I was going to go work on a different research project after I finished my master's thesis last year. After several months of dabbling in other projects and getting nowhere interesting, I found myself pulled back into OMCS. Moral of the story: if your grad school career ain't broke, don't fix it.

Probably the most visible things I've done are: I put OMCS back on the Web, replacing the 2000-era web site with a Rails site called Open Mind Commons. (When I left, I passed the torch to Ken, and he re-did it in Django, making the code incredibly cleaner in the process.) Also, along with Catherine Havasi, I developed AnalogySpace, the reasoning tool that learns from patterns in ConceptNet and comes up with the "Open Mind wants to know..." questions.

So now I'm back, and my focus is on the multilingual aspect of OMCS. We've got the infrastructure we need to build a ConceptNet in any language. Now our reasoning tools need to catch up.

I've seen the messages on the mailing list speculating that OMCS is stagnating. Well, it's not. It's changing in huge ways on our end, but it takes a while before we can put new features on the Web site. After all, we'd prefer the Web site to stay up.

But I understand that our users want to be in the loop. So I'm planning to write a few blog posts over the next few days about the new multilingual features I'm working on. Stay tuned.

Friday, February 22, 2008

All happy

No news is good news? It seems everything is working well; a few bugs here and there still, but we'll get to them ;)

Feel free to play with the Explore Concepts demo... think Google Sets, but in a semantic network. I'm not sure why the web frontend is slow; the backend code is plenty fast. We'll figure that out sometime; for now, just be patient.

Any bugs, feature requests, ideas, etc., welcome; feel free to comment here.

-Ken

Friday, February 8, 2008

Server under siege

We and the web robots are not getting along well. Our server has gotten overloaded several times recently, to the point where we have had to bring it down until things settled. Apologies for the downtime; we'll get this stabilized soon.

But there is a new feature on the site... you're welcome to try it if you can find it, but it's still a little slow.

Tuesday, February 5, 2008

Adding knowledge enabled

After fixing the last few bugs (hopefully!), we've turned adding new knowledge back on.

Ever wished your computer could get a clue? Well, now you can go to "Add New Knowledge" and give it one. If you put in something that someone else already said, we'll take it that you agree with them.

Better and more fun ways of clue-giving are coming...

-Ken

Tuesday, January 29, 2008

OpenMind Common Sense / ConceptNet: Welcome!

For years we've enjoyed tens of thousands of good contributions from our users, in terms of knowledge and ratings. So we thought that it's about time our users got some feedback from us. This blog is one way that you'll get to hear about what's going on with the site (like, why was it down?) and also see some of the ways that we're using the data you've been giving us.

So first, status: a batch of particularly malicious spam by one or two people hit our site two weeks ago. Since at the time we didn't have some of our spiffy tools quite ready, we decided to pull it down until the code monkeys caught up. To push them along, we went ahead and deployed the new version of the site, now powered by Python and Django.

We're still working hard to get everything working with the new framework. Some cool stuff is already available:
  1. The site is now available in multiple languages! There are two parts to that: the user interface of the website itself, and the actual common sense data in the database. We have a snapshot of the Portuguese database collected by our Brazilian collaborators, and are starting projects in Dutch, French, and possibly Arabic.
  2. Ratings are now much easier. Thumbs-up if it's good as-is. Thumbs-down if it's spam, nonsense, not something that any 10-year-old would know, or just not true. Yellow flag (meaning "Fix") if it would be true after some changes -- fixing grammar or spelling, or adding a qualifier ("sometimes", "occasionally").
Have fun with the new ratings system! We'll turn adding assertions back on real soon (though "Open Mind wants to know..." will take a bit longer).

back to coding,
-Ken
(a 1st-year Master's student... more later)