Four years ago I started using Goodreads to maintain the list of books I've read (which had lived in a flat text file for a decade+ before that).

Now it's been acquired by Amazon. I doubt it will survive in its current form for more than 2 years. Anyway, while Goodreads has been a quite good way to find what my friends are reading, I've been increasingly annoyed by the quality of its recommendations, and its paucity of other features I need. It really doesn't seem to help me keep up with new and interesting fiction at all, unless my friends happen to read it.

So I looked at LibraryThing. Actually, I seem to have looked at it several times before, since it had accounts named "joey", "joeyh", and "joeyhess" that were all mine. Which is what happens to me on sites that lack OpenID or BrowserID.

Digging a little deeper this time, I am finding its recommendations much better than Goodreads' -- although it seems to sometimes recommend books I've already read. And it has some nice features like tracking series, so you can easily tell when you've read all the books in a series or not. The analytics overall seem quite impressive. The UI is cluttered and it seems to take 5 clicks to add and rate a single book. It supports half stars.

Overall I get the feeling this was designed for a set of needs that doesn't quite match mine. For example, it seems it doesn't have a single database entry per book; instead, each time I add a book, it seems to pull in data from primary sources (Library of Congress, Amazon, cough) and treat this as a separate (but related) entry somehow. Weird. Perhaps this makes sense to, say, librarians. I'm willing to adjust how I think about things if there's an underlying reason that can be grasped.

There's a quite interesting thread on LibraryThing where the founder says:

Don't say we should open-source the code. That would be a nightmare! And I have limited confidence in APIs. LibraryThing has the book geeks, but not so much the computers geeks.

I assume that the nightmare is that there would be dozens of clones of the site, all balkanized, with no data transfer, no federation between them.

Except, that's the current situation, as every Goodreads user who is now trying to use LibraryThing is discovering.

Before I ever started using Goodreads, I made sure it met my minimum criteria for putting my data into a proprietary silo: That I could get the data back out. I can, and have. LibraryThing can import it. But the import process loses data! And it's majorly clunky. If I want to continue using Goodreads due to its better UI, and get the data into LibraryThing, for its better analytics, I have to do periodic dumps and loads of CSV files with manual fixups.
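One way to make those periodic dumps less painful is to carry over only what changed since the last export. Here's a minimal Python sketch, assuming both exports are CSV files with a header row and a unique per-book column called `Book Id` (that column name is an assumption). The hypothetical `changed_rows` helper returns books that are new, or whose row differs from last time (say, a revised rating):

```python
import csv
import io


def changed_rows(previous_csv, current_csv, key="Book Id"):
    """Return rows from the current export that are new, or that differ from
    the previous export (e.g. a changed rating).

    ASSUMPTION: both exports have a header row and a unique per-book column
    named by `key`; "Book Id" is an assumed column name.
    """
    before = {row[key]: row for row in csv.DictReader(io.StringIO(previous_csv))}
    return [
        row
        for row in csv.DictReader(io.StringIO(current_csv))
        if before.get(row[key]) != row  # None for new books, so they are kept too
    ]


old_dump = "Book Id,Title,My Rating\n1,Dune,4\n2,Emma,3\n"
new_dump = "Book Id,Title,My Rating\n1,Dune,5\n2,Emma,3\n3,Kindred,4\n"
for row in changed_rows(old_dump, new_dump):
    print(row["Book Id"], row["Title"], row["My Rating"])
# 1 Dune 5
# 3 Kindred 4
```

It doesn't notice books removed between dumps, and it can't recover data the import itself drops; it only shrinks the pile of manual fixups per round.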

This is why we have standards. This is why we're building federated social networks like status.net and the upcoming pump.io that can pass structured data between nodes transparently. It doesn't have to be a nightmare. It doesn't have to rely on proprietary APIs. We have the computer geeks.

Thing is, sites like Goodreads and LibraryThing need domain-specific knowledge, and communities to curate data, and stuff like that. Things that work well in a smallish company. (LibraryThing even has a business model that makes sense, yearly payments to store more books in it.)

With free software, it's much more appealing to sink the time we have into the most general-purpose solution we can. Why build a LibraryThing when we could build something that tracks not only books but movies and music? Why build that when we could build a generic federated network for structured social data? And that's great, as infrastructure, but if that infrastructure is only used to build a succession of proprietary data silos, what was the point?

So, could some computer & book geeks please build a free software alternative to these things, focused on books, that federates using any of the fine APIs we have available? Bear in mind that there is already a nice start at a comprehensive collection of book data in the Open Library. I'd happily contribute to a crowdfunded project doing this.

Ways forward
  1. We need to get a decent Z39.50 search tool built; the libraries already exist in the repos. Librarian types and hacker types need to work together on how that lovely protocol transfers data, so that book records can be retrieved in a suitable form.

  2. The RFP in Debian Bug #702134 is a heavyweight but usable F/LOSS solution to the problem posed.

  3. Librarians have been worrying about the hows and whys of modern cataloging since 1842. There is a lot of history there that may seem inscrutable but has reasoning to it.

Comment by skellat [launchpad.net]
Zotero, Filmaster...

I use Zotero when I want to remember a book I read or want to read.

There is also http://filmaster.com/about/ which could be examined to find a similar model for books.

http://pump.io/ could be the glue between all this.

Comment by magicfab
BookBrainz
A co-worker at MusicBrainz (a metadata database about music :) has spent a bit of time on this at https://github.com/ocharles/BookBrainz . At MetaBrainz we would like to eventually have a family of open databases for music, books, video games, etc. If anyone else reading this is interested in building a Goodreads/LibraryThing alternative, I'd like to invite you to join us in #musicbrainz or #bookbrainz on freenode IRC. MusicBrainz is a community-edited database, and many people in our community would be interested in helping out on open source + open data metadata databases focused on other media. -- warp / kuno.
Comment by Kuno Woudt
Future

Now Goodreads has been acquired by Amazon. I doubt it will survive in its current form for more than 2 years.

https://en.wikipedia.org/wiki/LibraryThing#Ownership_and_membership

Online bookseller AbeBooks (now owned by Amazon) bought a 40% share in LibraryThing in May 2006 for an undisclosed sum.

Comment by gwern
comment 4

The 40% number that has been going around is not accurate, according to the LT founder. In any case, he still has a controlling interest in his company.

(With that said, I could file half a dozen bugs on LT right now that any engineer could fix in a day; they don't seem to have a lot of manpower developing the site.)

Comment by joey
comment 5

I don't see any cite for the figure being wrong, and I'd guess it's now an underestimate rather than an overestimate (where would LT get the money to buy out any part of Amazon's stake?).

Regardless, as a ~40% minority shareholder and a multi-billion-dollar Internet giant, Amazon can make LT and its founder's life a living hell if it so chooses, and hence its suggestions are less suggestions than orders, which are ignored at a high cost. The war between Craigslist and eBay (which holds just a ~25% stake) is a recent demonstration of this. LT operates at Amazon's sufferance; I'd guess that Amazon's lack of (known) interference in LT, and its purchase of Goodreads, have more to do with GR growing much more than LT, and I don't see why Amazon would interfere substantially with GR either.

Comment by gwern
LibraryThing is cataloging-centric, Goodreads is reading-centric.

LibraryThing counts differences where Goodreads doesn't, so I'm not surprised transfer isn't easy. Also, LT is pretty open about accepting bug reports... But as you note they don't have that much manpower. I'm biased towards LT even though they aren't willing to open the code. I'm not sure the code matters as much as the data here... Although federation may help the libraries a bit more over the long term, IMHO the value is in the merged data. LT is an interesting example for the future of federation. It's both vertical (library-oriented) and widely applicable (readers). They already are challenging certain tightly-licensed cataloging methods with open ones... Interesting set of trade-offs I haven't fully considered.

BTW, @LibraryThingTim posted an etherpad over on the T-world asking for help on "What Makes LibraryThing LibraryThing?" that may bring out more points: http://librarything.piratenpad.de/449

Comment by Jason Riedy
Recommendations in a federated world

While in general I'm in favour of a decentralised federated approach, wouldn't making book recommendations work via collaborative filtering require broadcasting all the data to the entire federation? With the data centralised, one only has to worry about Tim Spalding abusing it, not every member of the federation.
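To make the concern concrete, here is a toy user-based recommender in Python. It uses simple co-occurrence counting as a stand-in for real collaborative filtering, and is not how LibraryThing is known to work. The hypothetical `recommend` function has to walk every other member's shelf just to score one user's suggestions, which is exactly the data a federated node would need to receive:

```python
from collections import Counter


def recommend(shelves, user, n=3):
    """Toy user-based recommender: score books the user hasn't shelved by how
    much each other member's shelf overlaps with the user's own shelf."""
    mine = shelves[user]
    scores = Counter()
    # This loop reads EVERY member's shelf; a federated node would need
    # all of this data locally to run it.
    for other, books in shelves.items():
        if other == user:
            continue
        overlap = len(mine & books)
        if overlap == 0:
            continue  # no shared books, so this member tells us nothing
        for book in books - mine:
            scores[book] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:n]]


# Hypothetical shelves, keyed by member name.
shelves = {
    "alice": {"Dune", "Hyperion"},
    "bob": {"Dune", "Hyperion", "Anathem"},
    "carol": {"Dune", "Emma"},
    "dave": {"Persuasion"},
}
print(recommend(shelves, "alice"))  # ['Anathem', 'Emma']
```

Real systems use fancier similarity measures, but they share the property shown here: the useful signal comes from other people's shelves.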

I also tried the LT import from goodreads functionality and found it missing data (ratings specifically). I found I could get more data in by converting goodreads export to librarything's preferred import format as long as I made sure all text fields were quoted.
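That conversion can be sketched in Python with the standard `csv` module, quoting every field via `csv.QUOTE_ALL` (which also quotes numeric fields; that this is harmless for the import is an assumption). Both the source column names (modelled on a Goodreads export) and the target column names are assumptions; check them against an actual export and LibraryThing's import documentation before relying on this:

```python
import csv
import io

# ASSUMPTION: left-hand names mimic a Goodreads export; right-hand names are
# placeholders, not LibraryThing's documented import format.
COLUMN_MAP = {
    "Title": "TITLE",
    "Author": "AUTHOR",
    "ISBN13": "ISBN",
    "My Rating": "RATING",
    "Date Read": "DATE READ",
}


def convert(goodreads_csv):
    """Re-map columns and write every field quoted (csv.QUOTE_ALL)."""
    reader = csv.DictReader(io.StringIO(goodreads_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(COLUMN_MAP.values()),
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for row in reader:
        # Missing source columns become empty strings instead of raising KeyError.
        writer.writerow({dst: row.get(src) or "" for src, dst in COLUMN_MAP.items()})
    return out.getvalue()


sample = ('Book Id,Title,Author,ISBN13,My Rating,Date Read\n'
          '1,Hyperion,"Simmons, Dan",123,4,2012/05/01\n')
print(convert(sample))
```

Quoting everything means a comma inside a field, like the one in "Simmons, Dan", can't be mistaken for a column separator.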

Comment by William Hay