February 2008

Code4Lib 2008: Open Library

In his presentation, Building the Open Library, Aaron Swartz introduced us to his vision of an online library.  Like Brewster, he sees a wiki with one page for every book.  For this reason, the small group (6 people spread out around the world) is starting its project with monographs.

To achieve this feat, the team is using their own database framework called ThingDB:

ThingDB stores a collection of objects, called “things”. For example, on the Open Library site, each page, book, author, and user is a thing in the database. Each thing then has a series of arbitrary key-value pairs as properties. For example, a book thing may have the key “title” with the value “A Heartbreaking Work of Staggering Genius” and the key “genre” with the value “Memoir”. Each collection of key-value pairs is stored as a version, along with the time it was saved and the person who saved it. This allows us to store full semi-structured data, as well as travel back thru time to retrieve old versions of it.
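To make that description a little more concrete, here is a minimal sketch of the idea in Python. It is not the Open Library code: the names ThingStore, Thing, Version, save and get are all made up for illustration, and everything lives in memory. It only shows the pattern described above, where every save stores a full set of key-value pairs together with the time and the person who saved it, so that older versions can still be read back.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One saved snapshot of a thing's key-value pairs."""
    properties: Dict[str, Any]
    saved_at: datetime
    saved_by: str


@dataclass
class Thing:
    """A page, book, author, or user; it only ever gains versions."""
    versions: List[Version] = field(default_factory=list)


class ThingStore:
    """Hypothetical in-memory stand-in for a versioned thing database."""

    def __init__(self) -> None:
        self._things: Dict[str, Thing] = {}

    def save(self, key: str, properties: Dict[str, Any], saved_by: str) -> int:
        """Store a new version and return its 1-based version number."""
        thing = self._things.setdefault(key, Thing())
        snapshot = Version(dict(properties), datetime.now(timezone.utc), saved_by)
        thing.versions.append(snapshot)
        return len(thing.versions)

    def get(self, key: str, version: Optional[int] = None) -> Dict[str, Any]:
        """Return the latest properties, or those of an older version."""
        versions = self._things[key].versions
        if version is not None and version < 1:
            raise ValueError("version numbers start at 1")
        chosen = versions[-1] if version is None else versions[version - 1]
        return dict(chosen.properties)


# Example: two edits to the same (made-up) book key.
store = ThingStore()
title = "A Heartbreaking Work of Staggering Genius"
store.save("book/example", {"title": title}, saved_by="alice")
store.save("book/example", {"title": title, "genre": "Memoir"}, saved_by="bob")
latest = store.get("book/example")             # includes the genre
original = store.get("book/example", version=1)  # title only
```

Storing a whole snapshot per save is the simplest way to get the "travel back through time" behaviour; a real system would presumably persist the versions in a database rather than a Python dict.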

Gathering Data

Obviously a library isn’t anything without data, so to start, the team contacted publishers for their ONIX data - surprisingly they were mostly receptive - they wanted their books to be findable.

Next, they contacted librarians to ask for data dumps for their catalogs - unsurprisingly they didn’t get the same kind of response that they got from the publishers.  Librarians wanted to think about it for a while…  Long story short, they have some library data, but would love more. 

Now that they had book data, they wanted to enhance it with additional content like book reviews from the New York Times, Harper’s, Reader’s Catalog, and the New York Review of Books.  These titles will all soon have their reviews integrated into the site!!

Lastly, they’re scanning books to get data.  This is where the Internet Archive comes in.  They are providing their scans and data for the Open Library project.

The Library

The library itself has to focus on display.  When you enter a search term you get back a book page; each book page gives you more info about the book - where to buy, borrow, or download it.  Each author has a page as well, linked from the book pages, so they’ll be able to auto-generate a bibliography for each author.  This is very much like the LibraryThing author pages.

So, now that we have a library with pages for books and authors, we need to organize the data.  Aaron was awfully funny here - he had librarians arguing - but what subjects should we use?  Which classification scheme do we use?  We’re going to have to think about this!  Aaron says quite simply - there is no need to argue - we can use them all!!  I love it - very Everything is Miscellaneous - we can organize things in any way we want on the web - we aren’t limited by the physical world!!

There is also a sort of FRBR where you can link books together.

So now we have an online library - how do we keep it updated?  Each page (book, author, etc) is editable - it’s a wiki!! In addition to that, you can easily edit the templates for your own needs or fix bugs you find in the templates that the Open Library is using.

The Future

In the future, they want to provide scan on demand - for $20 or $30 they’ll go get a scanned copy of the book.  Then the PDF is put online with a bookplate saying that you paid for that book to be digitized.  Now, the PDF is available to everyone!!

Aaron’s dream is to have a web of books online - all the information about the book - all the people who reviewed it, all the libraries that have it - all the places you can buy it - all in one place - so that everyone can find any book and find out how to access the information it holds.

In order to fulfill Aaron’s dream, we have to share. “We want your data” - share your MARC data with the project (something that a few people at the conference did as a gift to Brewster for his keynote).  If this is to be an open-source project you need to share.  Also, as an open-source project, they need all the help they can get - so chip in!

Questions & Answers

Q: Can we scan on demand now?

A: Scan on demand is not available now - but it should be done in the next couple weeks - we’ll see

Q: Will we get a copy of the items to put in our catalogs if we pay for it to be scanned?

A: The idea is that the book will be scanned, then a URL will be provided that can be put in the 856 field in your catalog.
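For anyone who hasn’t added one of these by hand, here is a rough Python sketch of what such an 856 field might look like once the URL arrives. The function name format_856 and the example.org URL are my own inventions, not anything shown in the talk. The indicators are the standard MARC 21 ones for this case (first indicator 4 for HTTP access, second indicator 1 for an electronic version of the item described), with the link in subfield $u and an optional public note in $z.

```python
def format_856(url: str, public_note: str = "") -> str:
    """Build a human-readable MARC 856 line for a scanned copy of a print book.

    Indicators "41": 4 = access via HTTP, 1 = version of the resource.
    The output is a display string, not a binary MARC record.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    line = f"856 41 $u {url}"
    if public_note:
        line += f" $z {public_note}"
    return line


# Hypothetical URL, used only to show the shape of the field.
example = format_856("https://example.org/scans/12345", "Scanned copy")
print(example)
```

This only formats the field for display; loading it into a catalog would go through whatever MARC editing or import tool your ILS provides.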

Q: What about books that are only published online?

A: Yes - any and all books - get as much in there as possible

Q: Is there an API?

A: They are planning an API - so that you can get any book page in the format you need

Q: Where are you getting cover art? 

A: LibraryThing - user scanned covers, Publishers give covers and we got a dump of covers from Amazon.  We want to let libraries use them so we got as many covers as possible.

Q: Plans for Internationalization?

A: It should be translatable in the future

More Info

Demo: demo.openlibrary.org

This article (subscription required) discusses the potential friction between Open Library and WorldCat.  Will the success of the former spell doom for the latter?  How will librarians respond to the invitation to send records to one or the other, or both?  [via LISNews]

Find more press about Open Library.

Conclusions

There were no negatives out of this guy!!! The project sounds so much better than I had even realized from reading articles and blog posts. I love it - this is amazing :) and I can’t wait to see more!

Ironic: IBM Unveils Healthcare Island on Second Life

IBM, who has an anti-Second Life commercial, has opened a new island in Second Life.  (story)

Code4Lib 2008: Code4Lib Journal

Jonathan Brinley, Edward M. Corrado, and Jodi Schneider talked to us about the Code4Lib Journal, a project that had been talked about for years but never implemented until recently. The moral of this story is stop talking and just do it.

They decided to use an agile development philosophy, which basically means don’t over-engineer complicated rules and procedures you might never need - just work on what you need now and the rest will come.

Blog v. Journal

So, why did they choose to do a Journal instead of a blog?  In short they chose a journal because it comes with a bit of a stamp of approval that some people need in order to move up the ladder at their workplace - in particular among those in academia.

Where to start?

Get an ISSN - Code4Lib Journal - 1940-5758.  They thought this was going to be a crazy process, but it’s just a one page form with a few questions.  I’ve actually applied for a few ISSNs - two for work and one for my blog - which Ed Corrado suggested we all go out and do since it’s so easy - but I can tell you that they will turn you down! and if they don’t - let me know and I’ll try again.

Other details

They decided to have rotating coordinating editors so that not one person was in charge all of the time.  They also decided to have a public listserv - c4lj-discuss@googlegroups.com - so that everyone can follow along with discussions about the journal.

Articles can be sent in several different formats - right now the editors have worked with almost all of them.  They then use WordPress as a publishing tool because it has a flexible templating engine that allows you to make a site not look like a blog and allows for private posts, public posts and public pages.  It also comes with stats and other neat plugins that make it the right tool for them to use now - because they’re agile it may not always be the tool they use.

They’ve gotten their journal listed in DOAJ & Ebsco and it is also being blogged about, which is bringing traffic to the site (however - just a note to bloggers: for some reason trackbacks aren’t working yet - so post comments on the articles as well).  Along those lines, they’d like to see more comments on the journal site - Code4Lib is a community and they want the journal to reflect that.

Overall an interesting talk with some great ideas for publishing a journal online with free tools available on the web.

Code4Lib 2008: The Internet Archive

What a great way to open a conference like Code4Lib.  The first keynote was presented by Brewster Kahle of the Internet Archive.

Brewster started by reminding us that the reason he was there talking to us and the reason he is working on the Internet Archive is because the library metaphor easily translates to the Internet - as librarians we’re paid to give stuff away!  We work in a $12 billion a year industry which supports the publishing infrastructure.  With the Internet Archive, Brewster is not suggesting that we spend less money - but that we spend it better.

He started with a slide of the Boston Public Library which has “Free to All” carved in stone.  Brewster says that what people carve in stone is taken seriously - and so this is a great example of what libraries stand for.  Our opportunity now is to go digital.  Provide free digital content in addition to the traditional content we have been providing.  I loved that he then said that this is not just a time for us to be friendly together as librarians - but to work together as a community and build something that can be offered freely to all!

He went on to say that what happens to libraries is that they burn - they tend to get burned by governments who don’t want them around.  The Library of Alexandria is probably best known for not being here anymore.  This is why lots of copies keeps stuff safe. Along those lines, the Internet Archive makes sure to store their data in mirror locations - and by providing information to the archive we’re ensuring that our data is also kept safe and available.  This idea of large scale swap agreements (us sharing with the Internet Archive, us sharing with other libraries, etc) in different geographical regions finds us some level of preservation.

How it started

The Internet Archive started by collecting the world wide web - every 2 months taking a snapshot of the web.  Brewster showed Yahoo! 10 years ago - ironically a bit of data that even Yahoo! didn’t have - so for their 10 year anniversary they had to ask the Internet Archive for a copy of what their site looked like!  He showed us the first version of Code4Lib’s site and exclaimed “Gosh is that geeky!” because it was a simple black text on white background page.

While it may have seemed a bit ambitious to archive the web, the Wayback Machine gets about 500 hits a second.  And it turns out that the out of print materials on the web are often just as valuable as the in print information on the web.  People are looking for the way things were for historical or cultural research reasons and this tool makes it possible.

Audio

The Grateful Dead started a tradition in the 60s of allowing people to record their concerts and share them with others - this tradition of tape trading caught on and lots of bands were doing this.  Following in this tradition, the Internet Archive decided to offer unlimited storage and unlimited bandwidth for free to any band who wanted to provide recordings of their concerts to the archive.  It’s a bit different than tape trading, but an amazing idea! They are getting 1 or 2 bands a day - around 30,000 concerts now and it’s working!  Overall the community is building the best metadata Brewster’s ever seen - beautiful work supported by a community - just what I love to hear!!

This shows that librarians can provide a role other than providing information - they can provide back end storage for information.  By giving people like these bands a place to store their music for free, the Internet Archive made it so that concerts are now available online for those in search of them!

Moving Images

1000 movies that are out of copyright are available via the Internet Archive.  Interestingly, the things that are popular are movies you can’t get any other way - movies you wouldn’t expect people to be interested in at all - government films, social behavior films like the ones you saw in high school when you had a substitute teacher - they’re fantastically popular. Brewster theorizes, and I tend to agree that people are using these videos as research tools to see what things were like culturally at different times in history. 

Brewster is a follower of the “it’s easier to apologize than ask permission” philosophy and it has worked very well for him and the organization.  You probably have a closet of video tapes that are just waiting to go online - so put them online and if people ask you to take it down - take it down.  One example that most of us have probably seen is the Lego movies.  Brewster found this genre of movies fascinating - but he mentions that if it weren’t for the free storage on the archive (pre-YouTube) these movies may never have been so widely spread.  He described this as the library supporting a community that had no home before.  We’re here to put things on shelves and give things away - so why not put things online and give them away?

Television

The Internet Archive only has 1 week of TV available so far - 9/11 - 9/18/2001.  This shows a full picture of what people were watching during that horrible week.  (update: I may have misunderstood - as I view the archive site I see more than just this….)

Apparently there is someone in North Carolina recording TV non-stop on 20 channels in DVD quality.  It costs him about $15 per video hour to digitize, and he has over 50,000 videos in his archive.  You can’t get just one point of view (you need multiple channels); news may say it’s fair and balanced - but it’s not - and you don’t just want Jon Stewart as your archive of news :)

Software

Not much because of licensing issues - it’s doable - just not legal yet.

Text

This is where Brewster sees the biggest opportunity for traditional libraries to participate.  We have in our charge the responsibility to distribute print/books.

We, as librarians, have to work very hard on text. Look at what we did with journals - we handed them to many corporations and now we have to rent them back from them :(  if we had never let it happen in the first place we wouldn’t be wondering how to digitize our journals now.  The same thing is going on with monographs now - we’re handing them over to corporations - we should be doing this ourselves instead and the Internet Archive wants to help.

There are 26 million books in the Library of Congress - if one book is about 1MB, that’s 26TB in the Library of Congress.  For $60,000 you could have the entire Library of Congress digitized.
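The arithmetic is easy to check. Here is a quick back-of-the-envelope sketch in Python, assuming decimal units (1TB = 1,000,000MB) and assuming, on my part, that the $60,000 is meant to cover those 26TB; the variable names are mine.

```python
BOOKS = 26_000_000      # books in the Library of Congress, per the talk
MB_PER_BOOK = 1         # rough size of one book, per the talk
MB_PER_TB = 1_000_000   # decimal terabyte (assumption)
QUOTED_COST = 60_000    # dollars, per the talk

total_tb = BOOKS * MB_PER_BOOK / MB_PER_TB
cost_per_tb = QUOTED_COST / total_tb   # implied dollars per terabyte

print(f"{total_tb:.0f} TB total")
print(f"about ${cost_per_tb:,.0f} per TB")
```

So the quoted figure works out to a little over $2,300 per terabyte.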

Brewster’s goal sounds like a simple one - “one webpage for every book ever published.” What would it take to do this?

First off, we’d have to scan a whole heck of a lot of books - and get the catalog data.

The archive has experimented with a few methods, first they worked with the million book project - they shipped their books to India and they learned not to ship their books to India. Brewster recommends that you have the Indians scan the books they like - but keep your books to yourself.  Instead they found that for 10 cents a page they could scan their own items in house. They came up with the scanner and have a person turn the pages of the book - they tried the robots but they weren’t great (may be better now).  At the University of Toronto this method produces a million pages a month.
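To get a feel for what 10 cents a page adds up to, here is a small Python sketch. The function name scan_cost_dollars and the 300-page book are my own assumptions; the per-page price and the million pages a month are the figures from the talk.

```python
CENTS_PER_PAGE = 10  # in-house scanning cost quoted in the talk


def scan_cost_dollars(pages: int, cents_per_page: int = CENTS_PER_PAGE) -> float:
    """Return the scanning cost in dollars for a given number of pages."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return pages * cents_per_page / 100


one_book = scan_cost_dollars(300)           # a hypothetical 300-page book
toronto_month = scan_cost_dollars(1_000_000)  # a month at the Toronto rate

print(f"300-page book: ${one_book:,.2f}")
print(f"One million pages: ${toronto_month:,.2f}")
```

Under those assumptions a single book comes to about $30, and a scanning center producing a million pages a month is doing roughly $100,000 of scanning.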

So, for the cost of copying a page at Kinkos you can digitize it and add MARC records and share with the world. Most importantly it’s being done by librarians - out of the corporate sphere.  We need to demand the right to give our books away - not have our books owned by corporations who will rent the content to us with exceptions tied to it.

Some quotes from Brewster: “Please help support these scanning centers while they’re up and running … take collections that you’ve got and have them digitized and start building services around them.”  If we’re going to build one web page for every book, we’re going to have to scan a lot of books.  One option of a service you could add is a scan on demand link to your catalog.  Have patrons click this link to have a book scanned - same cost as ILL - might as well scan it and put it on the web for anyone to use.

Then you can provide your digital copies via ILL, Brewster states: “I don’t know what loan means in the digital world - but let’s figure it out!” Why wait for someone else to tell us?

Next, let’s scan all the microfilm.  Someone came up to Brewster after one of his talks and said - “we’ve done this before - it’s called microfilm.” So why not digitize our microfilm as well? For less than 10 cents a page they can do all microfilm.  The Internet Archive is actually doing a large scale microfilm scanning project right now using the Carnegie model.  Apparently Carnegie would build your library for you if you promised to stock it with books and materials.  So the Kahle/Austin Foundation will donate a microfilm scanner to your organization for X years if you, the library, will keep it up and running for X hours a week.  This only costs labor and time and no money has to change hands.  In the end we’ve digitized all of our microfilm and made it more accessible.

This made me think of a question - if years ago people said you should microfilm everything and now everyone’s saying you should digitize it - what’s to say that in another 50 years there won’t be another format?  This sounds to me like a never ending loop - but at the same time it sounds like such an obvious progression given the technology we have and the types of users we’re dealing with.

Next, we need better selection - right now we’re just digitizing whatever we’re handed - this means we don’t have full collections.  Because of this the Internet Archive now has 90 sponsor collections - “We need help!”–Brewster asks that we pick an area of cataloged material and share that digitally - think outside of your own library.  For some reason librarians seem to think that they’re only responsible for digital copies of materials they have in their own library - keep digital copies of things from other libraries - why only have digital copies of items you have in print?  You want a full collection on your area of study for your library. This was something I was working on at the Seminary.  I was finding digital copies of materials I thought would be of interest to our students and importing those OCLC records into our catalog.  Just another way to provide access to data.

The next step according to Brewster is to build the catalog and “we finally need to do this FRBR thing - come on guys, it’s not that hard!!!”  Even if the digital copy of the book isn’t available yet, it makes sense to provide pages for the book with catalog data that pulls information from sites like Amazon and other book information sites.


Code4Lib - Day 1
Originally uploaded by nengard

When the books are available, we need to work on our displays.  Many of our displays are lacking.  We need better search functions and open APIs to allow people to re-purpose our data in ways that make sense for them.  We also need to make book images with pages that flip, the ability to zoom in, and a printable format.  In fact the Internet Archive offers a service where people can print books out in real paperback-looking formats.

Another option is to use the One Laptop per Child as an ebook reader.  The Kindle handles ASCII formats okay - but not the types of images that we’re creating for our digital collections.

Conclusions

We have to work together on building this!  We can’t just check back in a year and see what’s happening - instead of waiting for others to do the work - why not contribute? We want to be able to build some great services that will allow people to bulk download these materials and re-purpose them if they want.

One way is to join the Open Content Alliance - there are over 80 libraries now. It’s free to join, you just have to contribute. 

The next step is to get service layers in place - this is where the code4libers come in. We have the skills to make the Internet Archive even more accessible and valuable.

Questions & Answers

Dan Chudnov asked what he called “tough questions” - now that some companies like Reed Elsevier are trying to change their business models from journal sales to other routes, is there an opportunity to go and buy up their journal services so we get our data back?   

Brewster’s answer: there is a way to do this - some people are trying - until it comes to the point where they aren’t making money any more we’re going to have to keep scanning ourselves

Dan’s other question - is power an issue?

Brewster - power is costly, but not running out any time soon.

Another question: the data is only good as long as the disks are still spinning - how do you make it last for years? 

Brewster: the question is a good one - the real way to have long term preservation is to have access - access drives preservation.  dark archives lead to data being lost.  we have to replace our machines every few years to keep up.  tapes suck! have you ever tried to read them back??? if there are at least 5 copies - 5 organizations then I can sleep

Real Conclusion

“if you’re frustrated enough - please come and help!” — Brewster

What an amazing way to stop!  What an amazing way to start the conference! So many people were completely inspired, I can’t wait to see what comes of this talk - I hope some amazing APIs start popping up!

eBook Usage Survey

This was posted recently to the ERIL-L discussion list.  I encourage you to participate:

Primary Research Group (www.primaryresearch.com) is planning to publish a survey of eBook use by academic, public and special libraries. This survey is open to libraries in all nations.  Participants receive a free PDF copy of the estimated 100-page report. Data is broken out by type and size of institution for easier benchmarking.  Participants are listed in an appendix but responses are not attributed to specific respondents. To take the 45-question survey, follow the link below:

http://snipurl.com/206m6  [www_surveymonkey_com] 

500 Million Firefox Downloads

Yesterday Mozilla announced that Firefox has been downloaded 500 million times!!

Firefox just reached  500,000,000 downloads. This is an absolutely phenomenal milestone for Firefox. It is sort of hard to imagine what that number means. For some perspective, that’s roughly the audience size of 10,000 Rome Colosseums combined. It would be the weight, in kilograms, of 8,500 Boeing 747 airplanes. In dollars, for $500 million you and 15 of your friends can fly to the International Space Station.

To celebrate they’re asking that we help people in need:

OR, you can affect change and invite 15 of your friends to play a game and feed 25,000 people. With your help we can break another milestone today with FreeRice.com –500,000,000 grains of donated rice in one day.  Imagine helping to feed the hungry while picking up some new vocabulary too!

This is great news! Now if more libraries would just make Firefox their default browser on patron stations - imagine how many more downloads Mozilla would be able to report??

Stanford Guidelines for Web Credibility

Stanford has created 10 guidelines for building the credibility of a web site based on three years of research that included over 4,500 people.

Register today for MLA Webinar

Reminder about an MLA Webinar:

this is a reminder to register TODAY for the next MLA Webcast on 2.0 Technologies.
Please excuse cross-postings.

Web 2.0 Principles and Best Practices: Discovering the Participatory Web
MLA's Educational Webcast
Wednesday, March 5, 2008
1:00 - 4:30 p.m.

So what exactly is web 2.0? Is it just a marketing buzzword or something with real application potential that I need to learn more about?

The goal of MLA’s spring webcast is to provide participants with the answer to this question and a basic understanding of web 2.0 terminology and concepts. Join your colleagues in a discussion on the effect of the technology on health sciences library services, and identify the impact of web 2.0 services on health care today and in the future.

Objectives: 

  • understand web 2.0 terminology and concepts
  • recognize web 2.0 technologies and their possible effects in library and health care services
  • recognize feature differences among available tools
  • assess the efficacy of particular web 2.0 technologies for use in our own environments

 

Location:

Thomas Jefferson University, Herbut Auditorium, College Building - lower level, 1025 Walnut Street Philadelphia, PA 19107    (directly across from the Scott Library)

Cost:  $20 MLA/SLA members; $25 Non-members; $15 Students/Retirees/Between Jobs 

 

You may pay by credit card or check.

Register today at the MLA-Phil website. Click on this link and select the Acteva button. http://www.mlaphil.org/wp/category/ce/

Got Questions? Contact:

Sharon Easterby-Gannett at seg@christianacare.org (302-733-1164)
Ellen Justice at ejustice@christianacare.org (302-733-1179)

 

Directions, Parking and Campus Map: <<http://www.jeffersonhospital.org/patient/article4085.html>>

Take advantage of this national continuing education event and earn 3.5 MLA CE contact hours!

 

This sounds pretty interesting.  Don't miss this opportunity if you're free.

 

Great First Job for Special Library Techies

Jenkins Law Library is looking for an Assistant Network Administrator / PC and Database Support Specialist.

This position is responsible for assisting the Network Administrator, troubleshooting computers (PC desktops and servers) when needed and assisting with Website support. The primary focus is to support all users so they can work optimally in a networked environment.

Learn more at jobs.jenkinslaw.org.

An iSchool's wiki

I just learned about this wiki this week.

The School of Information Studies at Syracuse University (fondly now called the iSchool) has a wiki "for students, faculty, and others interested in the Masters of Science in Library and Information Science program at the School of Information Studies at Syracuse University."  Looking at some of the pages, this site answers questions that occur every year -- if not every semester -- and provides good resources that can be used for learning more about the MSLIS program.

As you look at the wiki, don't be surprised if you find information that you'll find useful, even though you have no intention of attending SU.

BTW Although there seem to be only a few current authors, I know that students have provided input, which means that they're getting wiki experience in school!  (Yeah!) 

SLA Survey: New Special Librarians or Current Students

This from my inbox:

The Special Libraries Association (http://www.sla.org) plans to enhance its services for information professionals who have graduated in the last five years.

We're conducting a survey to gather input from new graduates and current students. Please help us by telling us what subject areas interest you, how you prefer to receive professional information, and how you see SLA as a partner in your career development.

There are 11 questions. You can respond in under 10 minutes. Click here to get started:
http://www.surveymonkey.com/s.aspx?sm=KSXG_2bpPojuVogVu1G6POrQ_3d_3d

I've already filled it out - took no time at all.

 

Google Apps -- With or Without IT

A colleague pointed me over to this story about Google offering their Apps package without the cooperation of your IT organization (so long as you have admin rights to a server).

Here's the official press release: http://www.google.com/intl/en/press/pressrel/20080207_googleapps_teamedition.html.

Power to the people?

c.

Looking for photos

I am trying to incorporate more graphics into my presentations and into the lectures I give at Syracuse University (which are done totally online).  My secret weapon is no secret at all -- it's Flickr.  I search using the words that describe what I'm looking for, then narrow my search to those photos with a Creative Commons license.

For example:

  • Need to talk about securing your identity or identity theft?  Check out these!
  • Need a photo of an archive?  Here.
  • Need a graphic for a copyright presentation?  Got it!

And keep in mind that people put lots of screen shots in Flickr, including screenshots from Second Life.  So...no matter what you're trying to illustrate, I bet there is a photo in Flickr -- and with a Creative Commons license -- that you can use!  Oh...if the photo doesn't have a CC license, contact the owner to see if they will grant permission.  The answer is usually "yes."

Are you a leader?

Somehow with all of the busyness in my life I missed the announcement about the new PALINET Leadership Network page.  This from Walt Crawford:

Take a look at the PALINET Leadership Network (PLN). It’s an international network to provide resources and share ideas among library leaders (present and future) of all varieties. It’s free, it’s open to anyone who believes they belong, and it’s off to a good start–with more than 300 users and some 200,000 words of content already in place.

What I love is that they're considering our busy lives and allowing us to keep up with updates in more than one way.  As a wiki you can of course "watch" pages and receive updates via RSS or email, but Walt is also helping us out:

We also know that most leaders (and would-be leaders) are busy people, who don’t have time to go check a wiki every week or two to see what’s new. With that in mind, we’ve created PLN Highlights–... You can add PLN Highlights to your aggregator–or, if you’re not a big aggregator person and blog reader, you can sign up to get posts via email.

Read Walt's original post for more information or just poke around at the wiki.

 

SLA Blogging Chair Interviewed on Talking w/ Talis

Just wanted to let you all know that I was interviewed recently on Talking with Talis.  I talked about Open Source, Libraries and the way we're headed in the future.  If that wasn't enough, Michael Stephens adds to the pressure by saying:

I think we should watch developments such as this very closely. The structure and focus of Nicole’s new position may influence and guide future jobs for librarians in consortia, large library systems and our associations. I am very happy to see this move and the press around it.

While it is a lot to live up to - I agree that we need to keep an eye on open source - especially in Special Libraries as we move forward with new technologies in our libraries.

Peer reviewed blogging?

I just read this over at Judith Siess' OPL Plus blog and thought it very interesting:

http://www.researchblogging.org/

From the site: "Research Blogging helps you locate and share academic blog posts about peer-reviewed research. Bloggers use our icon to identify their thoughtful posts about serious research, and those posts are collected here for easy reference."

Imagine web 2.0 with peer-reviewing - I'm thinking a mixed blessing.

c.
