The Gutenberg Project

Author : Jbuenol

From TechnologicalWiki

The Beginning

Project Gutenberg was started by Michael Hart in 1971. Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000. Hart has said he wanted to "give back" this gift by doing something that could be considered to be of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge, and to do so by the end of the 20th century.

This particular computer was one of the 15 nodes on the computer network that would become the Internet. Hart believed that computers would one day be accessible to the general public and decided to make works of literature available in electronic form for free. He typed in a copy of the United States Declaration of Independence that he had in his backpack, and this became the first Project Gutenberg e-text. He named the project after Johannes Gutenberg, the fifteenth-century German printer who propelled the movable type printing press revolution.

By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College. More volunteers had joined the effort. All of the text was entered manually up until 1989 when image scanners and optical character recognition software improved and became more widely available, which made book scanning more feasible. Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the project's day-to-day operations that Hart had run.

Pietro Di Miceli, an Italian volunteer, developed and administered the first Project Gutenberg website and started the development of the project's online catalog. In his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.

Gutenberg Philosophy

The premise on which Michael Hart based Project Gutenberg was that anything that can be entered into a computer can be reproduced indefinitely, which he termed "Replicator Technology". The concept is simple: once a book or any other item (including pictures, sounds, and even 3-D items) has been stored in a computer, any number of copies can and will be available. Everyone in the world, or even not in this world (given satellite transmission), can have a copy of a book that has been entered into a computer.

The format in which the works are stored is itself a statement of principles: the texts are in what Hart calls "Plain Vanilla ASCII" (meaning the low set of the American Standard Code for Information Interchange). This is the simplest format in which a text can be encoded. The reason for choosing it is that 99% of the hardware and software a person is likely to run into can read and search these files. The objective of the project is for the works to be accessible with any software or hardware, ancient or modern: Amstrad, Mac, PC or Unix. Hart sees this as a reaction against commercial pressures on users to change their software and hardware constantly, often to find that new configurations impede access to old data.

The Project Gutenberg Philosophy is to make information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search.

This has several ramifications:

1. The Project Gutenberg Etexts should cost so little that no one will really care how much they cost. They should be a general size that fits on the standard media of the time ...

2. The Project Gutenberg Etexts should be so easily used that no one should ever have to care about how to use, read, quote and search them ...

The Selection of Project Gutenberg Etexts

The Project Gutenberg Library has three portions, which can basically be described as:

  • Light Literature; such as Alice in Wonderland, Through the Looking-Glass, Peter Pan, Aesop's Fables, etc. This portion is designed to get persons to the computer in the first place, whether the person may be a pre-schooler or a great-grandparent.
  • Heavy Literature; such as the Bible or other religious documents, Shakespeare, Moby Dick, Paradise Lost, etc.
  • References; such as Roget's Thesaurus, almanacs, and a set of encyclopedias, dictionaries, etc.

Copyright issues

Project Gutenberg is careful to verify the status of its ebooks according to U.S. copyright law. Material is added to the Project Gutenberg archive only after it has received a copyright clearance, and records of these clearances are saved for future reference. Unlike some other digital library projects, Project Gutenberg does not claim new copyright on titles it publishes. Instead, it encourages their free reproduction and distribution.

Most books in the Project Gutenberg collection are distributed as public domain under U.S. copyright law. The licensing included with each ebook puts some restrictions on what can be done with the texts (such as distributing them in modified form, or for commercial purposes) as long as the Project Gutenberg trademark is used. If the header is stripped and the trademark not used, then the public domain texts can be reused without any restrictions.

There are also a few copyrighted texts that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder.

Scope of collection

Figure: Growth of Project Gutenberg publications from 1993 until 2008.

As of December 2007, Project Gutenberg claimed over 24,000 items in its collection, with an average of over fifty new e-books being added each week. These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks, reference works and issues of periodicals. The Project Gutenberg collection also has a few non-text items such as audio files and music notation files.

Most releases are in English, but there are also significant numbers in many other languages. As of July 2008, the non-English languages most represented are: French, German, Finnish, Dutch, Chinese, and Portuguese.

Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1. Besides being copyright-free, the requirement for a Latin-text version of the release has been a criterion of Michael Hart's since the founding of Project Gutenberg, as he believes this is the format most likely to be readable in the extended future. The text is wrapped at 65-70 characters and paragraphs are separated by a double-line break. Although this makes the release available to anybody with a text-reader, a drawback of this format is the lack of markup and the resulting relatively bland appearance.
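The plain-text layout described above (US-ASCII, lines wrapped at 65-70 characters, paragraphs separated by a blank line) can be illustrated with a minimal Python sketch. The function names `to_plain_vanilla` and `is_us_ascii` and the default width of 70 are assumptions made for this example; this is not Project Gutenberg's own tooling.

```python
import re
import textwrap


def to_plain_vanilla(text: str, width: int = 70) -> str:
    """Re-wrap text to at most `width` columns, one blank line between paragraphs.

    Hypothetical helper for illustration only.
    """
    # Any run of blank lines is treated as a paragraph boundary.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Collapse internal whitespace, then wrap each paragraph on its own.
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    # A "double-line break" between paragraphs is one empty line.
    return "\n\n".join(wrapped) + "\n"


def is_us_ascii(text: str) -> bool:
    """Return True if every character is in the 7-bit US-ASCII range."""
    return text.isascii()
```

A text for which `is_us_ascii` returns False would need a wider encoding such as ISO-8859-1, as mentioned above.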

Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be much easier to read. But some formats that are not easily editable, such as PDF, are generally not considered to fit in with the goals of Project Gutenberg (although a few have been added to the collection).

Search in Gutenberg

Searching a Book

by Author & Title

The most commonly used feature in Gutenberg is the ability to search for books by author or title.

by Category

The site is organized like a real library, and you can browse the books by bookshelf, e.g.:

  • Music (mp3).
  • Audio Books (mp3, m4b, spx, ogg). Many of these works have been produced by users. There are two subcategories of audio books:
      • Human-read audio books. These recordings are read aloud by people who lend their voices to create them.
      • Computer-generated audio books. These recordings are created with a speech synthesizer. Their quality is usually lower than that of human-read audio books.
  • Pictures & Videos.

...

Browsing Catalog

You can also browse an alphabetical catalog, retrieving entries by selecting the first letter of the name.


Advanced Search

A more specific search can be run by filtering on the following fields:

  • Author
  • Title
  • Subject
  • Language
  • Category
  • LoCC
  • Filetype
  • Etext-No.
  • Full Text

Updates & The Best Books

You can also find out which new books have been added to the collection. There are two ways to do so:

  • Recent eBooks page. A daily list of the new books.
  • RSS feed of new books. An RSS feed is included on the Project Gutenberg page to notify you through your web browser (updated nightly). The link is: RSS Source
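A feed like the one in the list above can also be read by a short script instead of a browser. The sketch below parses a generic RSS 2.0 document with Python's standard library. The sample feed, its titles and its example.org links are invented for illustration and are not taken from the Project Gutenberg feed, whose address and exact fields are not given here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up RSS 2.0 sample; the real feed's contents are not reproduced here.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example: recently posted ebooks</title>
    <item>
      <title>Example Book One</title>
      <link>http://example.org/ebooks/1</link>
    </item>
    <item>
      <title>Example Book Two</title>
      <link>http://example.org/ebooks/2</link>
    </item>
  </channel>
</rss>"""


def new_books(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```

Calling `new_books(SAMPLE_FEED)` yields one pair per item, in document order.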

A top 100 list is also available on the website. The ranking is determined by counting the number of times each file is downloaded. Both HTTP and FTP transfers are counted, and multiple downloads from the same IP address on the same day count as one download. This list is a good way to find a good book.
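The counting rule for the top 100 list can be sketched in Python as follows. The `Download` record and its field names are hypothetical, and the sketch assumes that the "same IP address on the same day" rule is applied per file, which the description does not state explicitly.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Download(NamedTuple):
    """One transfer from a hypothetical access log."""
    ip: str
    day: str       # ISO date, e.g. "2008-07-01"
    ebook: str     # file identifier (assumed format)
    protocol: str  # "HTTP" or "FTP"; both are counted


def rank_downloads(log: Iterable[Download], top_n: int = 100) -> List[Tuple[str, int]]:
    """Rank files by download count, counting each (IP, day, file) once."""
    # The set collapses repeats from the same IP on the same day,
    # regardless of which protocol was used.
    unique = {(d.ip, d.day, d.ebook) for d in log}
    counts = Counter(ebook for _, _, ebook in unique)
    return counts.most_common(top_n)
```

Dropping the protocol from the deduplication key means an HTTP and an FTP download of the same file by the same address on one day count only once.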

Special Services

  • Offline Catalogs. Handy ebook listings to consult offline. It's possible to download a copy of the whole website for offline viewing. Users can also obtain a copy of the catalog in RDF/XML format.
  • CD and DVD Project. Download entire CDs or DVDs, or have a free disc sent to you. These discs contain books from the site.
  • Digitized Sheet Music. Project Gutenberg volunteers have been digitizing public domain sheet music, using a variety of techniques, to enable study and performance. For the most part, the pieces produced have been chamber music by composers such as Brahms and Beethoven. ClassicalArchives.com worked with Project Gutenberg on the sheet music project. Project Gutenberg also received a donation from an anonymous family foundation to help start the sheet music project.

Helping the project

1. DVD Project Volunteering. Users can help Project Gutenberg by distributing its materials to people who do not have high-speed access to the Internet. If you want to help in this way, you can burn the downloaded files to DVDs or CDs and distribute them.

... more information


2. Promote Project Gutenberg. It is important to get the Project Gutenberg name around as much as possible, and you can play an important role, too. If you would like to support the project's efforts, you can add a button and/or banner to your home page.

... more information


3. Wanted Books. There are some books the project cannot yet publish because its paper copy lacks one or more pages. You can help the project by supplying those pages.

list of missing pages

Project contact


4. Editing for the site. The Gutenberg website is now a Wiki so you can volunteer to edit most of the content. You can create an account and write some useful content for the ebook community.

5. Distributed Proofreaders. This site produces new works for Project Gutenberg. The proofreading process is divided into several tasks, which are carried out by volunteers.

  • Volunteers are presented with a scanned page image and the corresponding OCR text on a single web page. This allows the text to be easily compared to the image, proofread, and sent back to the site.
  • A second volunteer is then presented with the first volunteer's work and the same page image, verifies and corrects the work as necessary, and submits it back to the site.
  • The book then similarly progresses through two formatting rounds using the same web interface.
  • Once all the pages have completed these steps, a post-processor carefully assembles them into an e-book, optionally makes it available to interested parties for 'smooth reading', and submits it to the Project Gutenberg archive.
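The steps above can be modeled as a small per-page state machine. This is a minimal sketch and not the Distributed Proofreaders software: the round names in `ROUNDS`, the `Page` class and the `assemble` function are hypothetical, and the optional 'smooth reading' step is left out.

```python
from dataclasses import dataclass
from typing import List

# Two proofreading rounds followed by two formatting rounds (names assumed).
ROUNDS = ["proofread-1", "proofread-2", "format-1", "format-2"]


@dataclass
class Page:
    image: str          # path to the scanned page image
    text: str           # current text, initially the raw OCR output
    completed: int = 0  # number of rounds finished so far

    def submit(self, corrected_text: str) -> None:
        """Record one volunteer's corrected text and advance to the next round."""
        if self.completed >= len(ROUNDS):
            raise ValueError("page has already finished every round")
        self.text = corrected_text
        self.completed += 1

    @property
    def done(self) -> bool:
        return self.completed == len(ROUNDS)


def assemble(pages: List[Page]) -> str:
    """Post-processing step: join the pages once all of them are done."""
    if not all(p.done for p in pages):
        raise ValueError("some pages have not finished every round")
    return "\n\n".join(p.text for p in pages)
```

Tracking progress per page rather than per book reflects the description above: each page moves through the rounds independently, and assembly only becomes possible once every page has finished.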

... more information

References

Official Site

Distributed Proofreaders (Volunteers)

Main Collaborators