<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>

<channel>
	<title>The Digital Blog</title>
	<atom:link href="http://www.godigitalblog.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.godigitalblog.com</link>
	<description>about digital information and content conversion::capture&#62;convert&#62;preserve&#62;present&#62;</description>
	<pubDate>Mon, 28 Jul 2008 10:12:55 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
		<item>
		<title>Workflows for Mass Digitisation</title>
		<link>http://www.godigitalblog.com/2008/07/17/workflows-for-mass-digitisation/</link>
		<comments>http://www.godigitalblog.com/2008/07/17/workflows-for-mass-digitisation/#comments</comments>
		<pubDate>Thu, 17 Jul 2008 10:11:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[MASS DIGITIZATION]]></category>

		<category><![CDATA[THE VALUE OF DIGITIZATION]]></category>

		<category><![CDATA[British Library]]></category>

		<category><![CDATA[docWORKS/METAe]]></category>

		<category><![CDATA[Finland]]></category>

		<category><![CDATA[METAe]]></category>

		<category><![CDATA[Mets Alto]]></category>

		<category><![CDATA[Norway]]></category>

		<guid isPermaLink="false">http://www.godigitalblog.com/?p=51</guid>
		<description><![CDATA[Author: Claus Gravenhorst
at Colloquium of Library Information Employees of the V4+ Countries
Accessible information is a basic need of society or, to put it another way, of everyone. Usually the original can only be accessed in printed form or on microfilm/microfiche, which means that searching, using and distributing the information is time-consuming, cost-intensive and not [...]]]></description>
			<content:encoded><![CDATA[<p>Author: Claus Gravenhorst<br />
at <a href="http://colloquium.mzk.cz/index.php" target="_blank">Colloquium of Library Information Employees of the V4+ Countries</a></p>
<p>Accessible information is a basic need of society or, to put it another way, of everyone. Usually the original can only be accessed in printed form or on microfilm/microfiche, which means that searching, using and distributing the information is time-consuming, cost-intensive and not available to everyone. Until recently, digitising printed items and converting them into electronic formats was complex and cost-intensive. Insufficient budgets and/or resources prevented extensive transformations to digital repositories. Reliable methods for long-term security and storage of these enormous data sets were virtually unavailable.</p>
<p>As a result of the METAe project (<a href="http://meta-e.uibk.ac.at" target="_blank">http://meta-e.uibk.ac.at</a>), funded by the European Commission through the 5th Framework Research Programme, CCS Content Conversion Specialists GmbH, Germany, developed a comprehensive software solution that has been on the market since 2003 under the brand name docWORKS. It is a production tool that offers an integrated workflow for the automated, structured conversion of printed documents into digital objects, which describe the physical and logical document structure through consistent use of international XML standards. In quality and structure, these XML documents are equivalent to born-digital documents and can be transferred to digital library systems, portals, document, content and knowledge management systems, as well as to virtually any media output device.<br />
The main goal achieved through the project was the automatic generation of administrative, descriptive and structural metadata. Highly structured documents offer the following advantages:<br />
As &#8220;digital originals&#8221;, they meet the requirements for long-term digital storage in repositories<br />
Because open XML metadata standards are used, the data can be transformed and migrated to meet current and future requirements<span id="more-51"></span><br />
Logical structures improve search results (chapter- and article-based) and make content easier to access (chapter titles, headlines, pictures with captions, footnotes, etc.)<br />
Continuity between digitally created and digitised documents</p>
<p>The generic, rule-based document analysis technology covers a wide range of document types, such as books, journals and newspapers, as well as scientific documents like theses, dissertations and reports. The conversion workflow has been automated and simplified to make digitisation more cost-effective.<br />
The conversion process depends on the document type and can be completed automatically or semi-automatically. Interactive user interfaces are available to monitor the conversion progress and to verify and correct conversion results. Conversion uses a rule-based, object-oriented engine in conjunction with text recognition technology (OCR), supplemented by manual and/or semi-automatic interaction capabilities.<br />
The conversion workflow, well integrated into the library's infrastructure, depends on the document and the application, and its conditions can be varied. The goal is to make processing as efficient and automated as possible. The status of each document is tracked by means of a unique identifier linked to the library catalogue. Existing metadata is ingested automatically from the catalogue. If documents have to be scanned from the original, various scanning devices, up to automated scan robots, are supported as well. With client-server-based processing, the throughput of the digitisation and conversion process can be scaled to meet mass digitisation requirements. Central, server-based conversion, combined with automated quality assurance procedures and manual quality assurance spread over internal or external resources (near- or off-shore), enables distributed and efficient production workflows.<br />
During the conversion process, physical page objects such as text zones, pictures, tables and advertisements are determined, including their characteristics. In addition, logical structures such as chapters, captions, authors and articles are determined, together with their associated metadata. Text zones are converted to electronic text with integrated OCR technology. The rich, standardised XML-based output increases the added value of digitised collections and opens up new dimensions of access and usability. docWORKS supports open metadata standards such as METS, Dublin Core, MODS, NISO MIX and ALTO for storage in repositories. The documents converted by docWORKS are exported in different standard formats. The most important are image formats (e.g. TIFF, JPEG, JPEG 2000), PDF (optionally with bookmarks and hidden text) and XML, for which the international metadata standard METS, hosted by The Library of Congress, is used first and foremost. Among other things, the &#8220;METS structure map&#8221; defines logical document structures such as chapters and articles. To store additional information about the physical layout of document pages, the ALTO schema was developed in the context of the METAe research project. It has since been chosen by many other digitisation projects worldwide, including The Library of Congress, which adopted ALTO as a standard for the NDNP project (National Digital Newspaper Program, <a href="http://www.loc.gov/ndnp/" target="_blank">http://www.loc.gov/ndnp/</a>).<br />
Through XSLT transformation virtually any format can be derived for presentation and distribution purposes.  The highly structured “digital originals” created by docWORKS provide the source for those transformations.<br />
Today docWORKS is in use at in-house digitisation centres at, for example, Harvard University Library, Stanford University Library, The British Library, the Royal Danish Library, the National Library of Norway and the National Library of Finland, as well as at several commercial service vendors.</p>
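<p>The XSLT step described above can be illustrated with a small stylesheet. The following is only a minimal sketch under stated assumptions, not the stylesheet docWORKS uses: it assumes an ALTO input file whose text is held in <code>TextBlock</code>, <code>TextLine</code> and <code>String</code> elements with a <code>CONTENT</code> attribute, matches them by local name so that the ALTO namespace version does not matter, and ignores hyphenation and layout coordinates. It turns each text block into one HTML paragraph.</p>
<pre><code>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="html" encoding="UTF-8" indent="yes"/&gt;

  &lt;!-- Root: wrap the page text in a minimal HTML document --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;html&gt;
      &lt;body&gt;
        &lt;xsl:apply-templates select="//*[local-name()='TextBlock']"/&gt;
      &lt;/body&gt;
    &lt;/html&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Assumption: each ALTO TextBlock maps to one paragraph --&gt;
  &lt;xsl:template match="*[local-name()='TextBlock']"&gt;
    &lt;p&gt;
      &lt;xsl:apply-templates select="*[local-name()='TextLine']"/&gt;
    &lt;/p&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each TextLine: emit the CONTENT of every String, separated by spaces --&gt;
  &lt;xsl:template match="*[local-name()='TextLine']"&gt;
    &lt;xsl:for-each select="*[local-name()='String']"&gt;
      &lt;xsl:value-of select="@CONTENT"/&gt;
      &lt;xsl:text&gt; &lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</code></pre>
<p>Any XSLT 1.0 processor should be able to apply such a stylesheet. With xsltproc, for example, the call would be <code>xsltproc alto-to-html.xsl page.xml</code>, where both file names are hypothetical.</p>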
]]></content:encoded>
			<wfw:commentRss>http://www.godigitalblog.com/2008/07/17/workflows-for-mass-digitisation/feed/</wfw:commentRss>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
	</item>
		<item>
		<title>Videos of the Treventus ScanRobot book scanner</title>
		<link>http://www.godigitalblog.com/2008/06/26/videos-of-the-treventus-scanrobot-book-scanner/</link>
		<comments>http://www.godigitalblog.com/2008/06/26/videos-of-the-treventus-scanrobot-book-scanner/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 03:52:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Digitalisierung]]></category>

		<guid isPermaLink="false">http://www.godigitalblog.com/2008/06/26/videos-of-the-treventus-scanrobot-book-scanner/</guid>
		<description><![CDATA[Here
]]></description>
			<content:encoded><![CDATA[<p><a href="http://hurstassociates.blogspot.com/2008/06/videos-of-treventus-scanrobot-book.html" target="_blank">Here</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.godigitalblog.com/2008/06/26/videos-of-the-treventus-scanrobot-book-scanner/feed/</wfw:commentRss>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
	</item>
		<item>
		<title>Computers Instead of the Reading Room – CCS Digitises the British Library</title>
		<link>http://www.godigitalblog.com/2008/06/17/computer-statt-lesesaal-%e2%80%93-ccs-digitalisiert-die-british-library/</link>
		<comments>http://www.godigitalblog.com/2008/06/17/computer-statt-lesesaal-%e2%80%93-ccs-digitalisiert-die-british-library/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 07:45:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[THE VALUE OF DIGITIZATION]]></category>

		<category><![CDATA[British Library]]></category>

		<category><![CDATA[CCS]]></category>

		<category><![CDATA[Digitalisierung]]></category>

		<category><![CDATA[Digitizing]]></category>

		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://www.godigitalblog.com/2008/06/17/computer-statt-lesesaal-%e2%80%93-ccs-digitalisiert-die-british-library/</guid>
		<description><![CDATA[Source: DW-WORLD
The world's knowledge is still kept in books. So far, the digital revolution has captured only a fraction of what authors have put on paper over the centuries. The venerable British Library in London, of all institutions, now wants to change that, and for reasons of conservation.
It is precisely the oldest and most valuable books that are threatened with decay. [...]]]></description>
			<content:encoded><![CDATA[<p>Source: <a href="http://www.dw-world.de/popups/popup_single_mediaplayer/0,,3421464_start_833_end_1133_type_video_struct_3054,00.html?mytitle=Computer%2Bstatt%2BLesesaal%2B%25E2%2580%2593%2BCCS%2Bdigitalisiert%2Bdie%2BBritish%2BLibrary" target="_blank">DW-WORLD</a></p>
<p>The world's knowledge is still kept in books. So far, the digital revolution has captured only a fraction of what authors have put on paper over the centuries. The venerable British Library in London, of all institutions, now wants to change that, and for reasons of conservation.</p>
<p>It is precisely the oldest and most valuable books that are threatened with decay. For that reason, nobody has been allowed to read them, let alone borrow them, for a long time. Digitisation is meant to solve the problem and also make the content of the historical books available online. The Hamburg-based high-tech company CCS was awarded the contract to digitally capture an incredible 25 million book pages from the British Library's holdings over the next two years and, in cooperation with the US software group Microsoft, to publish them on the Internet. In the heart of the London library, a battery of different high-performance scanners is now running day and night. MiG reporter <strong>Patrick Benning</strong> watched the &#8220;Content Conversion Specialists&#8221; at their work, which is as delicate as it is spectacular.</p>
<p><a href="http://www.dw-world.de/popups/popup_single_mediaplayer/0,,3421754_start_833_end_1133_type_video_struct_3054,00.html?mytitle=Computer%2Bstatt%2BLesesaal%2B%25E2%2580%2593%2BCCS%2Bdigitalisiert%2Bdie%2BBritish%2BLibrary" target="_blank">Download link DW-TV</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.godigitalblog.com/2008/06/17/computer-statt-lesesaal-%e2%80%93-ccs-digitalisiert-die-british-library/feed/</wfw:commentRss>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
	</item>
		<item>
		<title>CCS&#8217;s official statement regarding the shutdown of Microsoft&#8217;s Live Search Project</title>
		<link>http://www.godigitalblog.com/2008/05/28/ccss-statement-to-microsoft-decision-to-end-book-digitizing/</link>
		<comments>http://www.godigitalblog.com/2008/05/28/ccss-statement-to-microsoft-decision-to-end-book-digitizing/#comments</comments>
		<pubDate>Wed, 28 May 2008 14:53:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[THE VALUE OF DIGITIZATION]]></category>

		<category><![CDATA[20.000.000 pages]]></category>

		<category><![CDATA[Content Conversion Specialists]]></category>

		<category><![CDATA[digitising]]></category>

		<category><![CDATA[Digitizing]]></category>

		<category><![CDATA[Microsoft]]></category>

		<category><![CDATA[The British Library]]></category>

		<guid isPermaLink="false">http://www.godigitalblog.com/2008/05/28/ccss-statement-to-microsoft-decision-to-end-book-digitizing/</guid>
		<description><![CDATA[May 25th, 2008. Microsoft Inc. (Redmond, USA) announced that it is ending the Live Search Books and Live Search Academic projects.
Satya Nadella, Senior Vice President Search, Portal and Advertising at Microsoft Inc., states:
&#8220;As we wind down Live Search Books, we are reaching out to participating publishers and libraries. We are encouraging libraries to build on [...]]]></description>
			<content:encoded><![CDATA[<p>May 25th, 2008. Microsoft Inc. (Redmond, USA) announced that it is ending the Live Search Books and Live Search Academic projects.<br />
Satya Nadella, Senior Vice President Search, Portal and Advertising at Microsoft Inc., states:<br />
&#8220;As we wind down Live Search Books, we are reaching out to participating publishers and libraries. We are encouraging libraries to build on the platform we developed with Kirtas, the Internet Archive, CCS, and others to create digital archives available to library users and search engines.  We hope that our investments will help increase the discoverability of all the valuable content that resides in the world of books and scholarly publications.&#8221;</p>
<p>CCS would like to thank Microsoft as co-initiator and patron of this extraordinary digitization project at the British Library for the successful and extremely productive collaboration throughout the last year. We will honor all existing contracts and continue to deliver high quality digitization products with unchanged high ambitions to the British Library.</p>
<p>We believe that Microsoft’s Book Search Project not only helped to digitize a large number of books but also generated valuable knowledge, for both the library community and the digitization partners, about how to meet the challenges of mass digitization projects. CCS will support all endeavors to continue the projects that Microsoft has started. CCS expects to contribute to further mass digitization projects based on both public and commercial funding.</p>
<p>For further information please see:</p>
<p>Microsoft: http://blogs.msdn.com/livesearch/<br />
The British Library: http://www.bl.uk/news/2008/pressrelease20080528.html</p>
]]></content:encoded>
			<wfw:commentRss>http://www.godigitalblog.com/2008/05/28/ccss-statement-to-microsoft-decision-to-end-book-digitizing/feed/</wfw:commentRss>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
	</item>
		<item>
		<title>Koninklijke Bibliotheek starts digitising eight million pages of historical newspapers</title>
		<link>http://www.godigitalblog.com/2008/05/26/koninklijke-bibliotheek-start-met-digitaliseren-acht-miljoen-pagina%e2%80%99s-historische-kranten/</link>
		<comments>http://www.godigitalblog.com/2008/05/26/koninklijke-bibliotheek-start-met-digitaliseren-acht-miljoen-pagina%e2%80%99s-historische-kranten/#comments</comments>
		<pubDate>Mon, 26 May 2008 12:47:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Digitalisierung]]></category>

		<guid isPermaLink="false">http://www.godigitalblog.com/2008/05/26/koninklijke-bibliotheek-start-met-digitaliseren-acht-miljoen-pagina%e2%80%99s-historische-kranten/</guid>
		<description><![CDATA[from here
The Hague, 26 May 2008 - The Koninklijke Bibliotheek in The Hague has signed an agreement with the German company CCS (Content Conversion Specialists) for the digitisation of eight million historical newspaper pages. The digitised newspapers are searchable on every word in the text and will be included in the Databank Digitale Dagbladen, a project of the [...]]]></description>
			<content:encoded><![CDATA[<p>from <a href="http://www.kb.nl/nieuws/2008/8miljoendigitaal.xml" target="_blank">here</a></p>
<p><em>The Hague, 26 May 2008</em> - <strong>The Koninklijke Bibliotheek in The Hague has signed an agreement with the German company CCS (Content Conversion Specialists) for the digitisation of eight million historical newspaper pages. The digitised newspapers are searchable on every word in the text and will be included in the Databank Digitale Dagbladen, a project of the Koninklijke Bibliotheek that is funded by the Nationaal Programma Grootschalige Onderzoeksfaciliteiten.</strong></p>
<p>To carry out the project, CCS has entered into a partnership with the Dutch company M&amp;R from Kampen. The first newspapers will soon be on their way to Kampen, where the scanning takes place. Some 200,000 newspaper pages will be digitised per month. Within three years, all eight million pages will become available. In early 2009, the first results will be made available online to everyone.</p>
<p>Over the past four centuries, more than 7,000 national, regional and local daily newspaper titles have appeared in the Netherlands. Newspapers contain information about the history of society, politics, the economy, art, culture and science. They are an indispensable source for many researchers, from historians to language technologists who use the historical newspapers to study the development of language use. A newspaper brings the news of the day, but the information has lasting value. Because of the fragility of the material (thin, poor-quality paper), an important source for scholarly research is in danger of being lost. A large part of the Dutch collection, drawn from the holdings of both the Koninklijke Bibliotheek and other heritage institutions, is therefore being digitised and made accessible to everyone on the Internet. A scholarly advisory committee advises the Koninklijke Bibliotheek on the selection of the most important titles from 1618, when the first newspaper appeared in the Netherlands, up to the twentieth century.</p>
<p>In digitising newspapers from the 20th century, the KB runs into a number of restrictions under the current Auteurswet (the Dutch Copyright Act). It is currently in talks about this with the Nederlands Uitgeversverbond and various organisations that represent the interests of freelancers and other copyright holders.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.godigitalblog.com/2008/05/26/koninklijke-bibliotheek-start-met-digitaliseren-acht-miljoen-pagina%e2%80%99s-historische-kranten/feed/</wfw:commentRss>
	<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/de/</creativeCommons:license>
	</item>
	</channel>
</rss>
