Workflows for Mass Digitisation
Thursday, July 17th, 2008
Author: Claus Gravenhorst
at Colloquium of Library Information Employees of the V4+ Countries
Accessible information is a basic need of society or, to put it another way, of everyone. Often the original can be accessed only in print or on microfilm/microfiche, which makes searching, using and distributing the information time-consuming and cost-intensive, and puts it out of reach for many people. Until recently, digitising printed items and converting them into electronic formats was complex and cost-intensive. Insufficient budgets and/or resources prevented large-scale conversion into digital repositories, and reliable methods for the long-term security and storage of these enormous data sets were virtually unavailable.
As a result of the METAe project (http://meta-e.uibk.ac.at), funded by the European Commission through the 5th Framework Research Program, CCS Content Conversion Specialists GmbH, Germany, developed a comprehensive software solution that has been available on the market since 2003 under the brand name docWORKS. It is a production tool that offers an integrated workflow for the automated, structured conversion of printed documents into digital objects. These objects describe the physical and logical document structure through the consistent use of international XML standards. The resulting XML documents are equivalent in quality and structure to born-digital documents and can be transferred to digital library systems, portals, document, content and knowledge management systems, as well as to virtually any media output device.
The main goal achieved through the project was the automatic generation of administrative, descriptive and structural metadata. The advantages of highly structured documents:
As “digital originals”, they meet the requirements for long-term digital storage in repositories
Through the use of open XML metadata standards, the data can be transformed and migrated to meet current and future requirements
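To make the idea of a document that records both its physical and its logical structure more concrete, here is a minimal sketch in Python. It is not the docWORKS output format and does not follow any particular XML standard: every element and attribute name (`document`, `physical`, `page`, `block`, `logical`, `article`, `part`), the file names and the sample title are hypothetical and chosen only for illustration. The physical part lists scanned pages and the blocks found on them; the logical part describes an article that spans two pages by pointing at those blocks.

```python
import xml.etree.ElementTree as ET


def build_document():
    """Build a tiny XML tree with a physical and a logical section.

    All element and attribute names are hypothetical; they are not
    taken from any metadata standard or from docWORKS.
    """
    doc = ET.Element("document", {"title": "Example Gazette"})

    # Physical structure: one entry per scanned page, each with one text block.
    physical = ET.SubElement(doc, "physical")
    for number in (1, 2):
        page = ET.SubElement(
            physical,
            "page",
            {"id": f"p{number}", "image": f"scan_{number:04d}.tif"},
        )
        ET.SubElement(page, "block", {"id": f"p{number}b1"})

    # Logical structure: an article whose parts refer to physical blocks.
    logical = ET.SubElement(doc, "logical")
    article = ET.SubElement(logical, "article", {"heading": "Local News"})
    ET.SubElement(article, "part", {"block": "p1b1"})
    ET.SubElement(article, "part", {"block": "p2b1"})
    return doc


def pages_for_article(doc, heading):
    """Return the ids of the pages on which the named article appears."""
    # Map every block id to the id of the page that contains it.
    block_to_page = {
        block.get("id"): page.get("id")
        for page in doc.iter("page")
        for block in page.iter("block")
    }
    for article in doc.iter("article"):
        if article.get("heading") == heading:
            return [block_to_page[part.get("block")] for part in article.iter("part")]
    return []


doc = build_document()
xml_text = ET.tostring(doc, encoding="unicode")  # serialised form for storage or transfer
article_pages = pages_for_article(doc, "Local News")
```

Because the logical entries only reference the physical blocks, a target system could in principle rebuild an article view or a page view from the same file, which is the kind of transformation and migration the list above refers to.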