The number of paper documents that need to be digitized is huge, so it is useful to have a system that can capture, search and retrieve them on-line in a simple way. In this paper we present an acquisition and information retrieval system based on Zope/Plone that allows customized data types to be defined quickly and makes the storage of digitized documents easy to manage. By extending Archetypes, we obtain the interfaces and dynamic validations that let multiple users enter such documents simply and quickly. In addition, a Python client, designed to work over HTTP/HTTPS, automates the acquisition phases and the delivery of the data to the server. By making data storage independent of the ZODB (whose limits are highlighted by our benchmarks) and relying only on transactional file systems and the PostgreSQL DBMS, the system scales well even to millions of documents and hundreds of gigabytes of images. The architecture is fully compliant with web standards and their design principles. The approach is applied to a real case study: the acquisition, search and retrieval of millions of paper documents belonging to the Italian Registry of the ".it" ccTLD, managed by IIT-CNR.
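
The abstract does not show the storage code, so the following is only a minimal sketch of the general idea of keeping large image data out of the ZODB: each scanned image is written as a plain file under a sharded directory tree, and its metadata goes into a relational table. It assumes one file per document and uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (a PostgreSQL driver would use its own placeholder style). The names `documents`, `shard_path`, `write_atomically`, `store_document` and `load_document` are hypothetical and do not come from the paper.

```python
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical metadata table; the real system's schema is not given.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    doc_id TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL,
    path   TEXT NOT NULL,
    size   INTEGER NOT NULL
)
"""


def shard_path(root: Path, doc_id: str) -> Path:
    # Hash the identifier and use two 2-hex-digit levels (e.g. 3f/a9/3fa9...)
    # so that no single directory has to hold millions of image files.
    h = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return root / h[:2] / h[2:4] / h


def write_atomically(target: Path, data: bytes) -> None:
    # Write to a temporary file in the same directory, flush it to disk,
    # then rename it over the target: readers see either no file or the
    # complete file, never a partially written one.
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def store_document(conn: sqlite3.Connection, root: Path,
                   doc_id: str, image: bytes) -> Path:
    target = shard_path(root, doc_id)
    with conn:  # commits on success, rolls back on any exception
        # Insert the metadata first, so a duplicate doc_id fails before
        # anything touches the disk.
        conn.execute(
            "INSERT INTO documents (doc_id, sha256, path, size) "
            "VALUES (?, ?, ?, ?)",
            (doc_id, hashlib.sha256(image).hexdigest(), str(target), len(image)),
        )
        # If writing the file fails, the exception rolls back the row.
        write_atomically(target, image)
    return target


def load_document(conn: sqlite3.Connection, doc_id: str) -> bytes:
    row = conn.execute(
        "SELECT path, sha256 FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    data = Path(row[0]).read_bytes()
    # Verify the stored checksum so silent file corruption is detected.
    if hashlib.sha256(data).hexdigest() != row[1]:
        raise IOError(f"checksum mismatch for {doc_id}")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        store_document(conn, Path(tmp), "doc-0001", b"fake scanned page")
        print(load_document(conn, "doc-0001"))
```

In this sketch the database row and the file are not covered by a single atomic transaction: if the final commit itself failed, an orphan file could remain on disk, which a periodic cleanup pass could remove. The design choice illustrated is simply that bulky binary data lives on the file system, while the DBMS holds small, indexable metadata.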

EuroPython 2004, Göteborg, Sweden, 2004

Authors: Marco Andreini, Cristian Lucchesi, Maurizio Martinelli, Giuseppe Vasarelli

Type: Article in the proceedings of a refereed international conference
Subject area: Information Technology and Communication Systems