Architectural Overview

Historical Perspective

Development of the University of Florida Digital Collections (UFDC) web presence began in 2005. The Greenstone digital library system was chosen as the metadata storage, retrieval, and search engine for UFDC. Greenstone is an open-source digital library system produced and maintained by the New Zealand Digital Library Project at the University of Waikato. It is promoted by the United Nations to many of our partners in Africa, the Caribbean, and Latin America. Greenstone has two main components: the metadata and indexing portion, and the display portion. While the metadata and indexing portion is strong, we felt that the display portion lacked some of the functionality we required. As a result, we chose to use only the metadata portion of Greenstone and to build a thin presentation layer on top of it.

Figure 1: Greenstone with a simple presentation layer

Work on the presentation layer led to the first formal release in March 2006. From this presentation layer, SobekCM was born.

Multi-Tier Architecture

As the needs of the library became more refined, more and more of the logic migrated from Greenstone into the presentation layer, now christened SobekCM. In time, a true separation formed between the SobekCM presentation layer and the business logic layer, resulting in a classic three-tier architecture with a data layer (initially Greenstone), a logic layer, and a presentation layer.

As the library and its needs grew further, it became clear that depending entirely on Greenstone to house the data limited SobekCM's ability to grow. Initially, each time a user chose to move to the next page of an item, a query was sent to Greenstone to determine the next image to display. This was the first part of the data dependence to go: the item viewer within SobekCM became completely METS-based. When a request is made to view an item, the METS metadata file for that item is read, and the display depends entirely on the contents of that file. Thus, the store of METS files for the digital resources became the second major data source.
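To make the METS-based page navigation concrete, here is a minimal sketch of how a viewer could work out the next page image from an item's METS file alone, with no query to a search engine. It assumes a simplified layout in which each page is a `div` in the structMap with an `ORDER` attribute and an `fptr` pointing at a `file` whose `FLocat` holds the image location; real SobekCM METS files may be structured differently, and the function names `page_images` and `next_page` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Standard METS and XLink namespace URIs.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def page_images(mets_xml: str) -> List[str]:
    """Return the page image locations of an item in reading order.

    Assumes a simplified METS layout (hypothetical for this sketch).
    """
    root = ET.fromstring(mets_xml)
    # Map each file ID in the fileSec to the location it points at.
    hrefs = {}
    for f in root.iterfind(".//mets:fileSec//mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        if loc is not None:
            hrefs[f.get("ID")] = loc.get(XLINK_HREF)
    # Collect (ORDER, href) for every structMap div that points at a known file.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div[@ORDER]", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is not None and fptr.get("FILEID") in hrefs:
            pages.append((int(div.get("ORDER")), hrefs[fptr.get("FILEID")]))
    # Sort by ORDER so document order in the file does not matter.
    return [href for _, href in sorted(pages)]


def next_page(mets_xml: str, current: str) -> Optional[str]:
    """Return the image after `current`, or None on the last page.

    Raises ValueError if `current` is not a page of this item.
    """
    pages = page_images(mets_xml)
    i = pages.index(current)
    return pages[i + 1] if i + 1 < len(pages) else None
```

Because everything the viewer needs is in the file, paging through an item costs one file read (which can itself be cached) rather than one search-engine query per page turn.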

As the collections grew further, indexing and searching across the Greenstone collections became very time-consuming; a search across the library could take up to a minute. While streamlining Greenstone by switching it to Lucene indexes in the background, we also increased our reliance on the database and moved all metadata searching (that is, non-full-text searching) into full-text-indexed metadata search tables in the SQL database. Finally, in 2011, we switched to using Solr/Lucene indexes directly and phased out Greenstone.
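Querying Solr directly means a search becomes a single HTTP request to Solr's `/select` handler. The sketch below only builds such a request URL; it assumes a hypothetical host and a hypothetical core named `sobekcm`, and the filter field name used in the usage note is also hypothetical. The parameters `q`, `fq`, `start`, `rows`, and `wt` are standard Solr query parameters.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical Solr host
CORE = "sobekcm"                          # hypothetical core name


def solr_select_url(terms: List[str], page: int = 1, per_page: int = 20,
                    filters: Optional[Dict[str, str]] = None) -> str:
    """Build a /select URL that ANDs the terms and pages through results."""
    params = [
        ("q", " AND ".join(terms)),
        ("start", (page - 1) * per_page),  # zero-based offset of the first row
        ("rows", per_page),                 # page size
        ("wt", "json"),                     # response format
    ]
    # Each filter becomes a separate fq (filter query) on an exact value.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"
```

For example, `solr_select_url(["cuba", "maps"], page=2, per_page=10, filters={"collection": "UFDC"})` asks for results 11 through 20 of items matching both terms within one collection. Filter queries are kept separate from the main query because Solr can cache them independently.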

A final piece of the data layer puzzle is the web application cache and the remote cache, which can be viewed as additional data sources. Data objects are cached locally or remotely depending on the scenario. This allows quick retrieval of earlier search results and generally reduces the workload on the web application, so users get faster responses.
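One way to picture the local and remote caches as data sources is a read-through lookup that tries the fastest source first. This is a minimal sketch under stated assumptions: the remote cache is modelled as a plain dictionary standing in for a shared cache server, the five-minute lifetime is arbitrary, and the class and method names are hypothetical rather than SobekCM's own.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TieredCache:
    """Hypothetical two-level read-through cache (local, then remote)."""

    def __init__(self, remote: Dict[str, Any], ttl_seconds: float = 300.0):
        self.local: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self.remote = remote  # stand-in for a cache shared across web servers
        self.ttl = ttl_seconds

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self.local.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fastest: unexpired in-process copy
        if key in self.remote:
            value = self.remote[key]  # shared copy, e.g. cached by another server
        else:
            value = load()  # fall through to the database or search index
            self.remote[key] = value  # note: remote entries never expire here
        self.local[key] = (now + self.ttl, value)
        return value
```

With this arrangement, a repeated search on the same server is answered from memory, and the same search on a second server is answered from the shared cache, so the underlying index is queried only once.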

Figure 2: n-Tier architecture with multiple data sources

The power of this architecture is the ability to customize the presentation of the data while relying on the same data and logic layers. While web users see an HTML representation of the data, we also expose three other presentation modes for consumption by different applications. The application includes an OAI-PMH server, which renders results in compliance with the Open Archives Initiative Protocol for Metadata Harvesting so that harvesters can discover resources loaded into the library. The application can also serve raw XML and dataset-as-XML for internal applications used within the Digital Library Center. This further decouples the workflow applications from the database structure and makes redesigns easier. Finally, we expose JSON (JavaScript Object Notation) for consumption by the SobekPH iPhone mobile application.
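The idea of several presentation modes over one logic layer can be sketched as a table of renderers that all consume the same object. The `Item` record, its fields, and the renderer names below are hypothetical, and an OAI-PMH renderer (which must follow the protocol's own response format) is deliberately left out to keep the sketch short.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from html import escape


@dataclass
class Item:
    """Hypothetical item record as delivered by the logic layer."""
    identifier: str
    title: str


def to_html(item: Item) -> str:
    # Web users: an HTML fragment with the title escaped.
    return f"<h1>{escape(item.title)}</h1>"


def to_xml(item: Item) -> str:
    # Internal workflow tools: a plain XML element.
    root = ET.Element("item", identifier=item.identifier)
    ET.SubElement(root, "title").text = item.title
    return ET.tostring(root, encoding="unicode")


def to_json(item: Item) -> str:
    # Mobile clients: a JSON object.
    return json.dumps(asdict(item))


RENDERERS = {"html": to_html, "xml": to_xml, "json": to_json}


def render(item: Item, mode: str) -> str:
    """Pick a presentation mode; the data and logic layers are unchanged."""
    return RENDERERS[mode](item)
```

Adding a new consumer then means adding one renderer, without touching how items are retrieved or assembled.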

Figure 3: Architecture with multiple data sources and presentation modes

New Architecture

With the release of version 5.0, a new architecture will be used for the overall system. Below is a preliminary diagram.

Figure 4: New architecture with version 5.0


This will be accomplished by a set of shared libraries.

Figure 5: Code libraries for version 5.0