Metadata Searching Database Support
Overview
All metadata searching occurs within these tables in the database.
This allows any metadata updates to be immediately searchable from the SobekCM web application. There are two seperate
table structures here; one supports advanced searching with particular metadata fields selected and faceting while the second provides
extremely quick basic searching capabilities.
To provide extremely quick results, the full-text indexing in SQL 2008 is used.
This allows for a new grammar to be used which provides extremely efficient results with relevancy ranking.
Tables and Relationships
Below is the diagram of the tables which make up this module:
|
Figure 1: Tables and Relationships within the Metadata Searching Module |
Notes on Tables
Below are notes regarding some specific tables within this module:
- SobekCM_Metadata_Unique_Search_Table
This table contains unique metadata value-type pairs. So, a single row in this table might be 'Haiti', with a metadata type that links to 'Subject' in the SobekCM_Metadata_Types table.
Any item that uses that metadata value as a subject would link to this same row. However, if 'Haiti' was used as a TITLE, for example, then a new row would appear with 'Haiti' linking to the metadata
type 'Title'.
The MetadataValue column has a full-text index ( UniqueMetadataCatalog ) and is a unicode-encoded string with the maximum possible length [ nvarchar(max) ]. For some purposes, such as
exact searches or for saving the metadata related to an item, it is necessary to do a more standard match on the metadata, rather than employing the full-text index. However, a standard index cannot be placed
on a column of type nvarchar(max). For this reason, the MetadataValueStart column was added and a standard index placed upon it. This column holds the first 100 characters of the metadata value
and results in a huge performance boon when performing such tasks. For exact matches, this indexed column is first checked, and then the entire value is checked afterwards.
- SobekCM_Metadata_Basic_Search_Table
This table allows for efficient basic searching, where no particular fields are indicated. Each item in the database will have one row in this table
which includes the entire full citation of the item. This full citation is created from the data held in the other metadata search tables by running the SobekCM_Create_Full_Citation_Value
stored procedure which loops through each individual piece of metadata linked to the item and adds it to the full citation.
The FullCitation column has a full-text index ( BasicSearchCatalog ) and is a unicode-encoded string with the maximum possible length [ nvarchar(max) ]. This column is only ever used for metadata searching.
- SobekCM_Search_Stop_Words
This contains all of the stop words used by the MS SQL full-text indexes and allows the SobekCM application to filter these words out
of the query before passing them along to the database stored procedures. This list is routinely updated to reflect any changes in the lists held within the database.
- SobekCM_Item_Aggregation_Metadata_Link
While metadata is actually linked to items, there are many times that the metadata must be pulled by item aggregation, such as facets when browsing
all items or when browsing metadata directly. To increase the speed of this type of action, this table contains links directly between the metadata
and the item aggregation, and includes statistical information about how many times a piece of metadata appears within a collection for a certain metadata
field. This data is updated nightly through the Admin_Update_Cached_Aggregation_Metadata_Links stored procedure. This cached table is
actually only used if the number of items in the aggregation exceeds some threshold, such as 1000 items. Otherwise, the requested metadata values are
determined on the fly for more accuracy.
Last modified: Wednesday, April 22, 2015 mvs