Search Engine Optimization

Indexing SobekCM

We currently allow search and indexing robots to index all pages through the web application. However, a robot identification algorithm is applied to each incoming request, and a request identified as coming from a search engine indexing robot is treated quite differently.

When a robot requests a single item, a small static HTML fragment page for that item is read and displayed through the web application. This allows the URL to remain the same while keeping execution very quick (usually between one and three milliseconds of application time). These pages also make the full text of the item, as well as the full citation, available for indexing. The HTML served to robots and the HTML served to human requestors appear very similar at the top, although the robot page includes much more indexable data below the primary citation.

When a robot requests the browse of a single collection, a simple list of titles and URLs is provided so that the robot can index the list and follow the links to the individual resources.

Robots cannot perform searches against the library or against individual items within the library.

Permanent URLs and URL Rewrite

Because of the way URL rewrite is implemented in this library, the main URL displayed when viewing any item is also the item's PURL. Users reference the item by this PURL both internally and externally, and search engines use it as well. This approach pushes more traffic directly to the item, allows for simpler URLs, and spares users from looking for the PURL within the citation in order to reference the resource correctly. It also means that no forwarding needs to occur for any user, whether human or robot.
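The robot handling described under Indexing SobekCM can be illustrated with a minimal sketch. It is not the application's actual code: the text does not say how robots are identified, so the user-agent pattern below is an assumption, as are the names `is_robot`, `Request`, `handle`, the example paths, and the choice of a 403 response for robot searches.

```python
import re
from dataclasses import dataclass

# Assumption: robots are recognized by their user-agent string.
# The real identification algorithm is not described in the text.
ROBOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_robot(user_agent):
    """Return True when the user-agent string looks like an indexing robot."""
    return bool(user_agent) and ROBOT_PATTERN.search(user_agent) is not None


@dataclass
class Request:
    path: str                # hypothetical item path, e.g. "/item/1"
    user_agent: str
    is_search: bool = False  # True when the request is a search


def handle(request, static_fragments, render_full_page):
    """Return (status, html) for a request.

    static_fragments maps a path to its pre-built static HTML fragment;
    render_full_page is the normal (slower) rendering path for humans.
    """
    if not is_robot(request.user_agent):
        return 200, render_full_page(request.path)
    if request.is_search:
        # Robots cannot perform searches; the status code is an assumption.
        return 403, "Searching is not available to robots."
    fragment = static_fragments.get(request.path)
    if fragment is not None:
        # Same URL, but the cheap pre-built page is served.
        return 200, fragment
    return 200, render_full_page(request.path)
```

Because the decision is made per request and the URL never changes, the same address can be handed to both human users and robots, which is what lets the displayed URL double as the PURL.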
Link Advertisement

Several approaches are taken to advertise the links to prospective search engine indexers.

On each item aggregation home page, there is a link which provides a list of all items within the aggregation. This link is identical to the standard ALL ITEMS browse link. Just as it aids human discovery of the resources within an aggregation, it also supports robot discovery.

Two RSS feeds are provided for each aggregation within this library. One RSS feed generally lists the last twenty items added to the aggregation. An additional RSS feed (particularly useful for indexing) lists every item within the aggregation. In addition, a particularly unwieldy RSS feed is provided with links to every item within this library.

Sitemaps are also generated to list all the aggregation home pages and links to each individual item within the digital library. To keep the sitemaps reasonably small, each sitemap holds thirty thousand links, resulting in ten sitemaps for the resources alone. The date that each item was last modified is included in the sitemap to make incremental updates simpler for the indexing robots. An additional sitemap is provided for the collection home pages and static web content pages.

While the sitemaps are registered individually with several of the major search engines, they are also listed within the robots.txt page for this site. Not all search engines implement this option, but it is increasingly used by the major search sites, and most robots.txt readers are prepared to skip unrecognized instructions.

Cache Management

To manage the cache and robot behavior, the following rules are now in place:
Usage Statistics

Hits from indexing robots are carefully excluded from the overall usage statistics of this digital library, using the same identification algorithm as the web application. In general, web site managers can expect hits from search engine robots to far outpace hits from real users. For example, below are the numbers of robotic hits and real users for the last several months.
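Returning to the sitemaps described under Link Advertisement, the following is a rough sketch of how a long list of item links might be split into sitemap files of thirty thousand links each, with a last-modified date per link, and then listed in robots.txt. It is an illustration under assumptions rather than the library's generator: the function names, the `sitemapN.xml` file naming, and the example.org URLs are all hypothetical.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
LINKS_PER_SITEMAP = 30_000  # figure quoted in the text


def build_sitemaps(items, links_per_sitemap=LINKS_PER_SITEMAP):
    """Split (url, last_modified) pairs into sitemap XML documents.

    items is a list of (str, datetime.date) tuples; the result is a list
    of XML strings, one per sitemap file.
    """
    sitemaps = []
    for start in range(0, len(items), links_per_sitemap):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url, modified in items[start:start + links_per_sitemap]:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
            # lastmod lets a robot skip items unchanged since its last visit
            ET.SubElement(entry, "lastmod").text = modified.isoformat()
        sitemaps.append(ET.tostring(urlset, encoding="unicode"))
    return sitemaps


def robots_txt_lines(base_url, count):
    """Return one 'Sitemap:' line per sitemap file (file names are hypothetical)."""
    return [f"Sitemap: {base_url}/sitemap{i}.xml" for i in range(1, count + 1)]


if __name__ == "__main__":
    demo = [(f"https://example.org/item/{i}", date(2020, 1, 1)) for i in range(7)]
    for xml_text in build_sitemaps(demo, links_per_sitemap=3):
        print(xml_text)
    print("\n".join(robots_txt_lines("https://example.org", 3)))
```

If every file except possibly the last is full, ten sitemaps of thirty thousand links correspond to somewhere between 270,001 and 300,000 item links. The `Sitemap:` line is the robots.txt instruction the text refers to; readers that do not recognize it simply skip it.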