DMR Upgrade Project 2013
This page will be kept around for posterity. Please see the new DMR Post Upgrade 2013 wiki article for more information about loose ends that are being tied up.
The DMR Upgrade Project 2013 is an initiative by LITS to upgrade the Digital Media Repository's underlying CONTENTdm system from Server Version 220.127.116.11 to 6.3 and from Website Version 4/5/? to version 6.3.
Because carrying forward many major customizations requires substantial work (many of them have since been folded into the official release), there is a significant need for restraint during this and future upgrades to prevent "over-customizing", that is, implementing customizations that will impede or even prevent future upgrades.
In the newest version of CONTENTdm, a set of features has been provided to allow a large number of customizations to be implemented and easily carried over into future versions, with minimal work required to re-implement them. We will be taking full advantage of these features during this upgrade project.
The ETA for this project will be the week of May 6 to May 10, 2013 (the interim period between Spring and Summer Semesters). The current implementation plan is to set up a fully functional duplicate LIBX server and simply "swap it out" with the production LIBX server.
This upgrade process will require several features to be re-implemented in a sustainable way (one that offers the least resistance during future upgrades) and, of course, may affect other systems that rely on the Digital Media Repository (via API, Harvest Points, or otherwise). This article documents all of these features and systems, each annotated with technical/developer notes regarding its implementation, for future reference.
- 1 Nomenclature and Commenting
- 2 Features/Customizations
- 2.1 Class A (Required for Upgrade)
- 2.1.1 Protected Collections (Implemented on New Server)
- 2.1.2 Protected High Resolution Images (Implemented on New Server)
- 2.1.3 Linking (Implemented on New Server)
- 2.1.4 CardCat Linking (Implemented on New Server)
- 2.1.5 Sub-collections and Custom Search Boxes (Implemented on New Server)
- 2.1.6 Embedded Media (Implemented on New Server except "content_main" issue)
- 2.1.7 Page Analytics (Implemented on New Server)
- 2.1.8 API Access
- 2.1.9 General Usability Concerns (Implemented #1 and #3 on New Server)
- 2.1.10 Browse by A-Z, Subject, Location, Format, and Contributors (Implemented on New Server)
- 2.1.11 Migrate and Combine All Collections from the DMR and BoT Servers (Implemented)
- 2.2 Class B (Important but will not prevent upgrade)
- 2.3 Class C (Will NOT be implemented)
- 3 Summon/OneSearch
- 4 WorldCat Sync
Nomenclature and Commenting
There are currently three servers running CONTENTdm in the library (soon to be four). To prevent confusion, these will be referred to in this document as follows:
|Name||Hostname||URL||Description|
|DMR Server||LIBX||http://libx.bsu.edu/||Our live DMR server|
|New Server||LIBCDM||http://libcdm.dhcp.bsu.edu/ (out of date)||Copy of DMR and BoT servers with newest version of CONTENTdm plus customizations.|
|BoT Server||LIBCDM2||http://libcdm2.bsu.edu/||Board of Trustees Minutes Repository using newest version of CONTENTdm.|
|Test Server||LIBCDMTEST||http://libcdmtest.dhcp.bsu.edu/||Test server for the newest version of CONTENTdm. (Available on-campus only.)|
For future reference, any code modifications will be tracked on the DMR Upgrade Project 2013 Code Changes page.
Features/Customizations
Class A (Required for Upgrade)
Protected Collections (Implemented on New Server)
User-based access control for specific collections.
Currently on the DMR Server this feature is used in three specific ways: the Architecture Images Collection, BSUBrowse, and BSUSearch.
- The Architecture Images Collection
- When a user attempts to browse or search this collection, they are prompted with an End User Copyright Agreement and login form. The login form authenticates the user based on AD credentials and, once authenticated, allows the user to browse and/or search the collection.
- There are some minor security issues with this method. While the EUCA page successfully prevents users from seeing any metadata (including thumbnails) associated with the collection in most circumstances, if a user performs a repository-level search (searching across 2 or more collections), it is indeed possible for results from this collection to appear. While users must still enter their credentials to view any specific items returned this way, the thumbnails of these copyrighted images, as well as some of their associated metadata, do appear in the results list. Example results for searching two collections for the word "a" in any field
- BSUBrowse and BSUSearch
- By default, unpublished collections are not accessible in any way to anonymous users. In order to browse or search these collections, users must first authenticate with their BSU credentials via the two links at the bottom of the DMR homepage labeled BSUBrowse and BSUSearch. Both links authenticate through the server (there is no login form), but redirect the user to either the browse or search pages. Once authenticated, there is custom code that allows the user to see unpublished collections.
While the existing solutions work fine, the code behind them is messy and the login methods are not consistent (or necessarily secure). A new method has been found that provides a single login form, is built on top of the existing CONTENTdm authentication code, and can easily and more reliably replace the two methods listed above.
The out-of-the-box login feature of CONTENTdm 6.3 was designed specifically for administrative users and users who have permission to submit content. It was not designed as a way for end-users to log into the system or to protect anything except unpublished collections. This default functionality would work well for our internal users (such as MADI staff).
The proposed solution (which has been implemented on the Test Server) is to modify the default login page to accept either CONTENTdm credentials (and thus, those domain users who have been specifically added to CONTENTdm as administrative users) or BSU credentials via LDAP authentication.
The solution first checks to see if the user is registered with CONTENTdm and if so proceeds as normal. If not, it then checks to see if the user's credentials work via LDAP and if so, sets the user at a separate level of authentication that allows them to see certain, hard-coded, collections that would normally be hidden, but completely dissociates the user from any administrative privileges.
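The two-step check described above can be sketched as follows. This is only an illustration of the logic, written in Python rather than the PHP that CONTENTdm actually uses; the names AuthLevel, authenticate, cdm_check, and ldap_check are hypothetical and do not appear in LoginController.php.

```python
from enum import Enum
from typing import Callable


class AuthLevel(Enum):
    ANONYMOUS = "anonymous"
    LDAP_USER = "ldap_user"  # may see protected collections, no admin rights
    CDM_USER = "cdm_user"    # registered CONTENTdm user, default behavior


def authenticate(
    username: str,
    password: str,
    cdm_check: Callable[[str, str], bool],   # hypothetical: standard CONTENTdm login
    ldap_check: Callable[[str, str], bool],  # hypothetical: BSU LDAP bind
) -> AuthLevel:
    """Try CONTENTdm credentials first, then fall back to LDAP."""
    if cdm_check(username, password):
        return AuthLevel.CDM_USER
    if ldap_check(username, password):
        # Separate level: never associated with administrative privileges.
        return AuthLevel.LDAP_USER
    return AuthLevel.ANONYMOUS
```

Keeping the LDAP result as its own level, rather than reusing the CONTENTdm "authenticated" state, is what keeps BSU users dissociated from administrative permissions.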
The only minor drawback to this solution is that it completely hides protected collections from anonymous users, meaning they won't know of the existence of these collections unless they log in. This problem can be easily solved, however, by providing helpful tips on the customized home page (and other pages next to the "log in" button) that would inform the user of the benefits of logging in. We could also highlight these protected collections on the homepage with a clear message stating that users must log in before they can view and/or search them.
For unpublished collections, where BSUBrowse and BSUSearch would come into play, because the group of users who need these permissions is small, we can use the out-of-the-box login feature of CONTENTdm to handle these on a case-by-case basis. If a user absolutely needs to be able to see unpublished collections, but must not be an administrative user, we could simply create a dummy collection, add them as a CONTENTdm user, and assign them minor rights to that dummy collection. This would enable these users to see unpublished collections, but won't give them any administrative permissions, without changing any code.
In order for PHP LDAP functions to work, the php_ldap.dll file that matches the version of PHP CONTENTdm runs on must be used. CONTENTdm is using PHP 5.3.3, which uses the php_ldap.dll from the PHP 5.3.4 VC6 release, available from the PHP for Windows website. The php.ini file must be edited to load the extension as usual. The relevant location is OCLC\CONTENTdm\Content6\common\php533\ext\.
The relevant code for all authentication is in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\controllers\LoginController.php around line 110, the authAction() function, and OCLC\CONTENTdm\Content6\website\cdm_common\cdm\controllers\LogoutController.php around line 106, the end of the indexAction() function.
The code that handles whether or not the "log out" link appears on the page is located in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\layouts\scripts\default.phtml around line 286 at the "if(@$_SESSION['authenticated'])" condition.
The code that controls which collections appear (and thus are available to the user to browse and/or search) is in OCLC\CONTENTdm\Content6\server\docs\dmscripts\DMSystem.php around line 59 (the dmGetCollectionList() function). Unfortunately this file doesn't get any $_SESSION data sent to it. Instead it gets called through a series of other functions that eventually end up using the API code located in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\models\CdmApi.php around line 698 through the api_get_collection_list() function. This function does have access to the session and calls the dmGetCollectionList() function, so we can modify the result it receives to filter out results that shouldn't appear unless a user is logged in as an LDAP user.
To keep consistent with OCLC's file structure, the file containing the list of collections that are only viewable by LDAP users is located at OCLC\CONTENTdm\Content6\server\conf\bsupriv.php.
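As a rough sketch of the filtering step described above (again in Python for illustration, not the PHP in CdmApi.php), the result of the collection-list call can be passed through a filter that consults the session. The alias "ArchImg", the session key "bsu_ldap", and the function name are assumptions made for this example only.

```python
from typing import Dict, List, Mapping, Set

# Hypothetical stand-in for the alias list kept in bsupriv.php.
BSU_PRIVATE_ALIASES: Set[str] = {"ArchImg"}  # hypothetical alias


def filter_collection_list(
    collections: List[Dict[str, str]],
    session: Mapping[str, object],
    private_aliases: Set[str] = BSU_PRIVATE_ALIASES,
) -> List[Dict[str, str]]:
    """Drop BSU-only collections unless the session is LDAP-authenticated."""
    if session.get("bsu_ldap"):  # hypothetical session key set at login
        return list(collections)
    return [
        c for c in collections
        if c["alias"].strip("/") not in private_aliases
    ]
```

Because the filter wraps the result instead of changing dmGetCollectionList() itself, the customization stays in one place that has access to the session.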
In summary, allowing BSU users to log in through the existing login tool and view collections that are only accessible to BSU users requires the following changes:
- The bsupriv.php file must be created and updated with the aliases of those collections.
- The LoginController.php and LogoutController.php files need to be updated to handle LDAP authentication if standard authentication fails.
- The default.phtml file needs to be updated to show the "Log out" text even when a user is only authenticated via LDAP.
- The CdmApi.php file needs to include the bsupriv.php file and filter those collections out of the list shown to the general public.
Protected High Resolution Images (Implemented on New Server)
Special server-based access restrictions on a collection of images.
Currently on the DMR Server this feature is only used for the Art History Images collection. When a user attempts to search or browse this collection, they are prompted to agree to an End User Copyright Agreement that does not require them to log in. The only purpose of this EUCA is to make the user agree to the terms presented. This information could just as easily be presented at the page level, especially considering the user doesn't have to authenticate.
Items in this collection have a piece of metadata called "Link to Larger Image". This link points to a separate web server that resides on LIBX. The sole purpose of this web server is to authenticate users, determine what authentication group they belong to, and then, if they are in the appropriate group, serve up the appropriate large-scale image.
There is nothing preventing an anonymous user from accessing the collection, regardless of whether or not they agree to the EUCA.
Because the current solution is based solely on server-side components, no code changes are necessary. Instead, a second web server must be set up (whether or not it resides on LIBX is another issue) that prompts users for their BSU credentials and then serves up images based on the authenticated user's permissions.
The only other minor change would be to either modify the code so that the EUCA information appears on each item's page, provide the EUCA on the second web server (so that it won't appear until a user tries to view a larger image), or modify the copyright metadata to reflect it (or simply leave the metadata as it is).
Linking (Implemented on New Server)
Backwards-compatible linking so old links (from 3rd party sites) still function correctly.
All standard links to searches, collections, etc., have been designed to be backwards compatible in the latest version of CONTENTdm. Once the upgrade is complete, if users click an old link to LIBX, it will still work in most circumstances.
However, because of the custom front-end that emulates "sub-collections" (canned searches based on a few particular pieces of metadata), "sub-collection" links will be broken after the upgrade. Here is an example of a collection link that will work after the upgrade. Here is an example of a "sub-collection" link that will not work. As can be seen in the same link modified for the Test Server, the backwards-compatible auto-redirect mechanism still detects the "sub-collection" as a type of landing page, but because, in CONTENTdm, there is no such construct as a sub-collection, it produces an error.
There are a few options depending on how much code we want to change.
The easiest option would be to alter the text of the standard error page to be more helpful, perhaps even providing information as to why the link is broken, along with links to the collection itself, as well as links to those collections that have similar aliases. This is the currently implemented solution.
Another option would be to alter the code used by the "landingpage" view to detect aliases that look like sub collections, and automatically forward the user to the main landing page of that collection (since, as mentioned below, "sub-collection" links will be listed on the main landing page for their collection). Of course, this option has a few drawbacks: users won't be actively encouraged to fix their broken links; there will be some significant code alteration involved; this customization will need to be re-implemented with every upgrade.
The related file for making the change to the error message when a user attempts to view a sub-collection from an outdated link is at OCLC\CONTENTdm\Content6\website\cdm_common\cdm\views\scripts\cdm\error.phtml
CardCat Linking (Implemented on New Server)
Links from CardCat do not use the item's individual reference URL. Instead, a script, written by Jim Hammonds and Andy West and housed on the DMR server, is called and given the collection alias and the "catkey" value. The script builds a new URL that performs a simple search on that collection for the value in the "catkey" metadata field. This will bring up one result, the item listed in CardCat.
Because these links already exist in CardCat, it would be easiest to re-implement this simple customization in its exact form (using the same URL and arguments). The old URL will still work (due to backwards compatibility), so minimal change will be required.
In the future (post-upgrade) we may want to update the URLs in CardCat to reflect the universal reference URLs used by CONTENTdm. Jim Hammons and Andy West were the original developers of the cdmlink.php file.
The old cdmlink.php file resides on the DMR Server at CDM6\server\docs\cdmlink.php. Every item that needs it has a catkey metadata field.
Unfortunately, the URLRewrite plugin for IIS that CONTENTdm now uses will not allow us to simply place the cdmlink.php file in the root directory. Any attempt to browse to cdmlink.php (or any other file, for that matter) is rewritten to use index.php and re-routed accordingly.
Instead, I'm simply going to add the same customization to \OCLC\CONTENTdm\Content6\website\public_html\index.php to check the REQUEST_URI for "/cdmlink.php" and use the passed GET parameters to redirect the user in the same fashion. Same customization, just without the separate file.
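The redirect logic can be sketched roughly as follows, in Python for illustration only (the real change lives in index.php). The GET parameter names "alias" and "catkey" and the target search URL pattern are assumptions; the actual cdmlink.php arguments and CONTENTdm search URLs should be checked against the server.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def cdmlink_redirect(request_uri: str) -> Optional[str]:
    """Return a search URL for old cdmlink.php requests, else None."""
    parts = urlsplit(request_uri)
    if not parts.path.endswith("/cdmlink.php"):
        return None  # not a CardCat link; let normal routing continue
    params = parse_qs(parts.query)
    alias = params.get("alias", [""])[0].strip("/")  # hypothetical parameter
    catkey = params.get("catkey", [""])[0]           # hypothetical parameter
    if not alias or not catkey:
        return None
    # Assumed URL pattern for a single-field search in the new website.
    return (
        f"/cdm/search/collection/{quote(alias, safe='')}"
        f"/searchterm/{quote(catkey, safe='')}/field/catkey"
    )
```

Returning None for anything other than a complete cdmlink.php request means the check can sit at the top of the front controller without affecting other pages.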
Sub-collections and Custom Search Boxes (Implemented on New Server)
While sub-collections are not a truly self-contained entity in CONTENTdm, and CONTENTdm does not recognize or support the notion of sub-collections, there is definitely a recognized need for a method of providing users with the ability to both browse and search subsets of various collections.
Sub-collections have been added to the DMR through a major overhaul to the front-end website created by Budi Wibowo. The custom front-end can basically be considered a content management system layer resting on top of the actual CONTENTdm installation. In order to take advantage of the many new features provided by the out-of-the-box front-end, this custom front-end will not be directly carried forward with the new version, but its features and functionality will.
Links to sub-collection landing pages are discussed in the Linking section above.
An example of a collection with sub-collections is the U.S. Civil War Resources for East Central Indiana collection. It includes four sub-collections with their own descriptions and search boxes. However, because sub-collections are not an actual feature of CONTENTdm, they have been implemented by either simply directly linking to the items within the sub-collection and/or providing search and browse functionality across a subset of items in the collection that have been assigned specific pieces of metadata.
For example, the U. S. Civil War Resources from the United States Vice Presidential Museum at the Dan Quayle Center sub-collection isn't really a sub-collection per se, but rather a filtered search within the parent collection that only looks for items containing the string "Dan Quayle Center. United States Vice Presidential Museum" in their metadata.
This collection has been further broken down into "sub-sub-collections", such as the John Cabell Breckinridge Collection, which is really just another filtered search within the parent collection that only looks for items containing the string "John Cabell Breckinridge Collection" in their metadata.
The search boxes in these sub-collections have been pre-formatted to only search the filtered results associated with their sub-collection.
There are also a handful of sub-collections that appear (at least in the custom front-end) as actual collections. For example: Ball State University PolyArk/World Tour Images collection on this page. These collections will continue to be sub-collections, and will simply be highlighted on the new home page and A-Z list as appropriate.
Due to problems with linking (mentioned above), sustainability in future versions of CONTENTdm, and the fact that sub-collections are not an actual supported type of collection in CONTENTdm, sub-collections will be implemented in a drastically different way in the new DMR.
While we can continue to refer to them publicly as "sub-collections" so as to not confuse end-users, developers and content creators should understand that, moving forward, sub-collections are simply canned searches on specific subsets of items with certain metadata values.
Rather than redeveloping the "content management system" from the current system, we will be providing links, descriptions, and search boxes for all sub-collections on their parent collection's landing page. An example can be seen in the U.S. Civil War Resources for East Central Indiana Collection on the Test Server. While not complete, this landing page should serve as an example for all future collections that contain sub-collections. Each sub-collection is given a link that, once clicked, will uncover details, links, and search boxes for the sub-collection and any sub-collections that reside within it.
This change does not require any changes to the underlying code of CONTENTdm, but rather a shift in the paradigm of how we think of "sub-collections". Most, if not all, of the canned searches that already exist in the DMR Server can be directly copied and pasted into landing pages, and the search boxes are very simple to construct and copy.
As for the sub-collections that are seen as actual collections, they will continue to exist as sub-collections, but the new custom home page and A-Z list may display them as actual collections.
I've also added a custom CSS file using the same method as the JS file above for styling collapsing sub-collections.
Embedded Media (Implemented on New Server except "content_main" issue)
Providing in-line audio/video resources via MediaSite.
Several collections have objects which contain audio and/or video files. In the past, CONTENTdm's out-of-the-box features that provide in-line media players for these objects were not used. Instead, a decision was made to place all multimedia in the university's MediaSite system, a central storage server that provides various streaming features.
Items in the DMR are associated with items in MediaSite through an internal metadata field called "Media Identifier", which contains the URL for the item's particular video/audio stream.
MediaSite does not provide direct access to streaming content, therefore the current solution involves loading a web page that contains the appropriate stream, and positioning that page within an HTML iframe so that only the stream itself (and the controls) appear. This iframe is then placed at the top of the item's page so that it appears as though the media is being directly streamed in-line on the web page.
Not all collections use this method. For example, the Indiana ArtsDesk Radio Archive Collection contains links to WMA files hosted on the DMR server. The Dolls Collection contains 3D QuickTime MOV files that are hosted on the DMR server. The Ball State University Student Art Collection contains direct links to MOV files that are hosted on the DMR server. And the Musical Instruments Collection contains compound objects with WMA files that are hosted on the DMR server. Recently, compound objects in some collections have been given the in-line media treatment as well.
In testing, the newest out-of-the-box in-line media player in CONTENTdm simply doesn't meet our needs. A MediaSite stream can't be associated with it and it appears to be inconsistent with its ability to play back video and audio files. It also mistakenly recognizes 3D Object MOV files as video files and attempts to play them, which doesn't work.
The original customization will be carried forward in almost exactly the same form (with some minor improvements). The Media Identifier metadata field will continue to be both an indicator and source for the in-line media that appears on an object's view.
Because there are still some older collections (such as those linked above) that do not use MediaSite, the current plan is for LITS to provide these media files (along with identifying information) to MADI. To be consistent with our current mandate to place all audio/video in MediaSite, MADI will then place these items in MediaSite and add the appropriate Media Identifier to each object.
A special consideration must be made for 3D Object MOV files, as the new version of CONTENTdm recognizes them incorrectly as videos. Code will be modified to prevent the default CONTENTdm player behavior, and replace it with instructional text similar to that found on the DMR Server.
To continue with OCLC's design, a file has been placed on the server under OCLC\CONTENTdm\Content6\server\confg\bsumediasite.php that is similar to the bsupriv.php file. It contains an array called mediasite_collections that lists the aliases of all collections that have in-line audio or video enabled from MediaSite.
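A minimal sketch of how the Media Identifier field and the list of enabled collections could work together is shown below, in Python for illustration only (the actual customization is PHP in the .phtml views). The alias "ExampleAudio", the iframe dimensions, and the function name are assumptions, not values taken from the server.

```python
from html import escape
from typing import Mapping, Optional, Set

# Hypothetical stand-in for the mediasite_collections array.
MEDIASITE_COLLECTIONS: Set[str] = {"ExampleAudio"}  # hypothetical alias


def inline_media_html(
    alias: str,
    metadata: Mapping[str, str],
    enabled: Set[str] = MEDIASITE_COLLECTIONS,
) -> Optional[str]:
    """Build the iframe for an item's stream, or None if not applicable."""
    url = metadata.get("Media Identifier", "").strip()
    if alias not in enabled or not url:
        return None
    # Width and height are placeholder values for this sketch.
    return (
        f'<iframe src="{escape(url, quote=True)}" '
        'width="640" height="400" scrolling="no" frameborder="0"></iframe>'
    )
```

The field acts as both the indicator and the source: an item with no Media Identifier, or in a collection that is not listed, simply gets no player.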
The relevant code for adding in-line media to single items is in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\views\scripts\cdm\singleitem.phtml around line 232.
There is also a minor change around line 429 that will hide the content_main DIV if an item contains a residual link to a stream (or an outdated URL). This customization has been TEMPORARILY commented out, as it may not be needed in the long run.
The relevant code for adding in-line media to compound objects is in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\views\scripts\cdm\compoundobject.phtml around line 362. It reads from the same bsumediasite.php file mentioned above, but can only show media related to the parent compound object. That is, the in-line media presented is not based on the current page being viewed. Example of current compound object with in-line media.
The relevant code for removing the default CONTENTdm video player (and replacing it with instructional text) is OCLC\CONTENTdm\Content6\website\public_html\ui\cdm\default\collection\default\viewers\videoViewer\loader.php
Page Analytics (Implemented on New Server)
Using Google Analytics to more easily track usage and get page view statistics for each collection.
As "messy" as the code is, it gets the job done and Google Analytics is able to track just about every page view from the DMR. However, Google Analytics itself, while being a very useful and powerful tool, doesn't provide an easy way for us to differentiate hits between different collections. To do that, Alex Lemann created a Python script, currently housed on one of LITS' Linux servers, that grabs Analytics data from a given date range and produces distilled numbers for each individual collection (based on a list of collection aliases). This script must be manually run on a monthly basis.
While LITS has obtained administrative access to the DMR Analytics account, it is still listed under Budi Wibowo's personal account. This isn't a problem, but it makes for inconsistencies and LITS would prefer it if personal accounts were not used to set up these accounts in the future (instead, a general LITS account should be used, and personal access can be set up later).
On that note, it also appears that the Python script is connecting to Analytics with the litstablet at gmail account, while Alex also had access to it through an ablemannbsu at gmail account. In the future, all Google Analytics usage for the DMR will be housed under one general LITS account.
During the upgrade process we will be able to streamline and possibly improve our usage of Google Analytics, as well as make it a sustainable customization that will be easily carried forward with future versions of CONTENTdm.
One problem with carrying this change forward is that URLs in the new version look different. For example, the Musical Instruments collection on the DMR Server uses the following URL: http://libx.bsu.edu/cdm4/collection.php?CISOROOT=/MusInst, while the Test Server uses: http://libcdmtest.dhcp.bsu.edu/cdm/landingpage/collection/MusInst. The actual page used (collection.php) is hidden and parameters now use a new "clean" URL. This won't negatively affect statistics, but it will mean that historic data will not directly match new data. It also means that the Python script we are currently using to glean collection-by-collection stats may be affected.
Because of this, the proposed solution for Google Analytics is to:
- Create a new Analytics account for the new version of the DMR under the firstname.lastname@example.org account (under which several other Analytics accounts are housed).
- Modify the monthly Python script to work with the new URLs if necessary OR totally replace it with a web-based script that can be run from anywhere (this would be preferable, as then ASC could run the script for any date range without going through the extra step of requesting it from LITS).
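If the existing script is modified rather than replaced, the main change would likely be recognizing both URL styles when grouping page views by collection. The sketch below is not taken from Alex Lemann's script; the function names are hypothetical, and it assumes page URLs are available as plain strings.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlsplit


def collection_alias(url: str) -> Optional[str]:
    """Extract the collection alias from an old- or new-style DMR URL."""
    parts = urlsplit(url)
    # Old style: /cdm4/collection.php?CISOROOT=/MusInst
    root = parse_qs(parts.query).get("CISOROOT")
    if root:
        return root[0].strip("/") or None
    # New style: /cdm/landingpage/collection/MusInst
    segments = [s for s in parts.path.split("/") if s]
    if "collection" in segments:
        i = segments.index("collection")
        if i + 1 < len(segments):
            return segments[i + 1]
    return None


def hits_per_collection(urls: Iterable[str]) -> Counter:
    """Count page views per collection alias, ignoring unmatched URLs."""
    aliases = (collection_alias(u) for u in urls)
    return Counter(a for a in aliases if a is not None)
```

Handling both styles in one function would also let historic and new data be reported under the same aliases, even though the raw URLs differ.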
API Access
There are several applications that use the search API provided by CONTENTdm. These applications have been developed both internally and by other parties not directly associated with the University Libraries. They will need to be tested and updated as necessary. Below is a list of known applications that use the API and any relevant details and contact info.
- BSU Maps Surface App
- Contact: John Godsey, LITS Student
- BSU Photos Surface App
- Contact: John Godsey, LITS Student
- Museum of Art Surface App
- Contact: John Fillwalk (John Straw and Jim Bradley)
- Primarily uses the David Owsley Museum of Art Collection. This collection has been moved to the Test Server so that the developers of this application can test it and update if needed.
- SecondLife and Blue Mars projects
- Contact: Jim Connolly and John Fillwalk (John Straw and Jim Bradley)
- What Middletown Read?
- Contact: Jim Connolly (John Straw)
- All of the code behind the WMR project was scanned for any reference to "libx". Other than direct standard links and canned searches (which will work due to backwards-compatibility) the WMR project doesn't appear to actually connect to the DMR in any significant way. This means the upgrade will not adversely affect the WMR project at all. This was confirmed on 5/1/2013.
In an email from Jim Connolly (which was also copied to John Fillwalk) sent on 3/24/2013, he said "I don't forsee any problems with the Virtual Middletown project, though I am checking with John Fillwalk to be sure. John Straw is in a better position to answer re: the Second Life work we did. I think it does use some DMR assets. As for the WMR material, just let me know if it has an effect."
In an email from John Fillwalk (which was copied to Ina-Marie Henning) sent on 3/22/2013, he said "We will look into this and get back to you thanks for letting us know."
We're still waiting to hear back from John Fillwalk about any of the others.
General Usability Concerns (Implemented #1 and #3 on New Server)
- 1 Repository- and Collection-level Searching
- There is a built-in difference between performing a search on multiple (or all) collections vs. searching a single collection. Once a user gets to the single-collection level, advanced search features automatically adjust to search only the defined metadata field associated with that collection.
- While this shouldn't be a problem, it was made very obvious and caused some user confusion on the BoT Server (as users who thought they were searching the BoT collection were in fact searching "all" collections, which the system treats as more than one, even if only one collection is published).
- There are several factors that work in our favor and will make this search discrepancy less of an issue. The homepage customization (using the design from Outside Source) will give us complete control over the first search method users encounter. The current DMR Server also has these automated features already.
- Experienced and savvy users who are used to our advanced search page will already be more than accustomed to the differences between the two levels of searching, while average users who aren’t necessarily interested in all of the advanced search features provided to them won’t be confused or overwhelmed by the simple one-box interface.
- To help mitigate confusion, we will be changing the label of the search box so that when a user is not viewing a single collection, it will say "Search all collections". When a user is viewing a single collection, it will say "Search this collection".
- 2 Date-based Searching and Metadata Considerations
- Another usability issue, which was brought to our attention from a user trying to search on the BoT Server, is related to how we set up metadata fields in each collection. It is also related to the repo- and collection-level searching issue mentioned above.
- When a user performs a date search (from any level), CONTENTdm searches the selected collections based on any metadata fields that have been assigned (by us) to those collections as well as mapped to one of the dc.date fields. This has proven to be problematic for users when the "Date-Available" or "Digital Date" field has been mapped to any dc.date field. Most users do not assume that a general date search will return results containing dates related to when an item was entered into the CONTENTdm system. Instead, they are expecting dates related to the original context of the item.
- In order to correct this issue, while still retaining the ability for advanced users to search across the "Digital Date" field if needed, we must unmap the "Digital Date" field from any Dublin Core date field. This, of course, could prevent this date from showing up in things such as APIs or Harvest Points, but it would always be available in the DMR as a searchable field.
- The other issue is that, when looking at the defined metadata fields for each collection in the DMR, date fields have been set up as "Text" fields, rather than "Date" fields. While the impact of this is somewhat minor, setting up date fields as "Date" types vastly improves the ability of the system to sort and search those fields. Changing this setting does not appear to adversely affect the collection, and would only require someone to go through each collection and change the type of each date field, then re-index the collection.
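The two date-field checks described above (a "Digital Date" style field mapped to a Dublin Core date, and a date field typed as "Text") could be audited with something like the following minimal sketch. The record layout, including the keys "name", "type", and "dc" and the mapping values shown, is an assumption made for illustration and is not a documented CONTENTdm format.

```python
# Hypothetical field records for one collection. The keys "name", "type",
# and "dc" are assumptions for this sketch, not a documented export format.
DIGITAL_DATE_NAMES = {"digital date", "date-available"}


def audit_date_fields(fields):
    """Return (to_unmap, to_retype): field names needing each change."""
    to_unmap = []
    to_retype = []
    for field in fields:
        name = field["name"]
        dc_map = field.get("dc", "").lower()
        field_type = field.get("type", "").lower()
        is_digital = name.lower() in DIGITAL_DATE_NAMES
        maps_to_date = dc_map.startswith("date")
        # Digital dates should not be mapped to any Dublin Core date field.
        if is_digital and maps_to_date:
            to_unmap.append(name)
        # Date fields stored as "Text" should be switched to the "Date" type.
        if (is_digital or maps_to_date) and field_type == "text":
            to_retype.append(name)
    return to_unmap, to_retype


example_fields = [
    {"name": "Date Created", "type": "Text", "dc": "date-created"},
    {"name": "Digital Date", "type": "Date", "dc": "date-available"},
    {"name": "Title", "type": "Text", "dc": "title"},
]
to_unmap, to_retype = audit_date_fields(example_fields)
```

With the sample records, "Digital Date" is flagged for unmapping and "Date Created" is flagged for a type change. Any collection that is changed would still need to be re-indexed afterwards.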
- 3 In-Document Searching, Full-Text, and the "Text" tab
- As a result of the Summon/OneSearch project, the current plan is to go through all of our collections and unhide any Full-Text fields as well as possibly map them to some standard Dublin Core field (so that they appear in Harvest Points).
- Unhiding these fields before the upgrade will result in very ugly pages, among other issues, in the current DMR Server.
- Many PDF/compound objects have both an Image or PDF tab above the actual iframe that shows the item, and next to that, a Text tab. The purpose of the Text tab is to show whatever is in the full-text metadata field for that object. By default, however, if the full-text metadata field is hidden (via the administrative interface) the Text tab still shows up, but doesn't display anything.
- By default there is also a standard "Text Search" feature that appears next to the Text tab and allows users to search the full-text of the item they're looking at. This only appears when full-text metadata is not hidden for the collection.
- Full-text metadata also appears under the "Description" area near the bottom of the page if full-text is set as not hidden. This looks messy with items that have a lot of text.
- We would like to still provide the "Text Search" feature, but we don't want the Text tab to appear in any circumstance (as it simply contains jumbled and strangely formatted plain text that may confuse users), and we would like to prevent the full-text from appearing in the "Description" area of the page.
- Preventing it from appearing in the "Description" area is as simple as setting the "Hide full text" option in the Website Configuration Tool.
- In summary, to allow in-document searching we must set full-text metadata fields in the administrative interface to not hidden. To hide the newly unhidden full-text metadata field from showing up near the bottom of the page we must change the full-text setting in the Website Configuration Tool. To get rid of the Text tab, we must make some small modifications to the code, detailed below.
Relevant code for hiding the "Text" tab can be found in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\views\scripts\cdm\compoundobject.phtml around line 378. Commenting the li item out works, and as long as full-text is not hidden, the "Text Search" feature will still appear. Another minor modification to the file around line 537 must be made to comment out "tab window 2" to prevent the contents of the now-hidden tab from appearing.
This tab also appears in "single item view" and will need to be edited out accordingly. Relevant code can be found in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\views\scripts\cdm\singleitem.phtml around line 243, with the corresponding tab content div around line 372.
Changing the search box label text can be done by modifying the code in OCLC\CONTENTdm\Content6\website\cdm_common\cdm\layouts\scripts\default.phtml
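The label rule itself is small. The sketch below only illustrates the intended logic in Python; the real change would be made in the default.phtml layout, and the function name and the collection_alias argument are hypothetical.

```python
def search_box_label(collection_alias=None):
    """Pick the search box label text.

    collection_alias is a hypothetical argument: None or an empty string
    means the user is not viewing a single collection.
    """
    if collection_alias:
        return "Search this collection"
    return "Search all collections"
```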
Browse by A-Z, Subject, Location, Format, and Contributors (Implemented on New Server)
Several navigation links that, when clicked, present the user with a list of collections organized by those fields.
These types of links are not built into CONTENTdm. They are also not canned searches that can easily be ported forward. Instead, the way they are handled currently is through the use of our front-end system (which will be going away). The content of the pages is based on the database behind the front-end, and can be edited through the custom Splash Page Automation tool that was created for the custom front-end (to make it work sort of like a mini-CMS).
Because the "mini-CMS" front-end and its accompanying database will be going away with the upgrade, the newly proposed solution to this problem will be based on the Collection of Collections. The Collection of Collections is exactly what its name implies: a CONTENTdm collection of all of our published Collections, with associated metadata.
What we've proposed is adding metadata fields to the Collection of Collections to identify each collection with one or more Subjects, Locations, Formats, and so on (anything we want to be able to browse by). Once these fields are set up, each of the Subject, Location, Format, and Contributors links can point to a canned search on the Collection of Collections.
Because this will only require built-in CONTENTdm functions, this modification can be easily carried forward without hassle.
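To illustrate the data model only (CONTENTdm's own search would do the actual work), here is a minimal sketch of how browse-by values in the Collection of Collections relate to collections. The record keys "title" and "subject", the sample records, and the semicolon delimiter for multiple values are all assumptions.

```python
def browse_index(records, field, delimiter=";"):
    """Map each value of a multi-valued field to the collection titles using it."""
    index = {}
    for record in records:
        for value in record.get(field, "").split(delimiter):
            value = value.strip()
            if value:
                index.setdefault(value, []).append(record["title"])
    # Sort both the browse values and the titles listed under each one.
    return {value: sorted(titles) for value, titles in sorted(index.items())}


# Hypothetical Collection of Collections records.
sample_records = [
    {"title": "Collection B", "subject": "Architecture; Indiana"},
    {"title": "Collection A", "subject": "Indiana"},
    {"title": "Collection C", "subject": ""},
]
by_subject = browse_index(sample_records, "subject")
```

A collection with several subjects appears under each of them, which is the behavior the browse links are meant to provide.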
The A-Z list will either be a custom page using the API to grab a list of collections and then sort them, or it will use the default homepage that CONTENTdm's website comes with, while the real homepage will be the Outside Source design.
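If the custom page route is chosen, the sorting step might look like the following sketch. It assumes the collection list has already been fetched and is available as a list of records with "alias" and "name" keys; both the record shape and the sample names are assumptions, and the API call itself is not shown.

```python
def build_az_list(collections):
    """Group collection records by first letter of their name, in A-Z order."""
    ordered = sorted(collections, key=lambda c: c["name"].strip().casefold())
    groups = {}
    for coll in ordered:
        name = coll["name"].strip()
        # Names that do not start with a letter are grouped under "#".
        letter = name[0].upper() if name and name[0].isalpha() else "#"
        groups.setdefault(letter, []).append(coll)
    return groups


# Hypothetical collection records.
sample_collections = [
    {"alias": "/coll2", "name": "Board of Trustees Minutes"},
    {"alias": "/coll1", "name": "architecture Slides"},
    {"alias": "/coll3", "name": "Ball State University Yearbooks"},
]
az = build_az_list(sample_collections)
```

Sorting with casefold keeps names that differ only in capitalization from being ordered inconsistently.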
Migrate and Combine All Collections from the DMR and BoT Servers (Implemented)
Ask ASC/MADI (all users) to stop all work on collections during the week of the actual upgrade. Copy all collections from the DMR Server and the BoT collection from the BoT Server as they are to ensure that they are fully updated.
This task can be made much easier through the use of the server/docs/catalog.txt file. This file contains aliases, names, and paths for each collection. Simply copy all collections from the DMR and BoT over, then update the catalog.txt file with the new paths to use. You may need to change the published/unpublished setting of a collection before you see the results in the admin page. This is also a handy way to move collections over with aliases that are too long (like BSU_ArchSlidesCpght).
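The path update could be scripted along these lines. This is a minimal sketch that assumes each line of catalog.txt is tab-delimited in the order alias, name, path; check the real file before relying on that layout, and back it up before changing it. The function name and the sample paths are hypothetical.

```python
def rewrite_catalog_paths(catalog_text, old_root, new_root):
    """Return catalog text with old_root swapped for new_root in each path."""
    out_lines = []
    for line in catalog_text.splitlines():
        parts = line.split("\t")
        # Assumed layout: alias, name, path. Other lines are left untouched.
        if len(parts) >= 3 and parts[2].startswith(old_root):
            parts[2] = new_root + parts[2][len(old_root):]
        out_lines.append("\t".join(parts))
    result = "\n".join(out_lines)
    # Preserve a trailing newline if the input had one.
    if catalog_text.endswith("\n"):
        result += "\n"
    return result


# Hypothetical catalog content and paths.
sample_catalog = (
    "/bot\tBoard of Trustees\tD:\\old\\bot\n"
    "/other\tOther Collection\tE:\\elsewhere\\other\n"
)
updated_catalog = rewrite_catalog_paths(sample_catalog, "D:\\old", "E:\\new")
```

Only paths that start with the old root are changed, so collections already stored elsewhere are left alone.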
We will use the DMR Upgrade Project 2013 Collection Status page to facilitate this process.
Some of the collection names do not match between the system name and the front-end name. I'll be working with Mike to make both names match up. This shouldn't cause any problems, but people in MADI who are familiar with the system name will need to find collections through, say, the Project Client, under their new names. Most of the name changes change abbreviations like "BSU" to "Ball State University", among other things.
Class B (Important but will not prevent upgrade)
Outside Source Front-End Design (Implemented on New Server)
We will be implementing a new design created by Outside Source for our main home page that will look like this. We will attempt to carry this look and feel forward into other pages as we deem necessary, and within the sustainable constraints of CONTENTdm's customization features.
An actual HTML implementation of the new design has been saved on our development web server for reference.
Currently these provide additional sorting options and allow the user to persist those settings through multiple searches.
Presently implemented in a primitive form by OCLC with the "favorites" function. The BSU plugin is more powerful. OCLC is promising additional development on this feature. We will submit our suggestions to them and see what they deliver in the next release for this functionality and re-evaluate it.
Spotlight Collections on Homepage (Implemented on New Server)
CONTENTdm 6.x offers a number of new user interface elements that need to be explored as options to replicate this functionality.
Grants page (Implemented on New Server)
A list of all grant-funded collections. This can be implemented using the Collection of Collections functionality needed for the specialized browse buttons mentioned above.
Custom Header for David Owsley Museum of Art Collection
The DMR Server has a small customization that displays a different banner when the user is browsing or searching the David Owsley Museum of Art Collection.
The Museum of Art would like to see this customization carried forward. We may modify it a bit to make it fit better with the new design of the web site, however. Jim Bradley has informed them that it may disappear and come back in a different form until we speak with them about some other graphical modifications they were interested in.
This customization appears to be doable through the website configuration tool for that collection, as special headers on a per-collection basis can be specified.
Class C (Will NOT be implemented)
- Hide Header
Currently, the OAI-PMH harvest point that we have been using for the Summon ingest process has been somewhat restrictive. OAI-PMH doesn't handle compound objects well because it doesn't associate child pages with their parent object, and as far as we can tell there is no way to fix this.
We are considering switching to an XML export of each collection, as we are able to export data that keeps parent-child relationships that way (and we can customize the XML data in various other ways).
We are also considering saving time by associating Content Types with keywords within existing delimited strings in our dc.source field. This will require Summon to parse out the field and key in on those certain words when associating an object with a Summon Content Type.
Here is Ashley's (from Serials Solutions) response to these questions:
Regarding these questions, it would be possible to switch to XML updates for the repository. We would need to set up an FTP account with the information and we would likely have to start the repository setup over from scratch, but that is not a problem for us if you want to pursue it. For pulling a content type out of a string of data I posed the question to our metadata team and this was their response:
“In general, Summon can identify specific strings within a delimited list. In the client’s example, we should be able to match “Digital Video” from “Color images; Digital video; Documents; Typed text” and use that to map “Streaming Video.” This sort of mapping could get complicated depending on what’s in the records, but as long as we know which specific string takes precedence in a list of strings (regardless of order), I believe we’ll be able to do what the client asks.”
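The precedence-based matching described in that response can be sketched as follows. This is not Summon's implementation. Only the "Digital video" to "Streaming Video" pairing comes from the response above; the second entry in the precedence list is a placeholder, and the semicolon delimiter is taken from the example string.

```python
# Earlier entries win when several keywords appear in the same field.
# Only the first pairing comes from the Summon response; the second is a
# hypothetical placeholder, not an agreed mapping.
CONTENT_TYPE_PRECEDENCE = [
    ("digital video", "Streaming Video"),
    ("color images", "Image"),
]


def summon_content_type(source_value, precedence=CONTENT_TYPE_PRECEDENCE, delimiter=";"):
    """Return the content type for the highest-precedence keyword found, or None."""
    terms = {term.strip().casefold() for term in source_value.split(delimiter)}
    for keyword, content_type in precedence:
        if keyword in terms:
            return content_type
    return None


example = summon_content_type("Color images; Digital video; Documents; Typed text")
```

Because the precedence list is checked in order, the position of a keyword within the dc.source string does not affect the result.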
We have since heard from Summon that they have experienced this "orphan page" issue with other sites and have come up with fixes for them. We are working with our contact at Summon to see if their fix will not only hide orphan pages, but still combine their full-text metadata into the parent object.
One other thing to keep in mind is that the upgrade will change our OAI-PMH harvest point URL. We will need to contact Summon so they can update that on their end to prevent problems with future harvests.
We are planning to do this after the upgrade is complete to prevent Bad Things from happening.
We are currently set up with a main LITS account at the WorldCat Digital Collection Gateway site. This account has also given access to Amanda Hurford and Robert Seaton's individual accounts.
The plan is to go ahead and set up collection metadata on the DMR Server as well as set up metadata mapping and whatever else is needed on the Digital Collection Gateway site so that our records appear in WorldCat. The Board of Trustees collection will need to be set up separately after the upgrade.
Once everything is set up on the DMR Server we simply need to run the WorldCat Sync function in CONTENTdm to write the new OCLC numbers back into our records. After that we can perform the upgrade.
The upgrade will change the URL of our OAI-PMH harvest point. We will need to contact OCLC so they can change our repository's URL on their end (since we can't change it on the administration page). Once that's set up, all of the collections should sync up nicely since the OCLC numbers will already be in the system.
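Before handing the new harvest point URL to OCLC or Summon, it may be worth confirming that it answers standard OAI-PMH requests. The sketch below only builds the request URLs using the protocol's standard verb and metadataPrefix parameters. The base URL is a placeholder, not our actual harvest point.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and extra arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Placeholder base URL; substitute the harvest point URL of the new server.
NEW_BASE_URL = "https://example.org/oai/oai.php"

identify_url = oai_request_url(NEW_BASE_URL, "Identify")
records_url = oai_request_url(NEW_BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

Opening the Identify URL in a browser should return a short XML description of the repository if the harvest point is working.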