Consulting Database Specialist
OCLC Quality Control Section
Duplicate Detection and Resolution (DDR) software
OCLC thanks everyone who adds new records to WorldCat. OCLC has spent the past several years re-implementing its Duplicate Detection and Resolution (DDR) software in the Connexion environment and expanding its capabilities to deal with all types of bibliographic records. Between May 2009 and January 2010, OCLC ran small subsets of WorldCat against the live database in order to fine-tune its algorithms, examining each resulting merge and learning from both the successes and the failures.
The new DDR software is now in full operation. DDR began running through the full WorldCat database (beginning with OCLC #1) on Feb. 2, 2010. In addition, a separate process that examines selected new records and replaced records from a day's journal files began running Jan. 26, 2010. As of the end of June 2010, 2,919,942 duplicate records have been removed out of 67,179,212 records processed.
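To put those totals in proportion, the arithmetic can be sketched in a few lines of Python. The two figures are the ones quoted above; the variable names are only illustrative.

```python
# Figures quoted in the paragraph above (as of the end of June 2010).
duplicates_removed = 2_919_942
records_processed = 67_179_212

# Share of processed records that were removed as duplicates.
duplicate_rate = duplicates_removed / records_processed

print(f"{duplicate_rate:.2%} of processed records were removed as duplicates")
print(f"roughly 1 in {round(1 / duplicate_rate)} records processed")
```

This works out to a little over 4 percent, or about one record in every 23 processed.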
DDR processing will continue for a number of months. As a result, you will notice fewer duplicates, particularly for printed music, sound recordings, and audiovisual materials, since the original DDR software dealt only with records for books.
Like all automated processes, the new DDR will make occasional errors in spite of our best efforts to minimize such cases. Thank you for reporting erroneous merges. OCLC staff will examine the records in question, reverse any merge deemed to have been inappropriate, and try to ensure that such incorrect merges do not occur again. One or more of the records may be edited so that our algorithms are better able to identify important differences. Additionally, we will determine whether we can learn something more general from such instances and further refine our algorithms to reduce such errors in the future.
We urge you to help us make this process more effective by re-searching your title immediately before entering a new record into WorldCat, especially if the record was previously in an Online or Local Save file. Duplicates reduce the efficiency of the database, so please always verify that a record has not already been added by another library.
For information on when to input a new record, please see Chapter 4 of Bibliographic Formats and Standards.
More information about DDR is also available from OCLC.
Duplicates and possible erroneous merges can be reported via email or by using the Action Menu--Report Error function while viewing a bibliographic record in Connexion. This function opens a free-text window, allows users to have a copy sent to their own email address, and includes a snapshot of the record as it appeared when the function was invoked. Additionally, there is a web form specifically for reporting duplicates, found here under Forms.
Editing capabilities and master records in WorldCat
There are many different types of editing capabilities available to the average user of OCLC's Connexion cataloging service, most of which have been in place for decades. Connexion documentation explains in full the specifics of replacing master records, as well as the type of upgrades libraries can perform based on the cataloging level of their OCLC authorization number. Please see this page for the Client and this page for the Browser.
Minimal-Level upgrade, Database Enrichment, and Enhance capabilities have been in place for many years. In the case of Database Enrichment, the system compares what you have done in terms of adding, editing, or deleting fields using the information that is contained in the chart in section 5.3 of Bibliographic Formats and Standards. If the changes match the chart, they are counted as a Database Enrichment replace. If they do not and the record falls into the categories that can be replaced under the Expert Community Program, the changes are counted as an Expert Community replace.
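The counting rule described above can be sketched as a small decision function. This is a simplified illustration, not OCLC's implementation: the function name, its parameters, and the placeholder tags are hypothetical, and the contents of the section 5.3 chart are supplied by the caller rather than reproduced here.

```python
def classify_replace(changed_fields, enrichment_fields, expert_eligible):
    """Return the replace category for one edited master record.

    changed_fields    -- set of fields the cataloger added, edited, or deleted
    enrichment_fields -- set of fields allowed by the BFAS section 5.3 chart
                         (supplied by the caller; hypothetical here)
    expert_eligible   -- True if the record falls into a category that can be
                         replaced under the Expert Community Program
    """
    # Every change matches the chart: counted as Database Enrichment.
    if changed_fields and changed_fields <= enrichment_fields:
        return "Database Enrichment"
    # Otherwise, eligible records are counted as Expert Community replaces.
    if expert_eligible:
        return "Expert Community"
    # The text does not say how any remaining cases are counted.
    return None


# Placeholder tags only; consult BFAS 5.3 for the real chart.
chart = {"tag_a", "tag_b"}
print(classify_replace({"tag_a"}, chart, expert_eligible=True))
print(classify_replace({"tag_a", "tag_z"}, chart, expert_eligible=True))
```

The real comparison also depends on whether a field was added, edited, or deleted, which this sketch reduces to a single set of changed fields.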
The Expert Community Program is the newest addition to the range of upgrades available to Connexion users. It allows users with Full-Level cataloging authorizations and higher to make additions and changes to almost all fields in almost all records; no special (or additional) authorization is required. The overriding principle of the Expert Community is: "First, do no harm." Please use the same care in editing an existing master record as you would use in creating a new record. A second overriding principle is: "If in doubt, DON'T." For more information, including the Guidelines for Experts document and previously recorded web sessions, go to the Expert Community page.
We hope your institution is using these capabilities to improve and upgrade WorldCat master records to a degree never before available.
If you would like to track your own institution’s statistics, a general report is available under OCLC Usage Statistics on the Browser site. For more detailed information on the specific types of changes made (e.g., Database Enrichment, Minimal-level upgrade), use the OCLC Product Code Detail Usage Report, available via OCLC’s Product Services Web.
Here are instructions for retrieving it:
- On the Product Services Web home page, click on “download records and reports”
- In the “Reports and Statistics” list, click on “OCLC Product Code Detail Usage Report” (about two-thirds of the way down the list)
- At the prompt to log on, enter the authorization and password that you use to access Connexion
- You can either view the report on your screen or download it as a text file
- Each type of replace is identified by its own Product Code:
  - Expert Community Experiment replaces: ONT6390
  - Database Enrichment replaces: ONT2565
  - Minimal-level Upgrades: TOC3491
  - Enhance: ONT2571
  - National Enhance: ONT2570
- If a product code does not appear on your institution’s report then no applicable replaces were done for that category
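For anyone who wants to tally these categories after downloading the report, the Product Codes listed above can be kept in a small lookup table. The sketch below assumes you have already picked out the codes that appear on your report. The report's file layout is not described here, so no parsing is attempted, and the function name is hypothetical.

```python
# Product Codes as listed in the instructions above.
PRODUCT_CODES = {
    "ONT6390": "Expert Community Experiment",
    "ONT2565": "Database Enrichment",
    "TOC3491": "Minimal-level Upgrade",
    "ONT2571": "Enhance",
    "ONT2570": "National Enhance",
}


def summarize_report(codes_on_report):
    """Map each replace category to True if its code appears on the report."""
    seen = {code.strip().upper() for code in codes_on_report}
    return {label: code in seen for code, label in PRODUCT_CODES.items()}


# Example: a report showing only two of the five codes.
for label, present in summarize_report(["ONT2565", "ONT6390"]).items():
    print(f"{label}: {'replaces recorded' if present else 'none recorded'}")
```

A category reported as absent corresponds to the last point in the list: no applicable replaces were done for it.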
OCLC Fixed Field and MARC Codes
The supporting documentation for filling out the OCLC Fixed Field in WorldCat bibliographic records can be found in OCLC’s Bibliographic Formats and Standards (BFAS). Each element (which is actually an 008 field byte) is linked to BFAS. Simply click on the label and you will be taken to the appropriate BFAS page.
Many codes must be retrieved from MARC documentation; BFAS is meant to be used in conjunction with MARC Standards. The OCLC Fixed Field element "Lang" [Language Code] is one such example. Although Connexion Client has pull-down menus to assist users, these menus provide only the code. To be sure you are selecting the correct code, consult the MARC Code List for Languages.
While most language codes are mnemonic, there are some exceptions which frequently cause trouble for users.
Three of these are:
- Romanian (rum) vs. Romany (rom)
- Basque (baq) vs. Basa (bas)
- Mandarin (chi) vs. Mandingo (man)
Note that both Cantonese and Mandarin are dialects of Chinese and are coded as 'chi'.
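A local checking script could flag these look-alike codes before a record is saved. The following is a minimal sketch using only the six codes named above. The function name and the wording of the reminder are hypothetical, and the full MARC Code List for Languages remains the authority.

```python
# Look-alike language codes named in the list above:
# code -> (language, the code it is commonly confused with)
LOOK_ALIKES = {
    "rum": ("Romanian", "rom"),
    "rom": ("Romany", "rum"),
    "baq": ("Basque", "bas"),
    "bas": ("Basa", "baq"),
    "chi": ("Chinese, including Mandarin and Cantonese", "man"),
    "man": ("Mandingo", "chi"),
}


def lang_code_reminder(code):
    """Return a reminder for a confusable Lang code, or None for other codes."""
    key = code.strip().lower()
    entry = LOOK_ALIKES.get(key)
    if entry is None:
        return None
    name, other = entry
    other_name = LOOK_ALIKES[other][0]
    return f"'{key}' means {name}; the code for {other_name} is '{other}'."


print(lang_code_reminder("man"))
print(lang_code_reminder("rum"))
```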
Another Fixed Field element that requires consultation of a list is "Ctry" [Country of Publication, etc.]. Connexion Client has a pull-down menu to assist with entering the code, but the definition of each code is in the MARC Code List for Countries. For places of publication within the United States, use the code for the specific state; for places of publication within Canada, use the code for the specific province. For items published in Australia, use either the three-character codes for Australian states and territories or the two-character code 'at' for Australia as a whole. In the three-character codes, the first two characters represent the state or territory and the third character represents the country. Most other countries have two-character codes.
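The structure of the three-character codes can be illustrated with a short sketch. The final letters used below ('u' for the United States, 'c' for Canada, 'a' for Australia) reflect my reading of the MARC Code List for Countries rather than anything stated in this article, and the function name is hypothetical. Confirm individual codes against the code list itself.

```python
# Final letter of three-character codes (assumption drawn from the
# MARC Code List for Countries; verify before relying on it).
COUNTRY_SUFFIX = {"u": "United States", "c": "Canada", "a": "Australia"}


def split_ctry(code):
    """Split a Ctry value into (subdivision part, country part).

    Three-character codes: the first two characters identify the state,
    province, or territory, and the third identifies the country.
    Two-character codes name a whole country, so there is no subdivision.
    """
    code = code.strip().lower()
    if len(code) == 3:
        return code[:2], COUNTRY_SUFFIX.get(code[2], "not in this sketch")
    if len(code) == 2:
        return None, code
    raise ValueError(f"unexpected Ctry code length: {code!r}")


print(split_ctry("at"))   # two-character code for Australia as a whole
print(split_ctry("xna"))  # an Australian state code; check the code list
print(split_ctry("nyu"))  # a United States state code
```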
Please send any questions or concerns to: firstname.lastname@example.org