Saturday, February 25, 2006

Guide to the ADL Gazetteer Content Standard

version 3.2

February 26, 2004



Responsible party:

Alexandria Digital Library Project

University of California, Santa Barbara

Santa Barbara, CA 93106

http://www.alexandria.ucsb.edu


Contents

1. Purpose

2. Overview

3. History of development

4. Core elements

5. Treatment of geospatial description

6. Treatment of temporal description

7. Links to external sources of information

8. Treatment of attribution to source of data

9. Relational database model

10. ADL implementation

11. Availability and contact points

12. Acknowledgements

13. References and links


1. Purpose

The ADL Gazetteer Content Standard (GCS) is designed to be a comprehensive framework for recording descriptions of named geographic places. It covers the core elements of toponyms (and their history), spatial location (in various representations), and classification (according to referenced typing schemes), as well as source attribution for pieces of description gathered from various resources for a particular place. The intention is to demonstrate the use of the GCS and promote its adoption so that gazetteer data created by various local, national, and international agencies, and by special knowledge groups, can be shared and, when gathered from various sources, understood. The GCS is designed to meet the needs of gazetteers containing current details of named geographic places as well as gazetteers containing historical data, to support international and multilingual applications, and to link to other sources of information about a particular place. As a comprehensive structure for recording gazetteer descriptions, it can be considered an “archival” structure. Implementations of it for gazetteer services will include additional tables to support searching and report generation functions.



An underlying purpose is to direct attention to the components of description for named geographic places and to inform future developments of collections, database design, and services that link current and historical toponyms (the names we give to geographic places) to mappable locations (e.g., longitude and latitude coordinates). Because a typing scheme is used to classify the entries, such services can also answer queries such as “What schools are in the Tucson area?”



A companion to the ADL GCS is the ADL Gazetteer Protocol (http://www.alexandria.ucsb.edu/gazetteer/protocol/) that provides a standard XML-based query and response structure for the machine-to-machine querying of distributed gazetteers. The protocol and an open-source Java-based server implementation are available through the ADL web pages. The protocol and the GCS are independent structures.
2. Overview

The GCS, version 3.2, was developed as an XML schema. From this, a relational database (rdb) logical model has been developed. An implementation has been developed for the PostgreSQL database software, with additional tables to support specific query matching and report generation requirements.



Sections of the GCS deal with

* Names and details of their origin, language, and use
* Classification (typing according to a referenced scheme)
* Codes associated with the place (e.g., FIPS code)
* Spatial location (bounding box and detailed geometries)
* Street address
* Relationships to other named places
* Data (e.g., population, elevation)
* Description (narrative)
* Links to external resources about the feature
* Other: supplemental note; entry metadata



A separate, companion XML schema is used to describe the contributors and their sources for pieces of data included in a gazetteer entry.



Views and files of the GCS include the following:

* HTML graphics of the XML schemas (.html files)
o GCS 3.2 (large file – please wait for it to load completely)
o Source 3.2
* XML schemas (.xsd files)
o GCS 3.2
o Source 3.2
* Sample records (.xml files)
o GCS 3.2 required elements and attributes only
o GCS 3.2 all elements and attributes
o Source 3.2 required elements and attributes only
o Source 3.2 all elements and attributes



Graphics of the relational database model are described below.



Time, attribution to source, and entry date are applicable throughout the GCS to pieces of information gathered from multiple sources about a particular place. Time is treated in a similar fashion to spatial location. Time can be represented as a time range (similar to the bounding box), as detailed time instances and ranges (similar to the spatial geometries), and also as a named time period. The time period of the feature itself (e.g., for a school building that no longer exists) as well as the time periods for names, spatial footprints, data, and classification (e.g., a building changes its use from a church to a school) can all be represented. A general temporal status is part of the time period representation, with current, former, and proposed as the three status values.



Attribution to source and entry date are represented in the XML schema as applicable to sections of the description; e.g., for a particular placename, a particular spatial footprint, a description, etc. In the rdb, this linking of source to data has been extended to most of the attributes in the whole gazetteer entry through the use of mirror tables where the source of each bit of data and its entry date can be represented.



The documentation of the source of pieces of information is structured as a separate XML schema and is integrated into the rdb model as a discrete set of tables with unique IDs for each distinct combination of contributor and the contributor’s source of reference. Linking a particular piece of information to a contributor and source is done with these source IDs.



The core elements (required elements of description) of the GCS are a small subset of the whole GCS. In the XML schema graphic, required elements appear in solid-lined boxes. For the rdb, we have created specific lite schema views of the structure which can be used as a starting point.


3. History of development

The Alexandria Digital Library Project, which started in 1994, created the first ADL Gazetteer early in the project. After a period of use and experimentation, a formal structure was created for gazetteers – the first ADL Gazetteer Content Standard – and the ADL Gazetteer was recreated using a relational database implementation based on the GCS. Revisions to the first GCS have been ongoing as a result of consultations with other potential implementers. In particular, the requirements of historical and multilingual gazetteers were contributed by members of the Electronic Cultural Atlas Initiative (ECAI) at Berkeley. This version (3) is the result of intensive review of the structure during the creation of the rdb logical model.


4. Core elements

A gazetteer record using only the required elements of the GCS might look like the following. Please note that the record is presented here in a report format with customized element labels and without entry dates and attribution to source. The encoded geometry section is presented in XML format to make the point that this section is represented by an externally referenced scheme.



feature ID: 12123434
feature status: current
name: Tucson (county seat)
    primary display: true
    name status: current
feature class: populated places
    primary display: true
    classification scheme:
        name: ADL Feature Type Thesaurus
        version: July 3, 2002
    class status: current
spatial location
    planet: Earth
    bounding box:
        geodetic basis: WGS-84
        west coordinate: -111.00278
        east coordinate: -110.86778
        south coordinate: 32.12278
        north coordinate: 32.26883
        how generated: calculated maximum and minimum extent of detailed geometry
        source geometry(ies): primary geometry
    geometry(ies):
        primary geometry: true
        geometry status: current
        reference link to external geometry: false
        geometry coding scheme:
            name: DLESE geospatial.xsd
            version: 1
        encoded geometry (example only; element values listed in document order):
            Earth
            DLESE:WGS84
            Information about the projection goes here.
            Information about the coordinate system goes here
            Polygon
            5
            Clockwise
            some source
            Generalized polygon derived from shapefile
            +/- 5 mile perimeter
            Extra information goes here about the detailed geometry
            DLESE:CGD28-CDN
            Average sea level
            2410
            2410
            Generalized point elevation for Tucson
entry date: 2000-07-01
modification date: 2001-05-15



In this example, the core gazetteer elements of the feature’s name, classification, and spatial location are represented with some supporting information. This is all that is required by the GCS. The full gazetteer entry for this same place could include multiple placenames and details about each placename; multiple feature classes, possibly from different classification schemes; multiple spatial geometries from different sources or for different time periods; and much more. For any particular gazetteer entry, a selection of the non-required elements can be added.



Please note that some required elements can be treated as defaults; for example, planet = Earth and status = current (if the portion of historical information is minimal).



Also note that the encoded geometry shown above is an example (not complete) to show how an external geospatial description standard can be used to represent the encoded geometries needed for the gazetteer description.
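As a rough illustration of what "required elements only" means in practice, the sketch below holds the Tucson record as a Python dictionary and checks that the three core elements (name, classification, spatial location) are present. The key names (names, classes, spatial_location, and so on) are invented for this example and are not GCS element names. Placing planet at the top level of the entry is a simplification, and the defaults follow the note above about planet = Earth and status = current.

```python
# Hypothetical in-memory form of a minimal gazetteer entry.
# Key names are illustrative only; they are NOT the GCS element names.
DEFAULTS = {"planet": "Earth", "feature_status": "current"}


def with_defaults(entry):
    """Fill in the values that the guide says may be treated as defaults."""
    filled = dict(DEFAULTS)
    filled.update(entry)
    return filled


def missing_core_elements(entry):
    """Return the core elements (name, class, location) absent from entry."""
    missing = []
    if not entry.get("names"):
        missing.append("name")
    if not entry.get("classes"):
        missing.append("feature class")
    location = entry.get("spatial_location", {})
    # The GCS requires both a bounding box and at least one detailed geometry.
    if "bounding_box" not in location or not location.get("geometries"):
        missing.append("spatial location")
    return missing


tucson = with_defaults({
    "feature_id": 12123434,
    "names": [{"name": "Tucson (county seat)",
               "primary_display": True, "status": "current"}],
    "classes": [{"class": "populated places",
                 "scheme": "ADL Feature Type Thesaurus",
                 "status": "current"}],
    "spatial_location": {
        "bounding_box": {"west": -111.00278, "east": -110.86778,
                         "south": 32.12278, "north": 32.26883},
        "geometries": [{"primary": True, "status": "current"}],
    },
})

print(missing_core_elements(tucson))
```

An entry that carried only a name would be reported as missing its feature class and spatial location.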



Sample minimum and full XML records are listed under “Sample records” in section 2.

For views of the schema and XML files, see the list of views and files in section 2.

For views of the relational database model, go to section 9.


5. Treatment of geospatial description

Required:

* One detailed geometry (e.g., for a point, box, line, or polygon)
* One bounding box representing the maximum and minimum extent of the detailed geometry(ies)

Optional:

* Additional detailed geometries can be included. These additional geometries may
o represent the location in different ways (e.g., a point, a polygon), or
o come from different sources, or
o represent a change in the extent through time (e.g., for an urban area)

Application:

* To the feature (i.e., to the named geographic feature that is the focus of the gazetteer entry)



The bounding box (also known as the minimum bounding rectangle) consists of the maximum extent of the feature’s footprint on the Earth’s surface in terms of longitude (east and west) and latitude (north and south). It is required to support basic spatial query matching operations. Separate coordinates for each side of the bounding box (e.g., west coordinate) are used so that there is no confusion when the box extends across the 180° meridian.
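To show why naming each side of the box helps, here is a minimal sketch of a point-in-box test. It rests on one assumption that is not stated in the GCS itself: a box whose west coordinate is greater than its east coordinate is read as crossing the 180° meridian. The function name and signature are made up for this example.

```python
def bbox_contains(west, east, south, north, lon, lat):
    """Return True if (lon, lat) lies inside the box (decimal degrees).

    Assumption for this sketch: west > east means the box wraps
    across the 180° meridian.
    """
    if not (south <= lat <= north):
        return False
    if west <= east:
        # Ordinary box: longitude lies between the two edges.
        return west <= lon <= east
    # Wrapped box: longitude is east of the west edge OR west of the east edge.
    return lon >= west or lon <= east


# The Tucson bounding box from section 4.
print(bbox_contains(-111.00278, -110.86778, 32.12278, 32.26883, -110.95, 32.2))

# A hypothetical box spanning the 180° meridian (west = 177, east = -178).
print(bbox_contains(177.0, -178.0, -20.0, -15.0, 179.5, -17.0))
```

With min/max style coordinates alone, the second box would be indistinguishable from one that covers almost the whole globe; explicit west and east edges remove that ambiguity.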



The specific elements of description for detailed geometries are not spelled out in the GCS. Instead, the details of the geometries are to be expressed according to a public geospatial representation standard, such as the Geography Markup Language (OpenGIS), the FGDC’s Content Standard for Digital Geospatial Metadata, or ISO/TC 211’s geographic information metadata standard. For the GCS, this is an opaque description to be interpreted by the referenced geospatial coding standard.



The detailed geometry representation can be included in the gazetteer entry or it can be held external to the gazetteer database and referenced through a URL. In either case, the documentation about the format of the representation must be clear enough for correct computer interpretation.



Best practices for detailed geometries are that the following attributes be included:

* geodetic basis (e.g., WGS-84)
* type of geometry (e.g., point, box, line, polygon, multi-polygon)
* set of longitude,latitude coordinate points with documented delimiters
* statement of uncertainty in terms of a plus and minus value (e.g., +/- 5 miles)
* statement of uncertainty as a note
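The sketch below illustrates two of these practices together: coordinate points with documented delimiters, and a bounding box calculated from the maximum and minimum extent of the detailed geometry. The delimiter choice (space between points, comma between longitude and latitude) and the function names are assumptions for this example. The ring of five points is simply the corners of the sample Tucson bounding box, not real boundary data, and the calculation assumes the geometry does not cross the 180° meridian.

```python
def parse_points(text, point_sep=" ", coord_sep=","):
    """Parse 'lon,lat lon,lat ...' into a list of (lon, lat) float pairs."""
    points = []
    for token in text.split(point_sep):
        if not token:
            continue  # tolerate repeated separators
        lon, lat = token.split(coord_sep)
        points.append((float(lon), float(lat)))
    return points


def bounding_box(points):
    """Minimum bounding rectangle of the points (no 180° meridian crossing)."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return {"west": min(lons), "east": max(lons),
            "south": min(lats), "north": max(lats)}


# Illustrative closed ring (5 points, clockwise) built from the sample box corners.
ring = parse_points(
    "-111.00278,32.12278 -111.00278,32.26883 "
    "-110.86778,32.26883 -110.86778,32.12278 -111.00278,32.12278"
)
print(len(ring), bounding_box(ring))
```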



For views of the schema and XML files, see the list of views and files in section 2.

For views of the relational database model, go to section 9.


6. Treatment of temporal description



Required:

* Temporal status: current, former, or proposed

Optional:

* Beginning and ending dates for a general date range that spans the known duration
* Detailed date descriptions that can include multiple representations of the associated dates, documentation of the uncertainty of knowledge of the dates, association with named time periods (e.g., the Middle Ages), and notes to explain unusual circumstances.

Application:

* To the feature itself
* To placenames
* To spatial location
* To classification (typing)
* To relationships between named geographic places
* To data associated with a named geographic place



In this version of the GCS, the temporal aspects of a gazetteer entry have been designed to mirror the treatment of the spatial aspects. In both cases, there is a generalized representation (the bounding box and the time range) and detailed representations. Beyond this basic common high-level structure, the treatment of time is distinct because time applies to many aspects of a gazetteer entry and because often the beginning and ending dates are not known, only that the time in question is current or former (e.g., historical) or, to make the set complete, proposed (e.g., a shopping center).



Also, there does not seem to be an external standard for the representation of time that covers the needs of the gazetteer. Therefore, a descriptive structure for time representation has been designed for the GCS. It includes the date range as a generalized temporal footprint, the statement of uncertainty for the detailed times, and the association of named time periods.



In anticipation that there will be web-accessible schemes, like gazetteers, that define named time periods in terms of date ranges, the structure for including named time periods allows for linking to an external scheme as the source of the named time period definition.



In the GCS and its associated relational database, the time component is normalized and linked to other components. That is, the treatment of time is consistent wherever it is used in the gazetteer entry.



Best practice is to add whatever dates are known to be associated with the feature or one of its descriptive aspects, even if the dates are not precise (e.g., only expressed to the decade or the century). This information will support some degree of searching and display by date range.
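One way to picture how imprecise dates can still support date-range searching is sketched below. The class and function names are hypothetical, years stand in for full dates, and treating an unknown beginning or ending date as open-ended is a design choice of this sketch, not something the GCS prescribes. The three status values come from the required temporal status element.

```python
from dataclasses import dataclass
from typing import Optional

STATUSES = ("current", "former", "proposed")


@dataclass
class TimePeriod:
    """Generalized temporal footprint: required status plus an optional year range."""
    status: str
    begin_year: Optional[int] = None
    end_year: Optional[int] = None

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown temporal status: {self.status}")


def decade(start_year):
    """Year range for a date known only to the decade, e.g. 1880 -> (1880, 1889)."""
    return (start_year, start_year + 9)


def century(n):
    """Year range for the nth century CE, e.g. 19 -> (1801, 1900)."""
    return (100 * (n - 1) + 1, 100 * n)


def overlaps(period, query_begin, query_end):
    """True if the period's range intersects the query range.

    Unknown bounds are treated as open-ended (an assumption of this sketch).
    """
    begin = period.begin_year if period.begin_year is not None else float("-inf")
    end = period.end_year if period.end_year is not None else float("inf")
    return begin <= query_end and query_begin <= end


# A former school building known only to have existed in the 1880s.
begin, end = decade(1880)
school = TimePeriod(status="former", begin_year=begin, end_year=end)
print(overlaps(school, 1885, 1900), overlaps(school, 1890, 1900))
```

A period with no dates at all matches every query under this rule, which keeps undated entries visible in searches and leaves the status value as the only filter.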



For views of the schema and XML files, see the list of views and files in section 2.

For views of the relational database model, go to section 9.
7. Links to external sources of information

Where there are data sources that supplement the information included in the gazetteer, the GCS provides elements that can be used to link to these external resources. This version of the GCS provides the following linking elements (all are optional and repeatable):

* linkNameInfo: link or reference to further information about the name, such as a scholarly document
* geometryReferenceURL: URL reference to a file that contains the coordinate points or other representation of geographic location, such as a grid representation, plus geodetic basis and geometry type (e.g., point, line, or polygon). The file needs to be self-explanatory.
* featureLink: web address and description of a site that provides information about the feature; such links are given a description, a type/category, a language, and a URL.

8. Treatment of attribution to source of data

A basic tenet of the GCS is that there will be one gazetteer entry for a particular named geographic location. That is, there will not be more than one entry for the same place. Therefore, information about a place that comes from different sources will be merged into a single record. It is important that the source of the different pieces of information be traceable back to a particular contributor and reference source.



Source identification consists of two parts:

* Contributor
o Organization name and address; optionally a contact point and a website URL
* Source reference
o Bibliographic reference for the reference source (e.g., a map or a book)



Each ADL Gazetteer Source entry is uniquely identified with a mnemonic (e.g. “USGS-GNIS-1”) and by a system-assigned ID number. This ID number is associated with individual pieces of data in a gazetteer entry.



In the rdb model, attribution to sources has been implemented through mirror tables. The result is that attribution can be associated with each column value in each row of the main tables. This is an expansion from the basic attribution included in the XML schema and provides a comprehensive solution for tracing bits of information back to the contributor and reference source. The mirror tables also include the entry date for each piece of information.
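The mirror-table idea can be sketched with a small in-memory SQLite database. Everything about the layout here is an assumption made for illustration: the table names, the column names, and the convention of one source ID and one entry date per attributed column are not taken from the actual ADL rdb model. The mnemonic USGS-GNIS-1 is the example given above; EXAMPLE-ATLAS-1 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,      -- system-assigned ID number
    mnemonic  TEXT UNIQUE NOT NULL      -- e.g. 'USGS-GNIS-1'
);
CREATE TABLE feature_name (
    name_id    INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL,
    name       TEXT NOT NULL,
    status     TEXT NOT NULL
);
-- Hypothetical mirror table: for each row of feature_name, one
-- source ID and entry date per attributed column.
CREATE TABLE feature_name_src (
    name_id           INTEGER PRIMARY KEY REFERENCES feature_name(name_id),
    name_source_id    INTEGER REFERENCES source(source_id),
    name_entry_date   TEXT,
    status_source_id  INTEGER REFERENCES source(source_id),
    status_entry_date TEXT
);
""")
conn.execute("INSERT INTO source VALUES (1, 'USGS-GNIS-1')")
conn.execute("INSERT INTO source VALUES (2, 'EXAMPLE-ATLAS-1')")
conn.execute("INSERT INTO feature_name VALUES (10, 12123434, 'Tucson', 'current')")
conn.execute(
    "INSERT INTO feature_name_src VALUES (10, 1, '2000-07-01', 2, '2001-05-15')"
)


def source_of(column, name_id):
    """Return (mnemonic, entry_date) for one column of one feature_name row."""
    if column not in ("name", "status"):  # whitelist before building the SQL
        raise ValueError(f"no attribution recorded for column: {column}")
    return conn.execute(
        f"SELECT s.mnemonic, m.{column}_entry_date "
        f"FROM feature_name_src m "
        f"JOIN source s ON s.source_id = m.{column}_source_id "
        "WHERE m.name_id = ?",
        (name_id,),
    ).fetchone()


print(source_of("name", 10))
print(source_of("status", 10))
```

In this layout the name and its status can come from different contributors and still be traced separately, which is the extension beyond the section-level attribution of the XML schema.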



For views of the schema and XML files, see the list of views and files in section 2.

For views of the relational database model, go to section 9.
9. Relational database model

Graphics showing the whole relational database model and parts of it:

* GCS lite
o All
* GCS full
o Main (full model)
o Feature name
o Feature location (geospatial)
o Core feature attributes (excludes name and location details)
o Date/time
o Source
* Parallel (mirror) tables for source attribution and entry date
o Part 1
o Part 2
* Spreadsheet holding column definitions
* Spreadsheet holding column descriptions



For views of the schema and XML files, see the list of views and files in section 2.
10. ADL implementation

During the summer of 2003, the rdb logical model will be implemented as a DB2 database. Tables needed to support searching and report generation will be added as needed. The existing ADL Gazetteer database will be converted to the new schema and database model and the existing clients and services will be moved to access the new database.


11. Availability and contact points

Links to the schemas and the relational database model are elsewhere in this document.



The primary contact point for further information is Linda Hill, lhill@alexandria.ucsb.edu.


12. Acknowledgements

The development of the ADL Gazetteer Content Standard and the implementation of the ADL Gazetteer and its associated services have been funded primarily by grants from the National Science Foundation through its Digital Library Program. In addition, funds have been provided by NASA, ESRI, and the Digital Library for Earth System Education (DLESE).



The ADL Gazetteer Development Team includes

Jim Frew

Jordan Hastings

Havår Valeur

Linda Hill

Greg Janée

David Valentine



Pilar Montes developed the relational database model on a contracting basis with ADL.



Many have contributed to the design and contents of the GCS through their feedback to early versions. In particular, the Electronic Cultural Atlas Initiative (ECAI) at Berkeley has given valuable advice in regard to support for historical feature descriptions and multilingual text; Susan Stone has critiqued the relational database model for us and given us valuable feedback.


13. References and links

ADL Gazetteer Development web page: http://www.alexandria.ucsb.edu/gazetteer/



ADL Gazetteer Protocol: http://www.alexandria.ucsb.edu/gazetteer/protocol/



ADL Gazetteer publications: http://www.alexandria.ucsb.edu/gazetteer/#pubs



GCS schema



Relational database model
