
Challenge: Managing and Using Terms and Codes

Background


Most public sector data, particularly structured data, contains formally defined codes and classification terms that can be 'looked up' to find their meaning or to see how they relate to other codes and terms. These go by a variety of names and take a variety of forms: code lists, constrained lists, pick-lists, drop-down menus, constrained value tables, lookup tables, taxonomies and reference data tables.

 

They are present in almost all databases, and provide the means of storing and presenting information in a structured way that saves space, restricts user input to a constrained list of valid terms, and enables efficient searching.

 

By way of example, shown below is a simple code list for gender:

 

Code    Meaning
1       Male
2       Female
9       Not Specified
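
As a minimal sketch of how such a code list is 'looked up' in practice, the example below holds the gender list in a Python dictionary and rejects any value outside the list. The names GENDER_CODES and lookup are hypothetical and chosen only for illustration; they do not come from any existing standard.

```python
# Hypothetical in-memory form of the gender code list shown above.
GENDER_CODES = {
    "1": "Male",
    "2": "Female",
    "9": "Not Specified",
}


def lookup(code: str) -> str:
    """Return the meaning of a code, rejecting values outside the list."""
    try:
        return GENDER_CODES[code]
    except KeyError:
        raise ValueError(f"{code!r} is not a valid gender code") from None


# A stored record holds only the compact code; the meaning is resolved on demand.
record = {"name": "Example Person", "gender": "2"}
meaning = lookup(record["gender"])
```

Storing the code rather than the text is what saves space and restricts input to valid terms: a value such as "3" fails the lookup instead of entering the data.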

 

Other examples include the names of Government departments and subject themes.

 

 

The Challenge

 

Through consistent use of open standards for managing and publishing lists, we expect that public open data will become more meaningful, and that the lists will be re-used in data created outside the public sector, so that data sets can be linked with confidence.

 

Code lists and classifications exist across all Government departments, yet the standards vary considerably, making the linking of data sources difficult. The challenge is to identify an appropriate mechanism that will enable these lists and classifications to be identified, recorded, managed and brought together.

 

What are we looking for?

 

We’d like to know whether there are any existing standards or sets of standard approaches (including document formats, schemas, representation languages etc.) for the interchange and sharing of code list information, and what problems or difficulties, if any, arise in using them:

 

What standards should there be for lists?

 

  • provenance metadata (ownership, history, currency, accuracy, timeliness, stewardship, participation, adoption and usage, etc.)
  • generation of 'human-readable' forms of individual items and whole collections of terms
  • generation of 'machine-readable' forms - which standard formats should be used?
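
To make the three points above concrete, here is a minimal sketch that attaches provenance metadata to a code list and generates one human-readable and two machine-readable forms from it. It assumes CSV and JSON purely as familiar examples, not as recommended formats, and every field name and value in the metadata (id, version, owner) is hypothetical.

```python
import csv
import io
import json

# Hypothetical code list with illustrative provenance metadata.
code_list = {
    "id": "gender",                 # hypothetical identifier
    "version": "1.0",               # hypothetical version label
    "owner": "Example Department",  # hypothetical owner
    "items": [
        {"code": "1", "meaning": "Male"},
        {"code": "2", "meaning": "Female"},
        {"code": "9", "meaning": "Not Specified"},
    ],
}


def to_json(cl: dict) -> str:
    """Machine-readable form that keeps the metadata with the items."""
    return json.dumps(cl, indent=2, sort_keys=True)


def to_csv(cl: dict) -> str:
    """Machine-readable form of the items only; the metadata is dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["code", "meaning"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(cl["items"])
    return buf.getvalue()


def to_text(cl: dict) -> str:
    """Human-readable form of the whole collection."""
    lines = [f"{cl['id']} (version {cl['version']}, owner: {cl['owner']})"]
    lines += [f"{item['code']:<6}{item['meaning']}" for item in cl["items"]]
    return "\n".join(lines)
```

The sketch also shows one of the difficulties a standard would need to address: a flat format such as CSV has no obvious place for the provenance metadata, so it must travel separately or be lost.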

 

 

How Should Lists be Managed?

 

  • the development lifecycle of lists within a change management system
  • management of changes, additions, deletions and retirements to items in a list
  • versioning of the lists themselves
  • discovery services to permit relevant lists to be found, and alerting mechanisms to inform consumers when standards change
  • community involvement / crowd-sourcing, potentially across national boundaries
  • managing competing standards from contributors of common, but different, encoding systems
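
One possible way to handle additions, retirements and versioning is sketched below. It assumes a simple scheme in which every change increments an integer version number and retired codes are kept on record rather than deleted, so that older data can still be interpreted. The class and method names are hypothetical, and retiring code 9 is done only to illustrate the mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    meaning: str
    added_in: int                      # list version in which the code appeared
    retired_in: Optional[int] = None   # list version in which it was retired


@dataclass
class CodeList:
    name: str
    version: int = 0
    items: Dict[str, Item] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def add(self, code: str, meaning: str) -> None:
        # Codes are never reused, even after retirement.
        if code in self.items:
            raise ValueError(f"code {code!r} already exists in {self.name}")
        self.version += 1
        self.items[code] = Item(meaning, added_in=self.version)
        self.history.append(f"v{self.version}: added {code} ({meaning})")

    def retire(self, code: str) -> None:
        item = self.items[code]
        if item.retired_in is not None:
            raise ValueError(f"code {code!r} is already retired")
        self.version += 1
        item.retired_in = self.version
        self.history.append(f"v{self.version}: retired {code}")

    def valid_codes(self, at_version: Optional[int] = None) -> List[str]:
        """Codes valid at a given version (default: the current version)."""
        v = self.version if at_version is None else at_version
        return sorted(
            code
            for code, item in self.items.items()
            if item.added_in <= v and (item.retired_in is None or item.retired_in > v)
        )


gender = CodeList("gender")
gender.add("1", "Male")
gender.add("2", "Female")
gender.add("9", "Not Specified")
gender.retire("9")  # illustration only
```

Because retired items stay on record, a consumer holding data coded against version 3 can still ask which codes were valid at that version, and the history list gives an alerting service something to publish.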

 

 

General Comments

 

  • how should licensing be managed, in particular where key codes and terms are not freely distributable under open government or Creative Commons licensing?
  • how should the distribution of terms and codes that describe data be managed as part of the release process for open government data?
  • to maximise participation, we need to recognise the challenges of creating data standards, and find a way to embrace all standards and a means of linking them together efficiently

 

Status: Open
 
 
