Documentation

Processing the ECHE list data: Erasmus Code

In addition to the original ECHE list data, this script produces processed fields that convey normalized data points to be leveraged by client applications. The processed fields related to the Erasmus Code are described below.

Processed fields are exposed the same way as the original fields.

Unicode normalization

All original values are normalized to Unicode Normalization Form KC (NFKC). This is the only change applied to the original data; every other operation described below produces a new data point and leaves the original value untouched.
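As a minimal sketch of what this step amounts to, Python's standard `unicodedata` module can apply NFKC. The function name `normalize_value` and the sample strings are illustrative assumptions, not taken from the script itself.

```python
import unicodedata


def normalize_value(value: str) -> str:
    """Return the NFKC-normalized form of a string (hypothetical helper)."""
    return unicodedata.normalize("NFKC", value)


# Illustrative inputs: a no-break space (U+00A0) becomes a regular space,
# and a fullwidth letter (U+FF30) becomes its ASCII equivalent.
with_nbsp = "P\u00a0\u00a0LISBOA01"
with_fullwidth = "\uff30  LISBOA01"

print(repr(normalize_value(with_nbsp)))
print(repr(normalize_value(with_fullwidth)))
```

NFKC folds compatibility characters into their canonical equivalents, so values that look identical on screen also compare equal as strings.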

Normalizing the Erasmus Code

The Erasmus Code is difficult for client applications to use as a unique identifier because it includes space characters. Some software (e.g. web browsers, spreadsheet applications) collapses multiple consecutive spaces, which has caused many known issues in real-life client applications.

As such, while the original Erasmus Codes are retained, the ECHE List API also provides a normalized version of this identifier which follows a set of rules:

  1. Normalized Erasmus Codes are all uppercase;
  2. Normalized Erasmus Codes always begin with a letter;
  3. Normalized Erasmus Codes always begin with a country component;
  4. The country component has a length of three characters;
  5. The country component may have one, two or three letters;
  6. Any country component with one letter is followed by two spaces;
  7. Any country component with two letters is followed by one space;
  8. The only known country components with three letters are IRL and LUX;
  9. The country component is followed by a city component;
  10. The city component always starts and ends with a letter;
  11. The city component may contain letters and hyphens;
  12. The city component has variable length;
  13. Normalized Erasmus Codes always end with a number component;
  14. The number component has a length of two or three digits;
  15. The number component may have a leading zero to match the above rule.

NOTE: the country and city nomenclature, in this context, is colloquial. Some caveats apply, as described further in this document.

The following regular expression encapsulates all of the above:

^(IRL|LUX|[A-Z]{2}[ ]{1}|[A-Z]{1}[ ]{2})[A-Z][A-Z-]*[A-Z]\d{2,3}$
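A client application could use this expression to check whether a value is already in normalized form. The sketch below assumes Python; the names `ERASMUS_CODE` and `is_normalized` are hypothetical, and the sample values are illustrative only. The `re.ASCII` flag is an addition so that `\d` matches only the digits 0 to 9.

```python
import re

# The expression from this document, compiled as-is.
# re.ASCII restricts \d to ASCII digits (an assumption of this sketch).
ERASMUS_CODE = re.compile(
    r"^(IRL|LUX|[A-Z]{2}[ ]{1}|[A-Z]{1}[ ]{2})[A-Z][A-Z-]*[A-Z]\d{2,3}$",
    re.ASCII,
)


def is_normalized(code: str) -> bool:
    """Return True when the whole string is a normalized Erasmus Code."""
    return ERASMUS_CODE.fullmatch(code) is not None


print(is_normalized("P  LISBOA01"))   # True: one letter, two spaces
print(is_normalized("UK LONDON005"))  # True: two letters, one space
print(is_normalized("IRLDUBLIN01"))   # True: three letters, no space
print(is_normalized("P LISBOA01"))    # False: a space was collapsed
print(is_normalized("p  lisboa01"))   # False: not uppercase
```

Using `fullmatch` rather than `match` rejects a value with a trailing newline, which `$` alone would tolerate in Python.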

Known differences between original and normalized Erasmus Codes

This script is mostly focused on normalizing spacing inside the Erasmus Codes, but other changes are applied and may be observed in the API output.

This script will replace any non-alphabetical character within the city component with a hyphen (-).

This script will add a leading zero to any number component with a single digit.
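The following Python sketch shows one way the normalization rules and the two changes above could be combined. It is not the script's actual implementation: the function name, the loose input pattern, and the requirement that a one- or two-letter country component be followed by at least one space in the input are all assumptions made for illustration, as are the sample inputs.

```python
import re

# Strict pattern for a normalized code (taken from this document).
STRICT = re.compile(
    r"^(IRL|LUX|[A-Z]{2}[ ]{1}|[A-Z]{1}[ ]{2})[A-Z][A-Z-]*[A-Z]\d{2,3}$",
    re.ASCII,
)

# Loose pattern for an incoming code (assumption of this sketch):
# a country prefix, optional whitespace, a city part, then trailing digits.
LOOSE = re.compile(
    r"^\s*(IRL|LUX|[A-Z]{1,2}(?=\s))\s*(.*?)\s*(\d+)\s*$",
    re.ASCII,
)


def normalize_erasmus_code(original: str) -> str:
    """Return a normalized Erasmus Code, or raise ValueError (hypothetical)."""
    match = LOOSE.match(original.upper())
    if match is None:
        raise ValueError(f"cannot split into components: {original!r}")
    country, city, number = match.groups()

    # Replace any non-alphabetical character in the city with a hyphen.
    city = re.sub(r"[^A-Z]", "-", city)
    # Pad the country to three characters and the number to two digits.
    candidate = f"{country:<3}{city}{number.zfill(2)}"

    # Final check against the strict expression.
    if STRICT.fullmatch(candidate) is None:
        raise ValueError(f"result is not a valid code: {candidate!r}")
    return candidate


print(repr(normalize_erasmus_code("p lisboa1")))    # 'P  LISBOA01'
print(repr(normalize_erasmus_code("F  ST.ETIE1")))  # 'F  ST-ETIE01'
```

The sketch deliberately refuses inputs such as `PLISBOA01`, where the boundary between country and city cannot be recovered without guessing.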

Advice on displaying Erasmus Codes in HTML

Since web browsers usually collapse multiple consecutive spaces, Erasmus Codes may be rendered differently from how they appear in the markup. To avoid this, apply the following CSS rule to the HTML element containing the Erasmus Code:

selector {
  white-space: pre-wrap;
}

Extracting the Erasmus Code prefix

After an Erasmus Code has been normalized as described above, it is possible to extract the prefix, also referred to as the country component. This prefix consists of one, two or three letters and contains no spaces.

The city component and the number component are also extracted from the normalized Erasmus Code.

These API keys may be useful for grouping entries without the need to process the Erasmus Codes, either original or normalized.
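For clients that prefer to derive the components themselves, the split can be sketched in Python with capture groups added to the expression from this document. The function name `split_erasmus_code` and the tuple layout are hypothetical and do not reflect the actual API key names.

```python
import re
from typing import Tuple

# Same expression as in this document, with one group per component.
PARTS = re.compile(
    r"^(IRL|LUX|[A-Z]{2}[ ]|[A-Z][ ]{2})([A-Z][A-Z-]*[A-Z])(\d{2,3})$",
    re.ASCII,
)


def split_erasmus_code(normalized: str) -> Tuple[str, str, str]:
    """Split a normalized code into (prefix, city, number) strings."""
    match = PARTS.fullmatch(normalized)
    if match is None:
        raise ValueError(f"not a normalized Erasmus Code: {normalized!r}")
    country, city, number = match.groups()
    # The prefix is the country component without its padding spaces.
    return country.rstrip(), city, number


print(split_erasmus_code("P  LISBOA01"))   # ('P', 'LISBOA', '01')
print(split_erasmus_code("UK LONDON005"))  # ('UK', 'LONDON', '005')
```

The number component is kept as a string here so that its leading zeros are preserved.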

Matching an Erasmus Code to a Country Code

The country component, or prefix, of a normalized Erasmus Code is also converted to both an Interinstitutional Style Guide country code and an ISO 3166-1 alpha-2 country code, according to a known correspondence list.

These API keys may be useful for grouping entries by country without the need to process the Erasmus Codes, either original or normalized.
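The lookup itself can be pictured as a simple table keyed by prefix. The Python sketch below uses a small illustrative subset that is assumed here for demonstration; it is not the script's actual correspondence list, and the names `CountryCodes`, `SAMPLE_CORRESPONDENCE` and `lookup_country` are hypothetical. The two standards differ for some countries, for example Greece and the United Kingdom.

```python
from typing import Dict, NamedTuple, Optional


class CountryCodes(NamedTuple):
    isg: str  # Interinstitutional Style Guide country code
    iso: str  # ISO 3166-1 alpha-2 country code


# Illustrative subset only (assumption); the real list is maintained elsewhere.
SAMPLE_CORRESPONDENCE: Dict[str, CountryCodes] = {
    "D": CountryCodes(isg="DE", iso="DE"),
    "G": CountryCodes(isg="EL", iso="GR"),
    "IRL": CountryCodes(isg="IE", iso="IE"),
    "UK": CountryCodes(isg="UK", iso="GB"),
}


def lookup_country(prefix: str) -> Optional[CountryCodes]:
    """Return both country codes for a prefix, or None when unknown."""
    return SAMPLE_CORRESPONDENCE.get(prefix)


print(lookup_country("G"))
print(lookup_country("XX"))  # None: prefix not in the sample table
```

Returning `None` for an unknown prefix leaves the decision about how to handle unmapped entries to the caller.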

Further notes on the country and city components

It is not uncommon for the Erasmus Code to contain a city component that is different from the city field, since the components in the Erasmus code are related to its issuance, and not necessarily to the physical or legal address of an institution.

The same holds true for the country component - please refer to 03_COUNTRY.md for more information on this topic.