CIFXML Overview.

CIFXML supports the CIF syntax and semantics defined by the International Union of Crystallography (IUCr, http://www.iucr.org).

The CIF architecture supports the hierarchy:

CIFXML supports documents and dictionaries conformant to DDL1.

This document describes how the CIF specification is implemented in CIFXML. We have copied the text from the CIF syntax[1] and semantics[2] specification documents and annotated it where appropriate. Text in italics is copied verbatim from the specification; the rest is our commentary. The numbers refer to the sections in the CIF specification. Sections irrelevant to this implementation may be omitted.

[1] http://www.iucr.org/iucr-top/cif/spec/version1.1/cifsyntax.html
[2] http://www.iucr.org/iucr-top/cif/spec/version1.1/cifsemantics.html

Definition of Terms (in [1] and [2])

Syntax (from [1])

Supported and enforced.

Semantics (from [2])

The IUCr provides a list of accepted, possible and reserved semantics. Some of these are difficult to implement unambiguously and some are simply difficult. Note that dictionaries are supported by layering on top of CIFXML and are generally irrelevant here.

Data name semantics

    http://www.iucr.org/iucr-top/cif/spec/reserved.html

A reserved prefix, e.g. foo, must be used in the following ways

 
    * If the data file contains items defined in a DDL1 dictionary, the local data names assigned under the reserved prefix must contain it as their first component, e.g. _foo_atom_site_my_item.
    * If the data file contains items defined in a DDL2 dictionary, then the reserved prefix must be
          o the first component of data names in a category defined for local use, e.g. _foo_my_category.my_item
          o the first component following the period character in a data name describing a new item in a category already defined in a public dictionary, e.g. _atom_site.foo_my_item
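
The naming rules quoted above can be illustrated with a small check. This is only a sketch: the function name uses_reserved_prefix and its ddl2 flag are our own inventions for illustration and are not part of CIFXML or of any IUCr tool. It assumes data names are compared case-insensitively and that the prefix is followed by an underscore, as in the examples given.

```python
def uses_reserved_prefix(data_name, prefix, ddl2=False):
    """Return True if data_name carries the reserved prefix in an allowed position.

    Hypothetical helper for illustration only; CIFXML performs no such check.
    """
    name = data_name.lower()      # CIF data names are case-insensitive
    prefix = prefix.lower()
    if not name.startswith("_"):
        return False
    if not ddl2:
        # DDL1: the prefix is the first component of the data name.
        return name.startswith("_" + prefix + "_")
    # DDL2: names have the form _category.item
    category, dot, item = name[1:].partition(".")
    if not dot:
        return False
    # Either the category is local, or the item is a local addition
    # to a category from a public dictionary.
    return category.startswith(prefix + "_") or item.startswith(prefix + "_")
```

For example, uses_reserved_prefix("_atom_site.foo_my_item", "foo", ddl2=True) accepts the third form listed above, while an unprefixed name such as _atom_site_label is rejected.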


Semantics ignored by CIFXML.

Note on handling of units

    Many numeric fields contain data for which the units must be known. Each CIF data item has a default units code which is stated in the CIF Dictionary. If a data item is not stored in the default units, the units code is appended to the data name. For example, the default units for a crystal cell dimension are Angstroms. If it is necessary to include this data item in a CIF with the units of picometres, the data name of _cell_length_a is replaced by _cell_length_a_pm. Only those units defined in the CIF Dictionary are acceptable. The default units, except for the Angstrom, conform to the SI Standard adopted by the IUCr.

This approach is deprecated and has not been supported by any official CIF dictionary published subsequent to version 1.0 of the Core. All data values must be expressed in the single unit assigned in the associated dictionary.

A small number of archived CIFs exist with variant data names as permitted by the above clause. If it is necessary to validate them against versions of the Core dictionary subsequent to version 1.0, the formal compatibility dictionary cif_compat.dic may be used for the purpose. No other use should be made of this dictionary.


Semantics ignored by CIFXML.

Data value semantics

Data typing

    * numb: a value interpretable as a decimal base number and supplied as an integer, a floating-point number or in scientific notation;
    * char: a value to be interpreted as character or text data (where the value contains white-space characters, it must be quoted);
    * uchar: a value to be interpreted as character or text data but in a case-insensitive manner (i.e. the values FOO and foo are to be taken as identical);
    * null: a special data type associated with items for which no definite value may be stored in computer memory. It is the type associated with the special character literal values ? (query mark) and . (full point) which may appear as values for any data item within a data file (see section on "Special generic values" below). It is also the type assigned to items defined in dictionary files which may not occur in data files.


Without a dictionary there is no knowledge of the datatype, so CIFXML preserves the lexical form of the data value after removal of any quotes. It retains whitespace exactly, including leading and trailing whitespace within data values, except that line ends are normalised to a single line feed (character #10).
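
The behaviour described above might be sketched as follows. This is not the CIFXML code: the function name lexical_value is hypothetical, and the sketch assumes the value arrives as a single token delimited by at most one pair of matching single or double quotes (semicolon-delimited text fields are not handled).

```python
def lexical_value(raw):
    """Return the lexical form of a CIF data value (illustrative sketch only).

    Removes one pair of matching quote delimiters, keeps all inner
    whitespace, and normalises line ends to a single line feed (#10).
    """
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        raw = raw[1:-1]  # drop the delimiters, nothing else
    # CR LF and bare CR both become LF
    return raw.replace("\r\n", "\n").replace("\r", "\n")
```

Note that no numeric conversion is attempted: a value such as 34.5(12) is returned unchanged as a string, since its type is unknown without a dictionary.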

Subtyping

    _type_conditions     esd

For example, a value of 34.5(12) means 34.5 with a standard uncertainty of 1.2; it may also be expressed in scientific notation as 3.45E1(12).
CIFXML supports these semantics and will return values with or without the standard uncertainties (SUs).
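
As an illustration of the SU notation, the following sketch splits a numb value into its value and standard uncertainty. The name parse_numb and the regular expression are our own assumptions and do not reproduce the CIFXML implementation. The digits in parentheses are taken to apply to the last digits of the mantissa, and Decimal is used so that the results are exact.

```python
import re
from decimal import Decimal

# mantissa, optional exponent, optional SU digits in parentheses
_NUMB = re.compile(
    r"^(?P<mantissa>[+-]?(?:\d+\.?\d*|\.\d+))"
    r"(?:[eE](?P<exp>[+-]?\d+))?"
    r"(?:\((?P<su>\d+)\))?$"
)

def parse_numb(text):
    """Return (value, su) as Decimals; su is None when no SU is given.

    Hypothetical helper for illustration only.
    """
    m = _NUMB.match(text)
    if m is None:
        raise ValueError("not a CIF number: %r" % text)
    mantissa = m.group("mantissa")
    exp = int(m.group("exp") or 0)
    value = Decimal(mantissa).scaleb(exp)
    su_digits = m.group("su")
    if su_digits is None:
        return value, None
    # The SU digits line up with the last digits of the mantissa.
    decimals = len(mantissa.partition(".")[2])
    return value, Decimal(su_digits).scaleb(exp - decimals)
```

With this sketch both 34.5(12) and 3.45E1(12) give a value of 34.5 and an SU of 1.2, matching the example quoted from the specification.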

Embedded data semantics

CIF conventions for special characters in text

Handling of long lines

    26. The restriction in line length within CIF requires techniques to handle without semantic loss the content of lines of text exceeding the limit (2048 characters in this revision, 80 characters in the initial CIF specification). The line folding protocol defined here provides a general mechanism for wrapping lines of text within CIFs to any extent within the overall line length limit. A specific application where this would be useful is the conversion of lines longer than 80 characters to the CIF 1.0 limit. This 80-character limit is used in the examples below for illustrative purposes.

These techniques are applied only to the contents of text fields and to comments.

In order to permit such folding a special semantics is defined for use of the backslash. It is important to understand that this does not change the syntax of CIF 1.0. All existing CIFs conforming to the CIF 1.0 specification can be viewed as having exactly the same semantics as they now have. Use of these transformational semantics is optional, but recommended.

[...further discussion omitted...]

CIFXML currently does not support this convention.

Dictionary compliance

    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location

corresponding to DDL1 dictionaries, or

 
    _audit_conform.dict_name
    _audit_conform.dict_version
    _audit_conform.dict_location

for DDL2 dictionaries. Where no such information is provided, it may be assumed that the file should conform against the core CIF dictionary.
CIFXML does not process these semantics.

CIF markup conventions

CIFXML can be extended to support most of these conventions if required.