Implementation of cif format transfer and handling#
This documentation is up-to-date for the alpha developer release. The contents within are subject to change for the full release of QCrBox.
Cif parameter management#
When a cif parameter is required, it is passed to the commands API endpoint as JSON data, e.g. {"input_cif":
{"data_file_id": XXXX}}, where XXXX is the data file ID of a CIF that has already been uploaded to the data
manager. QCrBox uses the value of data_file_id to instantiate a QCrBox.cif_data_file object, which includes methods
to transfer the cif to a specific format or to merge it with another cif (in a container's file system) to create a
unified cif.
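As a rough illustration of the request data described above, the following sketch builds the JSON for a single cif parameter. The helper name build_cif_argument and the example ID are hypothetical; only the {"input_cif": {"data_file_id": ...}} shape is taken from this page. The endpoint URL and any other request fields are left out because they are not specified here.

```python
import json


def build_cif_argument(parameter_name: str, data_file_id: str) -> dict:
    """Return the JSON-serialisable argument for one cif parameter.

    Hypothetical helper: only the nested {"data_file_id": ...} shape is
    taken from the documentation; everything else is illustrative.
    """
    return {parameter_name: {"data_file_id": data_file_id}}


# "example-data-file-id" is a placeholder for an ID issued by the data manager.
arguments = build_cif_argument("input_cif", "example-data-file-id")
payload_json = json.dumps(arguments)
print(payload_json)
```

The resulting string is what would be sent as the cif-related part of the request; QCrBox resolves the ID to a file on its side.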
The CifDataFileParameter class is essentially an extension of the DataFileParameter class. The difference is that
the DataFileParameter class is intended for generic data files, so it does not include any methods for transferring
a cif between formats.
Parameters in general, including CifDataFileParameter objects, are created inside the application container in the
private QCrBoxClient._prepare_and_launch_command method, which, in addition to other objects, returns a list of the
"parsed" parameters for command execution. These parsed parameters are kept so that they can be passed to the two
private methods QCrBoxClient._handle_non_interactive_output and QCrBoxClient._handle_interactive_output, which are
used to save a command's output to the data manager. These methods require the parsed parameters because it is the
CifDataFileParameter that exposes the methods to transfer between formats.
When is a cif transferred between formats?#
As mentioned earlier, a cif parameter is transferred to a specific format at the start of command execution and merged with the command's output after the command has executed. For the merge to happen, the command implementation must return a file path to the cif back to QCrBox (see the command tutorials for more details), e.g.,
import pathlib
import shutil


def my_qcrbox_command(input_cif: str, output_name: str) -> str | pathlib.Path:
    """An example command.

    Parameters
    ----------
    input_cif : str
        This is a QCrBox.cif_data_file parameter, which within a container is a
        CifDataFileParameter.
    output_name : str
        This is a QCrBox.output_cif, which within a container is a BuiltinParameter
        which contains a string which is the name of the output file.

    Returns
    -------
    pathlib.Path | str
        The file path in the container to the latest cif to return to QCrBox.
    """
    # do_some_stuff stands in for the command's own processing.
    intermediate_file_path = do_some_stuff(input_cif)
    output_path = pathlib.Path(intermediate_file_path).parent / output_name
    # Move the intermediate result to the requested output name.
    shutil.move(intermediate_file_path, output_path)
    return output_path
The final merged cif (produced by merging the cif at output_path with input_cif) is determined by the cif parameters
defined for the QCrBox.output_cif parameter, as described in the Working with cif files in
QCrBox article. If there is no QCrBox.output_cif parameter, then a simple merge is
conducted without any further processing; i.e. the cif files input_cif and output_path are merged together without
modifying any cif entries.
If a QCrBox.cif_data_file parameter has no entries specified in the YAML, as below, transferring is still attempted.
However, the QCrBoxTools functions will fail, a warning message will be logged, and the original cif will be sent to
the command. In future releases, if there are no specified entries, transferring the cif to a new format will
not be attempted.
parameters:
  - name: input_cif
    dtype: QCrBox.cif_data_file
However, if entry sets are defined in the parameter specification, as in the YAML below, the cif will be transferred to the specified format. If this is not possible, for example when QCrBoxTools cannot perform the transfer because entries are missing from the file, a warning is issued and the original cif is sent to the command.
parameters:
  - name: input_cif
    dtype: QCrBox.cif_data_file
    required_entry_sets: [ "cell_elements" ]
cif_entry_sets:
  - name: "cell_elements"
    required: [
      "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
      "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum"
    ]
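To make the "missing contents" case concrete, here is a minimal sketch of a pre-check that lists which required entries are absent from a CIF. This is not how QCrBoxTools performs the check; missing_entries is a hypothetical helper, and the parsing is deliberately naive (it treats every whitespace-delimited token starting with an underscore as a data name and ignores quoting and multi-line text fields).

```python
REQUIRED_CELL_ELEMENTS = [
    "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
    "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum",
]


def missing_entries(cif_text: str, required: list[str]) -> list[str]:
    """Return the required data names that never appear in cif_text.

    Naive assumption: any whitespace-delimited token beginning with "_"
    is a data name. Real CIF parsing is more involved.
    """
    present = {token.lower() for token in cif_text.split() if token.startswith("_")}
    return [name for name in required if name.lower() not in present]


example_cif = """data_example
_cell_length_a 10.0
_cell_length_b 11.0
_cell_length_c 12.0
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
"""

missing = missing_entries(example_cif, REQUIRED_CELL_ELEMENTS)
print(missing)
```

A non-empty result for a required entry set corresponds to the situation in which the transfer cannot be carried out and the original cif is passed through.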
Implementation#
Transform and merge are provided by two methods in CifDataFileParameter and a small options model.
class Cif2CifOptions(QCrBoxPydanticBaseModel):
    application_yaml: str  # path to the application YAML inside the container
    command_name: str  # command requesting the translation/merge
    parameter_name: str  # name of the cif parameter or QCrBox.output_cif to use
    output_path: str | None  # optional path/name for the unified output CIF
- CifDataFileParameter.to_specific_format(input_cif_path, transform_options)
  - Lazily calls qcrboxtools' cif_file_to_specific_by_yml to attempt a format translation using the supplied YAML and parameter name (the method's second argument is named transform_options in the code).
  - On success writes a <stem>-converted.cif and copies it over the exported CIF on disk. NoKeywordsError is logged as a warning and other exceptions are logged as errors; the original CIF is preserved on failure.
- CifDataFileParameter.to_unified_format(new_cif_path, merge_options)
  - Calls qcrboxtools' cif_file_merge_to_unified_by_yml to merge the original exported CIF (tracked on the parameter instance) with the command-produced new_cif_path (the method's second argument is merge_options).
  - If merge_options.output_path is provided, that filename (with .cif suffix) is used for the unified output; otherwise a default <stem>.cif name is used. The merged file is copied into place, the parameter's exported path is updated on success and the merged path is returned.
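The output naming rule for the unified file can be sketched as below. This is an illustration only, not the QCrBox code: resolve_unified_path is a hypothetical helper, and two points are assumptions because this page does not state them: that the default <stem> is taken from the command-produced file, and that the unified file is placed next to it.

```python
from pathlib import Path


def resolve_unified_path(new_cif_path, output_path=None) -> Path:
    """Pick a file name for the unified CIF.

    Assumptions (not confirmed by the documentation): the file is written
    next to new_cif_path, and the default stem comes from new_cif_path.
    """
    work_dir = Path(new_cif_path).parent
    if output_path is not None:
        # Use the requested name, forcing a .cif suffix.
        return work_dir / Path(output_path).with_suffix(".cif").name
    # Fall back to <stem>.cif.
    return work_dir / f"{Path(new_cif_path).stem}.cif"


named = resolve_unified_path("/work/out.cif", "final")
default = resolve_unified_path("/work/refined.res")
print(named, default)
```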
Usage (client/calculation):
- get_cif_merge_parameter(command, parsed_parameters) returns (parameter_name, cif_parameter): the name to use for Cif2Cif and the CifDataFileParameter instance to merge with.
- The client constructs Cif2CifOptions(application_yaml, command_name, parameter_name[, output_path]) and passes it to the calculation as merge_options (see QCrBoxClient._handle_non_interactive_output / _handle_interactive_output). output_path can be taken from the third return value of get_cif_merge_parameter when present.
- PythonCallableCalculation.save_output_to_data_manager (when given input_cif and merge_options) will call await input_cif.to_unified_format(output_file, merge_options) before importing the merged file into the DataManager and creating a dataset. CLI-based calculations currently do not import outputs.
Conversion failures are logged and the system falls back to using the original CIF rather than failing the whole command.
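The fallback behaviour can be pictured with the following self-contained sketch. It is not the QCrBox implementation: convert_or_keep_original and the two example converters are hypothetical, and the NoKeywordsError class defined here is only a stand-in for the error of the same name mentioned above.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger("cif_fallback_sketch")


class NoKeywordsError(Exception):
    """Stand-in for the error of the same name mentioned above."""


def convert_or_keep_original(cif_path: Path, convert) -> Path:
    """Run convert(src, dst); leave the original file untouched on any failure."""
    converted_path = cif_path.with_name(f"{cif_path.stem}-converted.cif")
    try:
        convert(cif_path, converted_path)
    except NoKeywordsError as exc:
        logger.warning("Nothing to convert for %s: %s", cif_path, exc)
        return cif_path
    except Exception as exc:
        logger.error("Conversion of %s failed: %s", cif_path, exc)
        return cif_path
    # On success, copy the converted file over the exported CIF.
    shutil.copy(converted_path, cif_path)
    return cif_path


def failing_convert(src: Path, dst: Path) -> None:
    raise NoKeywordsError("no entry sets specified")


def upper_convert(src: Path, dst: Path) -> None:
    dst.write_text(src.read_text().upper())


with tempfile.TemporaryDirectory() as tmp:
    cif = Path(tmp) / "input.cif"
    cif.write_text("data_example\n")
    convert_or_keep_original(cif, failing_convert)
    after_failure = cif.read_text()
    convert_or_keep_original(cif, upper_convert)
    after_success = cif.read_text()
```

Catching the error at this level means a failed conversion costs a log entry rather than the whole command run.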