Wrapping a Python module and exposing functionality to run within QCrBox
This guide walks you through the process of encapsulating a Python module within a QCrBox container. Specifically, we'll focus on a module that queries the Crystallography Open Database (COD) for structures with similar elements and unit cell parameters, and make that module's functionality available to QCrBox. A finished version of this container already ships with QCrBox as cod_check; here we will go through the steps needed to recreate it:
- Use qcb commands to create a new application service from a template, with initial boilerplate configuration files to get us started.
- Copy in an example Python module we'll use to conduct COD queries on our CIF file.
- Define an interface using YAML that describes the commands that will expose capabilities of our Python module for use by QCrBox.
- Write some Python glue code containing the functions that are called when our service commands are invoked: essentially one Python function for each command. This glue code calls our Python module to do the actual processing, passing along any parameters received from QCrBox.
- Write a Dockerfile that describes how to set up our service container with our application installed.
- Build our new application service container using qcb.
- Test our new container using QCrBox's web management front-end.
If you run into any problems when progressing through this tutorial, there's a troubleshooting guide which may be helpful.
Prerequisites
Before starting, ensure your development environment is set up following the guide located here, and that you have a devbox shell running in your terminal. It's also helpful to bring yourself up to speed on the platform's Technical Reference documentation, in particular Architecture & key components.
During this tutorial you will work with Docker, Python, and YAML configuration files. If you're new to these, you can simply type in the commands as listed in the tutorial. Alternatively, you can consult additional resources on Docker, Python modules, and YAML for foundational knowledge.
Initial Setup
The first step is to create a new QCrBox application container to encapsulate our Python application within QCrBox. Follow the instructions within the tutorial to create a new QCrBox container, using the given inputs to the commands as shown in the guide.
Next, download the Python file here and copy simple_cod_module.py into the cod_check_tutorial folder.
Adding the First Command to config_cod_check_tutorial.yaml
Next, we'll introduce a command that incorporates atomic parameters from the best-matching structure in the COD into our CIF file. This allows us to bypass the structure solution phase when matching information is readily available. For more detail on the functionality we're integrating, refer to the simple_cod_module.py script. We will use the cif_to_search_pars function to generate search parameters and then employ get_fitting_cod_entries to find the matching structures.
The initial setup generated by qcb init has already populated the top section of the config_cod_check_tutorial.yaml file. Our task now is to customize this section with our specific command details. Let's start with the top-level parameters for our first QCrBox command (below commands:):
- Rename name from the placeholder to "merge_closest_cod_entry".
- Modify description as you wish.
- Replace the value of import_path: with "configure_cod_check_tutorial", the module we will import from, and add callable_name: "merge_closest_cod_entry", the function we want to expose, just below it.
Note that if the Python function name is the same as the exposed command name (as with merge_closest_cod_entry here), callable_name can be omitted and the value of the name entry is used instead. We include it here for completeness.
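To make the fallback rule concrete, here is a minimal sketch of how a command entry could be resolved to a Python function. The helper resolve_command_callable is hypothetical and only illustrates the rule described above; it is not QCrBox's internal implementation, and the standard-library math module stands in for configure_cod_check_tutorial.

```python
import importlib
from typing import Any, Callable


def resolve_command_callable(command: dict[str, Any]) -> Callable[..., Any]:
    """Return the function a command entry points to (illustrative only)."""
    module = importlib.import_module(command["import_path"])
    # Fall back to the command's own name when callable_name is omitted.
    callable_name = command.get("callable_name", command["name"])
    return getattr(module, callable_name)


# Standard-library stand-ins for our configure_cod_check_tutorial module:
sqrt = resolve_command_callable({"name": "sqrt", "import_path": "math"})
floor = resolve_command_callable(
    {"name": "round_down", "import_path": "math", "callable_name": "floor"}
)
print(sqrt(16.0), floor(2.7))  # 4.0 2
```

In the first call the command name doubles as the function name; in the second, callable_name points to a function whose name differs from the command.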
The command will require four parameters which we specify within the parameters section. Here's what we'll specify:
- input_cif (QCrBox.input_cif): Specifies which CIF file to pass to the COD module for processing. This argument type is special in that it requires the specification of CIF entries. We will add a placeholder now and fill it in in the next section.
- output_cif_name (QCrBox.output_cif): Specifies the name of the output CIF file that will be generated. This argument also requires a specification of the CIF entries that will be added. Again, we add a placeholder for now and fill it in later.
- cellpar_deviation_perc (float): Defines the maximum allowable deviation, in percent, of the unit cell parameters between COD structures and our target structure. The default value is 2.0%.
- listed_elements_only (boolean): When set to true, the search only includes entries containing exactly the elements listed in the input_cif file. If false, the search also accepts entries with additional elements beyond those listed. The default is false.
Here is how you should structure the command in the YAML file (note the [...] - we'll amend these later!):
...
commands:
  - name: "merge_closest_cod_entry"
    description: "Merge the closest COD entry into the CIF file"
    implemented_as: "python_callable"
    import_path: "configure_cod_check_tutorial"
    callable_name: "merge_closest_cod_entry"
    parameters:
      - name: "input_cif"
        dtype: "QCrBox.cif_data_file"
        description: "Path to the input CIF file."
        required_entries: [...]
      - name: "output_cif_name"
        dtype: "QCrBox.output_cif"
        description: "Name of the output CIF file."
        required_entries: [...]
      - name: "cellpar_deviation_perc"
        dtype: "float"
        description: "The percentage deviation allowed for the unit cell parameters."
        default_value: 2.0
      - name: "listed_elements_only"
        dtype: "bool"
        description: "If True, only the elements listed in the CIF file can be present, otherwise given elements must be present but additional elements are possible."
        default_value: false
Description and text input lengths
Descriptions are limited to 1023 characters, and other text fields are limited to 255 characters. Any description or text field longer than this will cause the application to be rejected during registration with the QCrBox registry.
Specifying Required CIF Entries For Input
Next, we must specify which CIF entries have to be present in the input CIF file for our command to function, and what to add to the output CIF file. We can determine these entries by looking at the cif_to_search_pars function in the simple_cod_module.py script. Replace the [...] placeholder of required_entries: under the input_cif parameter with the list below, keeping required_entries: indented at the same level as that parameter's name: and dtype: keys:
required_entries: [
"_cell_length_a", "_cell_length_b", "_cell_length_c",
"_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
"_chemical_formula_sum"
]
For CIF entries that are beneficial but not mandatory, you could list them under optional_entries. In this context, however, all listed entries are necessary, which completes our current modifications to the YAML file. For further information about CIF entry handling, consult the YAML section of the CIF HowTo.
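Before running the command it can be handy to check whether a given CIF file actually provides these entries. The following is a simplified sketch (the function name missing_cif_entries is hypothetical): it only looks for data names at the start of a line and does not fully parse CIF syntax, so a proper CIF parser is a better choice for anything beyond a quick check.

```python
import re

# Data names taken from the required_entries list above.
REQUIRED = [
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_chemical_formula_sum",
]


def missing_cif_entries(cif_text: str, required: list[str]) -> list[str]:
    """Return the required data names that never start a line in cif_text.

    CIF data names are case-insensitive, so the comparison ignores case.
    """
    present = {
        match.group(1).lower()
        for match in re.finditer(r"^[ \t]*(_\S+)", cif_text, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in present]


example = """data_example
_cell_length_a 10.0
_cell_length_b 10.0
_cell_length_c 12.0
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
"""
print(missing_cif_entries(example, REQUIRED))  # ['_chemical_formula_sum']
```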
What if we need to specify the same required_entries more than once?
We may need to define another command that uses exactly the same entries for input (or output). We could simply copy and paste these entries, but we might prefer to define the set of entries only once. In QCrBox we can do that using cif_entry_sets. For example, at the end of the file we create a new entry set for our commands (replacing the existing unused example):
cif_entry_sets:
  - name: "cell_elements"
    required: [
      "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
      "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum"
    ]
Instead of writing the entries into each individual command, we now replace the required_entries section of input_cif with required_entry_sets, so that our parameter definition looks like this:
...
      - name: "input_cif"
        description: "Path to the input CIF file."
        dtype: "QCrBox.cif_data_file"
        required_entry_sets: ["cell_elements"]
        default_value: None
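Conceptually, an entry set is just a named list that gets expanded into the parameter's entries. The sketch below illustrates that idea on plain Python dictionaries mirroring the YAML above; expand_required_entries is a hypothetical helper, and the way QCrBox resolves entry sets internally may differ.

```python
from typing import Any

CIF_ENTRY_SETS = [
    {
        "name": "cell_elements",
        "required": [
            "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
            "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum",
        ],
    }
]


def expand_required_entries(parameter: dict[str, Any],
                            entry_sets: list[dict[str, Any]]) -> list[str]:
    """Combine required_entries with the entries of any required_entry_sets.

    Order is preserved and duplicates are skipped; an unknown set name raises KeyError.
    """
    sets_by_name = {entry_set["name"]: entry_set for entry_set in entry_sets}
    entries = list(parameter.get("required_entries", []))
    for set_name in parameter.get("required_entry_sets", []):
        for entry in sets_by_name[set_name].get("required", []):
            if entry not in entries:
                entries.append(entry)
    return entries


input_cif = {"name": "input_cif", "required_entry_sets": ["cell_elements"]}
print(len(expand_required_entries(input_cif, CIF_ENTRY_SETS)))  # 7
```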
Defining CIF Output Entries Within the YAML File
Now we want to define the output CIF entries for our function. Again, for the finer details consult the YAML section of the CIF HowTo. The general idea is that we keep all values from the original CIF file that are still valid, and only add or substitute new entries. We also delete all values from the original CIF file that have been invalidated.
The first two keys are again required_entries and optional_entries, along with their set counterparts. For CIF output, required entries are those that must be present in a CIF file produced by a successful calculation, whereas optional entries may or may not be present after a successful calculation.
Finally, invalidated_entries are entries that are no longer valid after the transformation. In this example we have changed atom positions and displacement parameters, so nothing that depends on these parameters should be kept from the original input CIF file. We should delete all derived _geom parameters and all quality indicators (as our diffraction data might differ from the data behind the COD entry). Invalidated entries are given as regular expressions, in the syntax used by Python's re module.
The CIF output path parameter should be modified to:
      - name: "output_cif_name"
        description: "Name of the output CIF file."
        dtype: "QCrBox.output_cif"
        required_entries: [
          "_atom_site_label", "_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z",
          "_atom_site_occupancy", "_atom_site_U_iso_or_equiv", "_atom_site_type_symbol",
          one_of: ["_symmetry_equiv_pos_as_xyz", "_space_group_symop_operation_xyz"]
        ]
        optional_entries: [
          "_atom_site_aniso_label", "_atom_site_aniso_U_11", "_atom_site_aniso_U_22",
          "_atom_site_aniso_U_33", "_atom_site_aniso_U_12", "_atom_site_aniso_U_13",
          "_atom_site_aniso_U_23", "_atom_site_adp_type", "_atom_site_site_symmetry_order",
          "_atom_site_calc_flag", "_atom_site_refinement_flags_posn", "_atom_site_refinement_flags_adp"
        ]
        invalidated_entries: [
          "_atom_site.*", "_geom.*", ".*refine.*", "_iucr.*", "_shelx.*"
        ]
Again, while the required and optional entries determine what is taken from our evaluation (here, the database lookup), the invalidated entries determine what is excluded from the input CIF. The remaining entries from both files are then merged and written to the location given by output_cif_name.
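The merge rule can be illustrated with a small sketch that treats each CIF file as a flat mapping from data names to values (a simplification that ignores loops). The helper merge_cif_entries is hypothetical, and it assumes that a pattern has to match the whole data name (re.fullmatch); check the CIF HowTo for the exact matching semantics QCrBox uses.

```python
import re

INVALIDATED = ["_atom_site.*", "_geom.*", ".*refine.*", "_iucr.*", "_shelx.*"]


def merge_cif_entries(input_entries: dict[str, str], new_entries: dict[str, str],
                      invalidated_patterns: list[str]) -> dict[str, str]:
    """Drop invalidated input entries, then add or overwrite the new entries."""
    compiled = [re.compile(pattern) for pattern in invalidated_patterns]
    kept = {
        name: value
        for name, value in input_entries.items()
        # Assumption: a pattern must match the whole data name.
        if not any(regex.fullmatch(name) for regex in compiled)
    }
    kept.update(new_entries)  # values from the evaluation take precedence
    return kept


original = {
    "_cell_length_a": "10.0",
    "_atom_site_label": "C1",
    "_geom_bond_distance": "1.54",
    "_refine_ls_R_factor_all": "0.045",
}
from_cod = {"_atom_site_label": "O1", "_atom_site_fract_x": "0.25"}
print(merge_cif_entries(original, from_cod, INVALIDATED))
# {'_cell_length_a': '10.0', '_atom_site_label': 'O1', '_atom_site_fract_x': '0.25'}
```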
Implementing the Python Glue Code
Next, we need to implement the module and function we referenced in the YAML file. Open the configure_cod_check_tutorial.py file and start by adding the following imports to the top of the file: pathlib from the Python standard library, the QCrBox client and models from pyqcrbox, and the functions we need from simple_cod_module:
from pathlib import Path
from pyqcrbox import sql_models
from pyqcrbox.registry.client import QCrBoxClient
from simple_cod_module import (
cif_to_search_pars,
download_cod_cif,
get_fitting_cod_entries,
)
YAML_PATH = Path(__file__).parent / "config_cod_check_tutorial.yaml"
We will use the cif_to_search_pars function to generate search parameters and then employ get_fitting_cod_entries to find the matching structures; it returns a list of dictionaries describing COD entries, sorted by the sum of squared differences in the unit cell parameters. Finally, download_cod_cif downloads an entry from the COD.
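To illustrate what "sorted by the sum of squared differences" means, here is a sketch that ranks candidate entries against a target cell. The dictionary keys a, b, c, alpha, beta and gamma are assumptions for this example and need not match the keys simple_cod_module uses. Note that this score adds squared length differences (Å) and angle differences (degrees) directly, as the description implies.

```python
from typing import Any

CELL_KEYS = ("a", "b", "c", "alpha", "beta", "gamma")  # hypothetical key names


def rank_by_cell_similarity(target: dict[str, float],
                            entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort entries so that the smallest sum of squared cell differences comes first."""
    def score(entry: dict[str, Any]) -> float:
        return sum((entry[key] - target[key]) ** 2 for key in CELL_KEYS)

    return sorted(entries, key=score)


target = {"a": 10.0, "b": 10.0, "c": 12.0, "alpha": 90.0, "beta": 90.0, "gamma": 90.0}
candidates = [
    {**target, "file": "entry_b", "beta": 90.5},  # score 0.25
    {**target, "file": "entry_a", "a": 10.1},  # score about 0.01
    {**target, "file": "entry_c"},  # identical cell, score 0
]
print([entry["file"] for entry in rank_by_cell_similarity(target, candidates)])
# ['entry_c', 'entry_a', 'entry_b']
```

The first element of such a ranking is what merge_closest_cod_entry picks with entry_lst[0].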
We can now implement our function as follows. Add this below (but before if __name__ == "__main__"):
def merge_closest_cod_entry(input_cif, output_cif_name, cellpar_deviation_perc, listed_elements_only):
    cellpar_deviation = float(cellpar_deviation_perc) / 100.0
    input_cif = Path(input_cif)
    output_cif_path = input_cif.parent / output_cif_name

    # get the list of fitting entries
    elements, cell_dict = cif_to_search_pars(input_cif)
    entry_lst = get_fitting_cod_entries(elements, cell_dict, cellpar_deviation, listed_elements_only)

    # if no fitting entries found, raise an error
    if len(entry_lst) == 0:
        raise ValueError("No fitting entries found")

    # download the cif file of the most fitting entry
    download_cod_cif(entry_lst[0]["file"], output_cif_path)
    return str(output_cif_path)
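The glue code converts cellpar_deviation_perc with float(), which suggests parameter values may reach the function as text. If that is also true of listed_elements_only, note that bool("false") evaluates to True in Python. A defensive conversion such as the hypothetical to_bool helper below avoids that pitfall; whether QCrBox passes booleans as bool or as strings is an assumption you should verify.

```python
def to_bool(value: object) -> bool:
    """Interpret a parameter that may arrive as a bool or as text."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes"}:
            return True
        if text in {"false", "0", "no"}:
            return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


print(bool("false"), to_bool("false"), to_bool(" True "))  # True False True
```

Inside merge_closest_cod_entry you would then write listed_elements_only = to_bool(listed_elements_only) before calling get_fitting_cod_entries.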
Registering the Python Function as a QCrBox Command
To integrate our command with QCrBox, it's necessary to register it within the system. Update the script's concluding section as follows:
if __name__ == "__main__":
    application_spec = ApplicationSpec.from_yaml_file(YAML_PATH)
    client = QCrBoxClient(application_spec=application_spec)
    client.run()
QCrBox matches command parameters to our Python function's arguments by name, so the parameter names of the function are used directly as the parameters of the commands exposed by the QCrBox container.
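Because the names must line up, a quick consistency check between the YAML parameter list and the function signature can save a failed run. The sketch below uses inspect.signature from the standard library; check_parameter_names is a hypothetical helper, and the parameter list is written as plain dictionaries mirroring the YAML.

```python
import inspect
from typing import Any, Callable


def check_parameter_names(func: Callable[..., Any], parameters: list[dict[str, Any]]) -> None:
    """Raise ValueError if YAML parameter names and function arguments differ."""
    yaml_names = {parameter["name"] for parameter in parameters}
    func_names = set(inspect.signature(func).parameters)
    if yaml_names != func_names:
        raise ValueError(
            f"only in YAML: {sorted(yaml_names - func_names)}, "
            f"only in function: {sorted(func_names - yaml_names)}"
        )


# Stand-in with the same signature as merge_closest_cod_entry.
def merge_closest_cod_entry(input_cif, output_cif_name, cellpar_deviation_perc, listed_elements_only):
    ...


yaml_parameters = [
    {"name": "input_cif"},
    {"name": "output_cif_name"},
    {"name": "cellpar_deviation_perc"},
    {"name": "listed_elements_only"},
]
check_parameter_names(merge_closest_cod_entry, yaml_parameters)  # passes silently
```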
It's necessary to put the client launching code within an if __name__ == "__main__" block at the end, since this only needs to run once when a container is started. Otherwise, each time a command is run, the Python script is imported and if the client invocation code is not guarded then the command execution will either hang or crash the application's container.
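The effect of the guard can be demonstrated with the standard library's runpy module, which executes a script under a chosen __name__. In this self-contained sketch, a throwaway module stands in for configure_cod_check_tutorial.py and appending to calls stands in for client.run().

```python
import runpy
import tempfile
from pathlib import Path

# A throwaway module standing in for configure_cod_check_tutorial.py.
MODULE_SOURCE = '''
calls = []

def merge_closest_cod_entry():
    return "command executed"

if __name__ == "__main__":
    calls.append("client started")  # stands in for client.run()
'''


def run_module(run_name: str) -> list[str]:
    """Execute the module source under the given __name__ and return its calls."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "guard_demo.py"
        script.write_text(MODULE_SOURCE)
        namespace = runpy.run_path(str(script), run_name=run_name)
    return namespace["calls"]


print(run_module("__main__"))  # ['client started']  (container start-up)
print(run_module("guard_demo"))  # []  (import for a command call)
```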
Configuring the Dockerfile
The Dockerfile is preconfigured with some entries that we won't need. Here's a simplified explanation of its contents and the modifications required:
- Base Image Setup: The file begins by specifying the base image for the application. We use qcrbox/base-application as our starting point, with the version selected by the QCRBOX_DOCKER_TAG build argument:
    ARG QCRBOX_DOCKER_TAG
    FROM qcrbox/base-application:${QCRBOX_DOCKER_TAG}
- Environment Setup: Specifies /bin/bash as the shell for running subsequent commands:
    SHELL ["/bin/bash", "-c"]
- Inclusion of QCrBox settings files: The following lines copy configure_cod_check_tutorial.py and config_cod_check_tutorial.yaml, which we modified to integrate our program with QCrBox, into our container:
    COPY configure_cod_check_tutorial.py ./
    COPY config_cod_check_tutorial.yaml ./
- Module Inclusion: Ensure our module and its dependencies are included and properly set up in the container. For instance, add the Python module with:
    COPY ./simple_cod_module.py ./
- Dependency Management: Install the necessary dependencies, such as the requests module, using either micromamba for Conda environments or pip for Python environments:
    - For micromamba (if you have a large number of dependencies, working with a conda .yml file is more sensible):
        RUN micromamba install -n qcrbox requests --yes
    - For pip:
        RUN pip install requests
- Delete unnecessary lines: Any other lines in the Dockerfile can be safely deleted.
You should end up with a Dockerfile that looks like the following (assuming the use of micromamba and not pip):
ARG QCRBOX_DOCKER_TAG
FROM qcrbox/base-application:${QCRBOX_DOCKER_TAG}
SHELL ["/bin/bash", "-c"]
COPY configure_cod_check_tutorial.py ./
COPY config_cod_check_tutorial.yaml ./
COPY ./simple_cod_module.py ./
RUN micromamba install -n qcrbox requests --yes
Building the container with the first command exposed
To create a QCrBox image for our application, we run a build command using the application slug defined earlier. Open your terminal and enter the following commands to bring down any running QCrBox containers and start the build process:
qcb down
qcb build cod_check_tutorial
Important Note: By default, qcb build without additional arguments performs a full rebuild of all dependencies to ensure everything is up-to-date. If you have recently completed a build and wish to save time, you can use the --no-build-deps argument. This option builds only the QCrBox image, without updating the dependencies.
After completing the build process, you can launch your newly created QCrBox image with the following command:
qcb up cod_check_tutorial --no-build-deps
This command starts the container without recompiling the image or its dependencies, assuming they were recently built. If you aim to update both dependencies and the image before launching, simply omit the --no-build-deps flag. This ensures that your QCrBox image and all related components are fully up-to-date.
Note that this will only bring up the cod_check_tutorial container and any other containers that it depends on, such as the core QCrBox containers. Any other application containers won't come up.
To check if the container has started correctly we can examine its Docker logs. Use docker ps -a to identify the container name running with the image qcrbox/cod_check_tutorial, and then use docker logs followed by the container name, e.g.
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2befc90b717c qcrbox/cod_check_tutorial:latest "/opt/qcrbox/entrypo…" 4 minutes ago Up 4 minutes (healthy) qcrbox-cod_check_tutorial-1
Here we can see that the container has been created and has been up for 4 minutes, and is reporting a healthy status. We can also check the status of the service directly by examining its logs:
docker logs qcrbox-cod_check_tutorial-1
...
{"event": "Received response to registration request: resp={'response_to': 'qcrbox_rk_0xa167db740e034919bfa10dfc416a7694', 'status': 'success', 'msg': 'Successfully registered cod_check_tutorial 0.0.1 (qcrbox_rk_0xa167db740e034919bfa10dfc416a7694)', 'payload': None}", "extra": {}, "level": "debug", "timestamp": "2025-09-12T10:17:31.568029Z"}
Here we can see that the container's request to register with the QCrBox Registry service has been successful, so the container is ready to accept commands from a user workflow.
You can also check whether an application is registered by querying the registry's API directly at http://127.0.0.1:11000/api/applications. You should find the new service in the list, complete with its YAML interface definition.
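If you prefer to script this check, the sketch below fetches the registry endpoint mentioned above using only the standard library and searches the decoded JSON for an exact string match. It deliberately avoids assuming a particular response schema; is_registered and contains_value are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any

REGISTRY_URL = "http://127.0.0.1:11000/api/applications"  # endpoint from this tutorial


def contains_value(node: Any, target: str) -> bool:
    """Return True if target appears as an exact value anywhere in parsed JSON."""
    if isinstance(node, dict):
        return any(contains_value(value, target) for value in node.values())
    if isinstance(node, list):
        return any(contains_value(item, target) for item in node)
    return node == target


def is_registered(slug: str, url: str = REGISTRY_URL, timeout: float = 5.0) -> bool:
    """Fetch the registry's application list and look for the given slug."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = json.load(response)
    return contains_value(data, slug)
```

With QCrBox running, is_registered("cod_check_tutorial") should return True. Because contains_value compares whole values, it will not confuse cod_check with cod_check_tutorial.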
You may see errors from Pydantic, which ensures the YAML is correctly formatted. If this is the case, check and correct any errors in the config_cod_check_tutorial.yaml file.
Validating our New Container using the QCrBox Frontend
You can use the QCrBox web management front-end to check your new application container is working, once you have the front-end installed and running (see the installation instructions here).
First, let's check we have our server configured to run a test, and then upload a test CIF file to feed into our new container:
- Log in to the QCrBox front-end.
- Create a new user group containing your QCrBox account if you haven't already:
  - Select Groups from the top navigation bar and + Create New Group.
  - Select Users from the navigation bar, and Edit on the line with your user account, ensuring that the new group is selected in the Groups field.
  - Select Save.
- Upload a CIF file to the front-end that we'll use to test our new container:
  - Download the following CIF file to your local machine.
  - Select Home from the navigation bar, then select Browse... and the downloaded CIF file, and select Upload.
  - Select the uploaded CIF file below Load Existing File:, and select Load.
Now we can run our new service against our uploaded test CIF file:
- Select the COD Check Tutorial: Merge Closest Cod Entry command from the dropdown list on the right (note: do not select the one prefixed with COD Check, since that's an existing service), and select Select Application.
- In the parameter choices that appear, you should see the default Cellpar deviation perc value of 2.0 that we specified, and an unchecked Listed elements only option; keep those values.
- Select Launch Application.
You should now see that the command is being executed.
Once the command completes, you should see an output file, qcrbox_work_merge_closest_cod_entry.cif, in the workflow view on the left.
Select the download icon to download the resultant CIF file to your local machine.
If you examine the contents of this output file, you should see three loop entries added to the file, corresponding to what we specified in our config_cod_check_tutorial.yaml file.
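To see which loops ended up in the output file without opening it in an editor, a rough sketch like the following lists the first data name of each loop_. It is a simplification (for example, it does not skip loop_ appearing inside semicolon-delimited text fields), so use a proper CIF parser for anything more than a quick look. The function name loop_first_tags is hypothetical.

```python
def loop_first_tags(cif_text: str) -> list[str]:
    """Return the first data name that follows each loop_ keyword (simplified)."""
    lines = [line.strip() for line in cif_text.splitlines()]
    first_tags: list[str] = []
    for index, line in enumerate(lines):
        if line.lower() == "loop_":
            # The first non-empty line after loop_ should hold the loop's first data name.
            for following in lines[index + 1:]:
                if following:
                    first_tags.append(following.split()[0])
                    break
    return first_tags


sample = """data_example
loop_
_space_group_symop_operation_xyz
'x, y, z'
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1
loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
C1 0.02
"""
print(loop_first_tags(sample))
# ['_space_group_symop_operation_xyz', '_atom_site_label', '_atom_site_aniso_label']
```

For the downloaded file you would call loop_first_tags(Path("qcrbox_work_merge_closest_cod_entry.cif").read_text()) after importing Path from pathlib.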
What if I need to invoke a command line application?
Instead of calling other Python functions from within the configure*.py script, you can invoke a command line program using Python's built-in, well-established subprocess module. See this comprehensive tutorial, which illustrates how to use subprocess.run() to invoke a command line application and capture its standard output and standard error.
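As a minimal sketch of that pattern, the hypothetical helper below runs a program in a working directory, captures standard output and standard error as text, and raises an error on a non-zero exit code. The current Python interpreter stands in for a real external program; in an actual command you would pass the executable and input files your application needs.

```python
import subprocess
import sys
from pathlib import Path


def run_external_program(args: list[str], workdir: Path) -> str:
    """Run a command-line program in workdir and return its standard output.

    Raises RuntimeError, including the program's standard error, on a non-zero exit code.
    """
    result = subprocess.run(args, cwd=workdir, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Example: use the current Python interpreter as a stand-in external program.
output = run_external_program([sys.executable, "-c", "print('hello from subprocess')"], Path.cwd())
print(output.strip())  # hello from subprocess
```

Raising on failure, rather than returning the exit code, means a failing program surfaces as a failed QCrBox command together with the program's own error message.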
Conclusion and final remarks
We have now exposed a single command in QCrBox from a Python module that processes an input CIF file to produce an output CIF file.
For more examples, take a look at the programs already implemented in services/applications. If any part of this tutorial is unclear, please raise an issue on GitHub describing the specific problem you ran into.