Wrapping a Python module and exposing functionality to run within QCrBox#
This guide walks you through the process of encapsulating a Python module within a QCrBox container. Specifically, we'll focus on a module that queries the Crystallography Open Database (COD) for structures with similar elements and unit cell parameters. Our goal is to make this module's functionality accessible within a QCrBox container. The resulting container is already present in QCrBox as cod_check. However, we will go through the necessary steps to recreate the functionality:
- Use qcb commands to create a new application service from a template, with initial boilerplate configuration files to get us started.
- Copy in an example Python module we'll use to conduct COD queries on our CIF file.
- Define an interface using YAML that describes the commands that will expose capabilities of our Python module for use by QCrBox.
- Write some Python glue code that contains functions that are called when our service commands are invoked - essentially one Python function for each command. This glue code will call our Python module to do the actual processing, passing any parameters received from QCrBox.
- Write a Dockerfile that describes how to set up our service container with our application installed.
- Build our new application service container using qcb.
- Test our new container using QCrBox's web management front-end.
Prerequisites#
Before starting, ensure your development environment is set up following the guide located here. In addition, it's likely helpful to bring yourself up to speed on the platform's Technical Reference documentation, in particular Architecture & key components.
This tutorial involves Docker, Python, and YAML configuration files. If you're new to these, you can simply type in the commands as listed in the tutorial. Alternatively, you can consult additional resources on Docker, Python modules, and YAML for foundational knowledge.
Initial Setup#
To begin, initialize a new QCrBox container for our module:
- Open your terminal and change directory to the directory where you have installed QCrBox.
- Start a Devbox shell using devbox shell.
- Type qcb init cod_check_tutorial and press Enter.
- You will be prompted to provide basic information about your application through a guided dialogue. Follow the prompts to complete the setup.
Please provide some basic information about your application.
The following dialog will guide you through the relevant settings.
[1/7] Select application_type
1 - CLI
2 - GUI (Linux)
Choose from [1/2] (1): 1
[2/7] application_slug (cod_check_tutorial):
[3/7] application_name (Cod Check): COD Check Tutorial
[4/7] application_version (x.y.z): 0.0.1
[5/7] description (Brief description of the application.): Can be used to check whether there is a similar structure in the crystallographic open database and output similar structures.
[6/7] url (): https://my.official.module.url
[7/7] email (): module_contact@university.somewhere
Created scaffolding for new application in 'T:\QCrBox_location\services\applications\cod_check_tutorial'.
Understanding the Generated Scaffolding#
Navigate to the application's folder to see the boilerplate files generated by qcb init. You'll find:
- docker-compose.cod_check_tutorial.*.yml: Docker Compose files, typically unchanged for non-GUI applications.
- dummy_gui.py: An example Python script to demonstrate a trivial GUI application. This can also be deleted.
- Dockerfile: Contains instructions to build the container.
- config_cod_check_tutorial.yaml: This defines the exposed functions as QCrBox "commands". At the moment, it contains example commands for a non-interactive sleep function and an interactive GUI session, but we'll replace these to specify QCrBox commands for our Python code.
- configure_cod_check_tutorial.py: Here, we'll implement our module's functionality and register it with QCrBox.
Next, download the Python file here and copy simple_cod_module.py into the cod_check_tutorial folder.
Adding the First Command to config_cod_check_tutorial.yaml#
In the first step of this tutorial, we aim to introduce a command that outputs the number of structures with matching unit cell parameters and elements, as specified in a CIF file, to a JSON file within the work folder. Refer to the simple_cod_module.py script to understand the functionalities we're integrating. We will use the cif_to_search_pars function to generate search parameters and then employ get_number_fitting_cod_entries to find the count of matching structures.
The initial setup, generated by qcb init, has already populated the top section of the config_cod_check_tutorial.yaml file. Our task now is to customize this section with our specific command details. Let's start with the top-level parameters for our first QCrBox command (starting from below commands:):
- Rename name from the placeholder to "get_number_fitting_cod_entries".
- Modify description as you wish.
- Add the module we will import from by replacing the value for import_path: with "configure_cod_check_tutorial", and the function we want to expose by adding callable_name: "qcb_get_number_fitting_cod_entries" just below this.
Note that if the Python function name we wish to call is the same as the exposed command name (for example: get_number_fitting_cod_entries) then callable_name can be omitted and the value of the name entry is used. In this tutorial the glue function is called qcb_get_number_fitting_cod_entries, so callable_name is needed.
The command will require three parameters, which we specify within the parameters section. Here's what we'll specify:
- input_cif (QCrBox.cif_data_file): Specifies which CIF file to pass to the COD module to check for similar structures. The type of this argument is special in that it requires the specification of CIF entries. We will add a placeholder to be filled in the next section.
- cellpar_deviation_perc (float): Defines the maximum allowable deviation, in percent, for unit cell parameters between COD structures and our target structure. The default value is 2.0%.
- listed_elements_only (boolean): When set to true, the search will only include entries containing exactly the elements listed in the input_cif file. If false, the search will accept entries with additional elements beyond those listed. By default, this is set to false.
Here is how you should structure the command in the YAML file:
commands:
  - name: "get_number_fitting_cod_entries"
    description: "Get the number of COD entries that fit the unit cell parameters and elements."
    implemented_as: "python_callable"
    import_path: "configure_cod_check_tutorial"
    callable_name: "qcb_get_number_fitting_cod_entries"
    parameters:
      - name: "input_cif"
        dtype: "QCrBox.cif_data_file"
        description: "Path to the input CIF file"
        required_entries: [...]
      - name: "cellpar_deviation_perc"
        dtype: "float"
        description: "The percentage deviation allowed for the unit cell parameters."
        default_value: 2.0
      - name: "listed_elements_only"
        dtype: "bool"
        description: "If True, only the elements listed in the CIF file can be present, otherwise given elements must be present but additional elements are possible."
        default_value: false
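To make the listed_elements_only semantics concrete, here is a minimal sketch of the element filter as it is described above. The helper name elements_match is hypothetical and is not part of simple_cod_module; the real module may implement the filter differently (for example, inside the COD query itself).

```python
def elements_match(query_elements, entry_elements, listed_elements_only=False):
    """Return True if a database entry passes the element filter.

    Hypothetical helper for illustration only, not part of simple_cod_module.
    """
    query = set(query_elements)
    entry = set(entry_elements)
    if listed_elements_only:
        # Exact match: the entry may contain only the listed elements
        return entry == query
    # Otherwise every listed element must be present; extra elements are allowed
    return query <= entry


# An entry with an additional element (N) passes only when
# listed_elements_only is left at its default of False.
print(elements_match(["C", "H", "O"], ["C", "H", "O", "N"]))
print(elements_match(["C", "H", "O"], ["C", "H", "O", "N"], listed_elements_only=True))
```

With the default setting the first call accepts the entry; with listed_elements_only enabled the second call rejects it.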
Specifying Required CIF Entries For Input#
Next, we must identify which CIF entries have to be in the input CIF file for our command to function. Inspect the cif_to_search_pars function in the simple_cod_module.py script to determine these entries. If you're adding only one command, list these required entries directly in the input_cif parameter definition. Ensure that required_entries: is aligned with the name: and dtype: keys of that parameter so that the YAML structure is valid.
        required_entries: [
          "_cell_length_a", "_cell_length_b", "_cell_length_c",
          "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
          "_chemical_formula_sum"
        ]
For scenarios where certain CIF entries are beneficial but not mandatory, you could list them under optional_entries. However, in this context, all listed entries are necessary, completing our current modifications to the YAML file. For further information about CIF entry handling consult the yaml section from the CIF HowTo.
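As a rough illustration of what "required entries" means in practice, the sketch below checks a CIF text for the seven data names listed above. It is a deliberately naive, line-based check (it ignores loops and multi-line text fields) and the function name missing_required_entries is hypothetical; QCrBox performs its own handling of these entries.

```python
REQUIRED_ENTRIES = [
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_chemical_formula_sum",
]


def missing_required_entries(cif_text, required=REQUIRED_ENTRIES):
    """Return the required data names that do not appear in the CIF text.

    Naive sketch: only the first token of each line is inspected.
    """
    present = set()
    for line in cif_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].startswith("_"):
            # CIF data names are case-insensitive
            present.add(tokens[0].lower())
    return [name for name in required if name.lower() not in present]


# Example CIF fragment (made-up values) lacking gamma and the formula sum
SAMPLE_CIF = """\
data_example
_cell_length_a 10.0
_cell_length_b 11.0
_cell_length_c 12.0
_cell_angle_alpha 90.0
_cell_angle_beta 95.0
"""

print(missing_required_entries(SAMPLE_CIF))
```

A check like this can be handy for debugging a test CIF file before passing it to the command.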
Implementing the Python Glue Code#
Important Note: Currently, some functionality that will eventually be handled automatically, including the registration of our application and commands in Python as well as CIF file handling and conversion, requires manual implementation. This step is temporary and is planned to be automated in future updates, following the developer alpha release. We're releasing this functionality now to provide a foundation for exploration and development.
Next, we need to implement the module and function we have referenced in the YAML file. To begin, open the configure_cod_check_tutorial.py file. Start by importing necessary functions from the Python base libraries as well as two different modules, by adding the following to the top of the file:
import json
from pathlib import Path
from qcrboxtools.cif.cif2cif import cif_file_to_specific_by_yml
from simple_cod_module import cif_to_search_pars, get_number_fitting_cod_entries
The function cif_file_to_specific_by_yml is designed to manage the CIF files' input and output, converting the CIF keywords used by QCrBox into those required by simple_cod_module. Additionally, we'll utilize two specific functions from simple_cod_module to execute our desired logic.
Let's proceed to define the necessary Python functions within configure_cod_check_tutorial.py.
Add this below the imports:
YAML_PATH = "./config_cod_check_tutorial.yaml"


def parse_input(input_cif, cellpar_deviation_perc, listed_elements_only):
    # Convert the string path to a Path object for easier file handling
    input_cif = Path(input_cif)
    # Convert cellpar_deviation to the correct type and convert the given percentage to a decimal
    cellpar_deviation = float(cellpar_deviation_perc) / 100.0
    # Use the parent directory of the input CIF file as the working directory
    work_folder = input_cif.parent
    # Specify the path for the modified CIF file
    work_cif_path = work_folder / "qcrbox_work.cif"
    # Adjust the CIF file according to the requirements of 'simple_cod_module'
    cif_file_to_specific_by_yml(
        input_cif_path=input_cif,
        output_cif_path=work_cif_path,
        yml_path=YAML_PATH,  # Referencing the edited YAML configuration
        command="get_number_fitting_cod_entries",  # Command name as specified in the YAML
        parameter="input_cif",  # Parameter name as specified in the YAML
    )
    return work_cif_path, cellpar_deviation, listed_elements_only


def qcb_get_number_fitting_cod_entries(input_cif, cellpar_deviation_perc, listed_elements_only):
    # Transform input parameters from string to appropriate Python objects
    work_cif_path, cellpar_deviation, listed_elements_only = parse_input(input_cif, cellpar_deviation_perc, listed_elements_only)
    # Retrieve the number of matching entries
    elements, cell_dict = cif_to_search_pars(work_cif_path)
    n_entries = get_number_fitting_cod_entries(elements, cell_dict, cellpar_deviation, listed_elements_only)
    # Save the output as a JSON file
    with open(work_cif_path.parent / "nentries.json", "w", encoding="UTF-8") as fobj:
        json.dump({"n_entries": n_entries}, fobj)
    return str(work_cif_path)
The parse_input function will eventually be phased out, as QCrBox is planned to take over the input parameter handling and CIF file conversion, making the explicit implementation obsolete.
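To illustrate what the percentage-to-fraction conversion in parse_input means for a search, the sketch below turns a deviation in percent into an allowed window around a unit cell parameter. That simple_cod_module applies a symmetric relative window like this is an assumption made for illustration; allowed_range is a hypothetical helper, so consult the module itself for the actual query logic.

```python
def allowed_range(value, deviation_perc):
    """Return the (lower, upper) bounds for a cell parameter.

    Hypothetical helper: assumes a symmetric relative tolerance.
    """
    # Same conversion as in parse_input: percent -> fraction
    fraction = float(deviation_perc) / 100.0
    return value * (1.0 - fraction), value * (1.0 + fraction)


# With the default of 2.0 percent, a 10 Angstrom cell length would accept
# candidates between roughly 9.8 and 10.2 Angstrom under this assumption.
low, high = allowed_range(10.0, 2.0)
print(f"{low:.2f} to {high:.2f}")
```

Note that float() also accepts a string such as "2.0", which matters if the parameter arrives as text.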
Registering the Python Function as a QCrBox Command#
To integrate our command with QCrBox, it's necessary to register it within the system. Update the script's concluding section as follows:
if __name__ == "__main__":
    application_spec = ApplicationSpec.from_yaml_file(YAML_PATH)
    client = QCrBoxClient(application_spec=application_spec)
    client.run()
QCrBox recognizes the parameter names from our Python function and uses them directly as the parameters of the commands exposed by the QCrBox container.
It's necessary to put the client launching code within an if __name__ == "__main__": block at the end, since this only needs to run once when a container is started. Each time a command is run, the Python script is imported, and if the client invocation code is not guarded, the command execution will either hang or crash the application's container.
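The effect of the guard can be demonstrated with plain Python, independent of QCrBox. The sketch below writes a small stand-in module (configure_demo.py, a made-up name) to a temporary directory and imports it: code under the guard is skipped on import, while the command function remains available.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for a configure_*.py file; the guarded block represents client.run()
MODULE_SOURCE = '''
events = ["imported"]

def command():
    return "command result"

if __name__ == "__main__":
    events.append("client started")
'''


def load_module(path, name):
    """Import a module from a file path, as happens when a command is run."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "configure_demo.py"
    module_path.write_text(MODULE_SOURCE, encoding="utf-8")
    demo = load_module(module_path, "configure_demo")

# The guarded block did not run, so only the import was recorded
print(demo.events)
print(demo.command())
```

When the same file is executed directly as a script, __name__ is "__main__" and the guarded block runs, which is what happens once at container start.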
Configuring the Dockerfile#
The Dockerfile file is preconfigured with some entries that we won't need. Here's a simplified explanation and modifications required:
- Base Image Setup: The file begins by specifying the base image for the application. We use qcrbox/base-application as our starting point, utilizing the latest version available.
  ARG QCRBOX_DOCKER_TAG
  FROM qcrbox/base-application:${QCRBOX_DOCKER_TAG}
- Environment Setup: Specifies using /bin/bash for running future commands.
  SHELL ["/bin/bash", "-c"]
- Inclusion of QCrBox settings files: The following lines copy the configure_*.py and config_*.yaml files, which we have modified to integrate our program with QCrBox, into our container.
  COPY configure_cod_check_tutorial.py ./
  COPY config_cod_check_tutorial.yaml ./
- Module Inclusion: Ensure our module and its dependencies are included and properly set up in the container. For instance, add the Python module with:
  COPY ./simple_cod_module.py ./
- Dependency Management: Install necessary dependencies, like the requests module. Choose between using micromamba for Conda environments or pip for Python environments.
  - For micromamba:
    RUN micromamba install -n qcrbox requests --yes
    If you have a large number of dependencies, working with a conda .yml file is more sensible.
  - For pip:
    RUN pip install requests
- Delete unnecessary lines: Any other lines in the Dockerfile can be safely deleted.
You should end up with a Dockerfile that looks like the following (assuming the use of micromamba and not pip):
ARG QCRBOX_DOCKER_TAG
FROM qcrbox/base-application:${QCRBOX_DOCKER_TAG}
SHELL ["/bin/bash", "-c"]
COPY configure_cod_check_tutorial.py ./
COPY config_cod_check_tutorial.yaml ./
COPY ./simple_cod_module.py ./
RUN micromamba install -n qcrbox requests --yes
Building the container with the first command exposed#
To create a QCrBox image for our application, we'll execute a specific build command using the application slug defined earlier. Open your terminal and input the following command to start the build process:
qcb build cod_check_tutorial
Important Note: By default, qcb build without additional arguments performs a full rebuild of all dependencies to ensure everything is up-to-date. If you have recently completed a build and wish to save time, you can opt for the --no-build-deps argument. This option focuses solely on building the QCrBox image without updating the dependencies.
After completing the build process, you can launch your newly created QCrBox image with the following command:
qcb up cod_check_tutorial --no-build-deps
This command starts the container without recompiling the image or its dependencies, assuming they were recently built. If you aim to update both dependencies and the image before launching, simply omit the --no-build-deps flag. This ensures that your QCrBox image and all related components are fully up-to-date.
Build a function to load in a structure from the best matching unit cell#
Our next goal is to incorporate atomic parameters from the most compatible structure within the COD into our CIF file. This allows us to bypass the structure solution phase if matching information is readily available. To achieve this, we introduce a new command into the config_cod_check_tutorial.yaml file. Append the following new command definition at the end of the file:
  - name: "merge_closest_cod_entry"
    description: "Merge the closest COD entry into the CIF file"
    implemented_as: "python_callable"
    import_path: "configure_cod_check_tutorial"
    callable_name: "merge_closest_cod_entry"
    parameters:
      - name: "input_cif"
        dtype: "QCrBox.cif_data_file"
        description: "Path to the input CIF file."
        required_entries: [
          "_cell_length_a", "_cell_length_b", "_cell_length_c",
          "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
          "_chemical_formula_sum"
        ]
      - name: "output_cif_name"
        dtype: "QCrBox.output_cif"
        description: "Name of the output CIF file."
        required_entries: [...]
      - name: "cellpar_deviation_perc"
        dtype: "float"
        description: "The percentage deviation allowed for the unit cell parameters."
        default_value: 2.0
      - name: "listed_elements_only"
        dtype: "bool"
        description: "If True, only the elements listed in the CIF file can be present, otherwise given elements must be present but additional elements are possible."
        default_value: false
You might notice two things. Firstly, we now have an output_cif_name parameter, which we will tackle in the next section. Secondly, the required CIF entries for input_cif are exactly the same as before, as they are used by the same function within the simple_cod_module.py file. Repeating the required CIF entries might be fine in this case, as the number of entries is rather low. However, we would like to define the set of entries only once. In QCrBox we can do that using cif_entry_sets. At the end of the file we create a new entry set for our commands (replacing the existing unused example one):
Important Note: Currently, functionality that will automatically convert/merge output CIF files has not yet been fully implemented. Therefore imposing that an output CIF file has a required (or optional, etc.) set of entries needs a manual implementation within the application command. This is temporary and it is planned to be automated in future updates, following the developer alpha release. We're releasing this functionality now to provide a foundation for exploration and development.
cif_entry_sets:
  - name: "cell_elements"
    required: [
      "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
      "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum"
    ]
Instead of writing the entries into the individual commands, we now replace the required_entries section for input_cif so that our parameter definitions look like this:
    ...
      - name: "input_cif"
        dtype: "QCrBox.cif_data_file"
        required_entry_sets: ["cell_elements"]
        default_value: None
      - name: "output_cif_name"
        dtype: "QCrBox.output_cif"
        required_entries: [...]
      - name: "cellpar_deviation_perc"
        dtype: "float"
        default_value: 2.0
        required: false
      - name: "listed_elements_only"
        dtype: "bool"
        default_value: false
        required: false
Try to update the input_cif definition of the get_number_fitting_cod_entries command on your own, using the same format. Note that you can have multiple CIF entry sets. You can also combine entry sets with individual keywords to mix and match whatever your commands need. Don't change the required_entries part in the output_cif_name parameter yet!
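Conceptually, an entry set is just a named list that gets expanded wherever it is referenced. The sketch below shows that expansion on plain Python dictionaries mirroring the YAML above, including a mix of an entry set with one individual keyword. The function resolve_required_entries is hypothetical and only illustrates the idea; it is not how QCrBox resolves entry sets internally.

```python
def resolve_required_entries(parameter, cif_entry_sets):
    """Expand required_entry_sets plus required_entries into one flat list.

    Hypothetical illustration operating on dicts shaped like the YAML config.
    """
    sets_by_name = {entry_set["name"]: entry_set for entry_set in cif_entry_sets}
    # Start with the individually listed keywords, if any
    entries = list(parameter.get("required_entries", []))
    for set_name in parameter.get("required_entry_sets", []):
        for entry in sets_by_name[set_name].get("required", []):
            if entry not in entries:  # avoid duplicates when sets overlap
                entries.append(entry)
    return entries


cif_entry_sets = [
    {
        "name": "cell_elements",
        "required": [
            "_cell_length_a", "_cell_length_b", "_cell_length_c", "_cell_angle_alpha",
            "_cell_angle_beta", "_cell_angle_gamma", "_chemical_formula_sum",
        ],
    }
]

# The extra keyword here is only an example of mixing sets and single entries
parameter = {
    "name": "input_cif",
    "required_entry_sets": ["cell_elements"],
    "required_entries": ["_space_group_name_H-M_alt"],
}

for entry in resolve_required_entries(parameter, cif_entry_sets):
    print(entry)
```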
Defining Cif Output Entries Within the YAML file#
Now, we want to define the output CIF entries for our function. Again, for the finer details, consult the yaml section from the CIF HowTo. The general idea is that we keep all values from the original CIF file that are still valid and only add or substitute new entries. Finally, we delete all values from the original CIF file that have been invalidated.
The first two entries are again required_entries and optional_entries and their set counterparts. For CIF output, required entries are values that must be present in a CIF file produced by a successful calculation, whereas optional entries may be present.
Finally, invalidated_entries are entries that are no longer valid after a transformation. In the given example we have changed atom positions and displacement parameters. Accordingly, anything depending on these parameters should not be kept from the original input CIF file. We should delete all derived _geom parameters and all quality indicators (as our diffraction data might differ from the data behind the COD entry). Invalidated entries are given as regular expressions as used by the Python re module.
The cif output path parameter should be modified to:
      - name: "output_cif_name"
        dtype: "QCrBox.output_cif"
        description: "Name of the output CIF file."
        required_entries: [
          "_atom_site_label", "_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z",
          "_atom_site_occupancy", "_atom_site_U_iso_or_equiv", "_atom_site_type_symbol",
          one_of: ["_symmetry_equiv_pos_as_xyz", "_space_group_symop_operation_xyz"]
        ]
        optional_entries: [
          "_atom_site_aniso_label", "_atom_site_aniso_U_11", "_atom_site_aniso_U_22",
          "_atom_site_aniso_U_33", "_atom_site_aniso_U_12", "_atom_site_aniso_U_13",
          "_atom_site_aniso_U_23", "_atom_site_adp_type", "_atom_site_site_symmetry_order",
          "_atom_site_calc_flag", "_atom_site_refinement_flags_posn", "_atom_site_refinement_flags_adp"
        ]
        invalidated_entries: [
          "_atom_site.*", "_geom.*", ".*refine.*", "_iucr.*", "_shelx.*"
        ]
Again, while the required and optional entries determine what is copied from our evaluation (here the database lookup), the invalidated entries exclude entries from the input CIF. The remaining entries from both files are then merged and written to the output file named by output_cif_name.
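The interplay of required, optional, and invalidated entries can be sketched on plain dictionaries. This is only a conceptual model of the merge described above, not the implementation in qcrboxtools: merge_entries is a hypothetical function, the one_of construct is ignored, and anchoring the patterns with re.match is an assumption.

```python
import re


def merge_entries(input_entries, new_entries, required, optional, invalidated_patterns):
    """Merge new CIF entries into the input entries (conceptual sketch)."""
    missing = [key for key in required if key not in new_entries]
    if missing:
        raise KeyError(f"Missing required output entries: {missing}")
    # Drop input entries matching any invalidated pattern
    # (assumption: patterns are matched from the start of the data name)
    kept = {
        key: value
        for key, value in input_entries.items()
        if not any(re.match(pattern, key) for pattern in invalidated_patterns)
    }
    # Copy only required and optional entries from the new result
    allowed = set(required) | set(optional)
    copied = {key: value for key, value in new_entries.items() if key in allowed}
    return {**kept, **copied}


# Made-up values for illustration
input_entries = {
    "_cell_length_a": "10.0",
    "_geom_bond_distance": "1.5",
    "_refine_ls_R_factor_gt": "0.03",
    "_atom_site_label": "C1_old",
}
new_entries = {
    "_atom_site_label": "C1",
    "_atom_site_fract_x": "0.1",
    "_journal_name_full": "not copied",
}

merged = merge_entries(
    input_entries,
    new_entries,
    required=["_atom_site_label", "_atom_site_fract_x"],
    optional=[],
    invalidated_patterns=["_atom_site.*", "_geom.*", ".*refine.*"],
)
print(sorted(merged))
```

In this example the cell length survives from the input, the geometry and refinement entries are dropped, and the atom site entries come from the new result.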
Developing the Python Glue Code for Our Merge Command.#
We will now modify the configure_cod_check_tutorial.py file to add the new functionality. We will use more functionality from both QCrBoxtools and our COD module. Our import section should now look like this:
import json
from pathlib import Path

from qcrboxtools.cif.cif2cif import (
    cif_file_merge_to_unified_by_yml,
    cif_file_to_specific_by_yml,
)
from simple_cod_module import (
    cif_to_search_pars,
    get_number_fitting_cod_entries,
    get_fitting_cod_entries,
    download_cod_cif,
)
from pyqcrbox.registry.client import QCrBoxClient
from pyqcrbox.sql_models import ApplicationSpec
The cif_file_merge_to_unified_by_yml function is the counterpart of the first function from cif2cif. We can use it to cut a non-unified CIF down to the entries we expect to be changed, convert it to the unified set of entries, and then merge it into the input file we used for our search, which should contain the X-ray data as well as the unit cell parameters. The function get_fitting_cod_entries returns a list of dictionaries of COD entries, sorted by the sum of squared differences in the unit cell parameters. Finally, download_cod_cif can be used to download an entry from the COD.
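The ranking by sum of squared differences can be sketched as follows. The dictionary keys for the cell parameters (a, b, c, alpha, beta, gamma) and both helper names are assumptions made for this example; only the "file" key is taken from the tutorial code, and the dictionaries actually returned by get_fitting_cod_entries may be laid out differently.

```python
CELL_KEYS = ("a", "b", "c", "alpha", "beta", "gamma")  # assumed key names


def squared_cell_difference(entry, target):
    """Sum of squared differences over the six unit cell parameters."""
    return sum((float(entry[key]) - float(target[key])) ** 2 for key in CELL_KEYS)


def sort_by_cell_fit(entries, target):
    """Return the entries ordered from best to worst unit cell agreement."""
    return sorted(entries, key=lambda entry: squared_cell_difference(entry, target))


# Made-up target cell and candidate entries
target = {"a": 10.0, "b": 11.0, "c": 12.0, "alpha": 90.0, "beta": 95.0, "gamma": 90.0}
entries = [
    {"file": "far", "a": 10.2, "b": 11.0, "c": 12.0, "alpha": 90.0, "beta": 95.0, "gamma": 90.0},
    {"file": "near", "a": 10.01, "b": 11.0, "c": 12.0, "alpha": 90.0, "beta": 95.0, "gamma": 90.0},
]

ranked = sort_by_cell_fit(entries, target)
print([entry["file"] for entry in ranked])
```

Because the list is sorted best-first, taking the first element selects the closest match, which is why the merge command uses the entry at index 0.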
We can now implement our function as follows:
def merge_closest_cod_entry(input_cif, output_cif_name, cellpar_deviation_perc, listed_elements_only):
    # cast the input parameters from strings to python objects
    work_cif_path, cellpar_deviation, listed_elements_only = parse_input(
        input_cif, cellpar_deviation_perc, listed_elements_only
    )
    input_cif = Path(input_cif)
    output_cif_path = input_cif.parent / output_cif_name
    # get the list of fitting entries
    elements, cell_dict = cif_to_search_pars(work_cif_path)
    entry_lst = get_fitting_cod_entries(elements, cell_dict, cellpar_deviation, listed_elements_only)
    # if no fitting entries found, raise an error
    if len(entry_lst) == 0:
        raise ValueError("No fitting entries found")
    # download the cif file of the most fitting entry
    cod_cif_path = work_cif_path.parent / "cod.cif"
    download_cod_cif(entry_lst[0]["file"], cod_cif_path)
    # merge the input cif file with the downloaded cif file
    cif_file_merge_to_unified_by_yml(
        input_cif_path=cod_cif_path,
        output_cif_path=output_cif_path,
        merge_cif_path=input_cif,
        yml_path=YAML_PATH,
        command="merge_closest_cod_entry",
        parameter="output_cif_name",
    )
    return output_cif_path
Rebuilding and Restarting the Container#
You can now restart and rebuild the container by typing the following. Rebuilding without dependencies (using the --no-build-deps flag with qcb up as stated earlier) might be faster if you have just rebuilt everything.
qcb down
qcb up cod_check_tutorial
Note that this will only bring up the cod_check_tutorial container and any other containers that it depends on, such as the core QCrBox containers. Any other application containers won't come up.
To check if the container has started correctly we can examine its Docker logs. Use docker ps -a to identify the container name running with the image qcrbox/cod_check_tutorial, and then use docker logs followed by the container name, e.g.
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2befc90b717c qcrbox/cod_check_tutorial:latest "/opt/qcrbox/entrypo…" 4 minutes ago Up 4 minutes (healthy) qcrbox-cod_check_tutorial-1
Here we can see that the container has been created and has been up for 4 minutes, and is reporting a healthy status. We can also check the status of the service directly by examining its logs:
docker logs qcrbox-cod_check_tutorial-1
...
"event": "Received response to registration request: resp={'response_to': 'qcrbox_rk_0xa167db740e034919bfa10dfc416a7694', 'status': 'success', 'msg': 'Successfully registered cod_check_tutorial 0.0.1 (qcrbox_rk_0xa167db740e034919bfa10dfc416a7694)', 'payload': None}", "extra": {}, "level": "debug", "timestamp": "2025-09-12T10:17:31.568029Z"}
Here we can see that the container's request to register with the QCrBox Registry service has been successful, so the container is ready to accept commands from a user workflow.
You can also check if an application is registered by using the registry's API directly, by going to http://127.0.0.1:11000/api/applications. You should find the new service in the list, complete with its YAML interface definition.
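The registry check can also be scripted. The sketch below fetches the applications endpoint mentioned above with the standard library and searches the returned JSON for the application slug. The structure of the response is not documented in this tutorial, so the search deliberately walks the whole payload rather than relying on specific field names; both function names are hypothetical.

```python
import json
from urllib.request import urlopen

REGISTRY_URL = "http://127.0.0.1:11000/api/applications"


def fetch_applications(url=REGISTRY_URL, timeout=10):
    """Fetch and decode the JSON list of registered applications."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def mentions_slug(payload, slug):
    """Return True if any value in the decoded JSON equals the slug.

    The response schema is unknown here, so every nested value is inspected.
    """
    if isinstance(payload, dict):
        return any(mentions_slug(value, slug) for value in payload.values())
    if isinstance(payload, list):
        return any(mentions_slug(item, slug) for item in payload)
    return payload == slug


# Usage, with the QCrBox containers running:
#     print(mentions_slug(fetch_applications(), "cod_check_tutorial"))
```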
You may see errors from Pydantic, which ensures the YAML is correctly formatted. If this is the case, check and correct any errors in the config_cod_check_tutorial.yaml file.
Validating our New Container using the QCrBox Frontend#
You can use the QCrBox web management front-end to check your new application container is working, once you have it installed and running.
First, let's check we have our server configured to run a test, and then upload a test CIF file to feed into our new container:
- Log in to the QCrBox front-end.
- Create a new user group containing your QCrBox account if you haven't already:
  - Select Groups from the top navigation bar and + Create New Group.
  - Select Users from the navigation bar, and Edit on the line with your user account, ensuring that the new group is selected in the Groups field.
  - Select Save.
- Upload a CIF file to the front-end we'll use to test our new container:
  - Download the following CIF file to your local machine.
  - Select Home from the navigation bar, then select Browse... and the downloaded CIF file, and select Upload.
  - Select the uploaded CIF file below Load Existing File:, and select Load.
Now we can run our new service against our uploaded test CIF file:
- Select the COD Check Tutorial: Merge Closest Cod Entry command from the dropdown list on the right (note: do not select the one prefixed with just COD Check, since that's an existing service), and select Select Application.
- In the parameter choices that appear, you should see the default Cellpar deviation perc that we specified as 2.0, and an unchecked Listed elements only option, so let's keep those values.
- Select Launch Application.
You should now see that the command is being executed.
Once complete, you should see an output qcrbox_work_merge_closest_cod_entry.cif file available in the workflow view on the left.
Select the download icon to download the resultant CIF file to your local machine.
If you examine the contents of this output file, you should see three loop entries added to the file, corresponding to what we specified in our config_cod_check_tutorial.yaml file.
Conclusion and final remarks#
We have now exposed two commands in QCrBox from a Python module: one that only analyses a CIF file to produce some output, and another that works from an input CIF file to an output CIF file.
For more examples you might consider looking into the already implemented programs in services/applications. If this tutorial is unclear at any point please raise an issue on GitHub with the specific problem that you ran into.