Wrapping a command line executable to run within QCrBox#

This guide walks you through the process of encapsulating an external command within a QCrBox container. Specifically, we'll create a bash script that allows us to run CellCheckCSD and produce an XML file listing similar cells as output.

Prerequisites#

Before starting, ensure your development environment is set up by following the development environment setup guide. This tutorial uses Docker, Bash, and YAML configuration files. If you're new to these, you can simply type in the commands as listed in the tutorial. Alternatively, you can consult additional resources on Docker, Bash, and YAML for foundational knowledge.

We will also need the Linux version of CellCheckCSD from the CSD website. Download it and check the filename and version number. If either differs from the one used in this tutorial, you will have to adapt some steps accordingly.
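
A quick way to check the downloaded file against the version used in this tutorial is to read the version out of the filename. The following is a minimal sketch, not part of QCrBox. It assumes the installer follows the naming scheme of the file used later in the Dockerfile (CellCheckCSD-1.2.14-linux-x64-installer.run); the function names are hypothetical.

```python
import re

# Assumed naming scheme, derived from the installer filename used in this tutorial.
INSTALLER_PATTERN = re.compile(
    r"^CellCheckCSD-(\d+(?:\.\d+)*)-linux-x64-installer\.run$"
)


def installer_version(filename):
    """Return the version embedded in an installer filename, or None if it does not match."""
    match = INSTALLER_PATTERN.match(filename)
    return match.group(1) if match else None


def matches_tutorial_version(filename, expected="1.2.14"):
    """Return True if the installer has the version this tutorial was written for."""
    return installer_version(filename) == expected


if __name__ == "__main__":
    name = "CellCheckCSD-1.2.14-linux-x64-installer.run"
    print(installer_version(name))         # 1.2.14
    print(matches_tutorial_version(name))  # True
```

If the check fails, the filename in the Dockerfile and the version entered during qcb init are the places to adapt.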

Initial Setup#

To begin, initialize a new QCrBox container for our module:

  1. Open your terminal.
  2. Type qcb init cellcheckcsd and press Enter.
  3. You will be prompted to provide basic information about your application through a guided dialogue. Follow the prompts to complete the setup. The version number you enter does not change any functionality, but try to enter the correct version number of CellCheckCSD.
Please provide some basic information about your application.
The following dialog will guide you through the relevant settings.

  [1/7] Select application_type
    1 - CLI
    Choose from [1] (1):
  [2/7] application_slug (cellcheckcsd):
  [3/7] application_name (cellcheckcsd): CellCheckCSD
  [4/7] application_version (x.y.z): 1.2.14
  [5/7] description (Brief description of the application.): Create an xml file containing the matches within the CSD for a given structure.
  [6/7] url (): https://www.ccdc.cam.ac.uk/solutions/software/cellcheckcsd/
  [7/7] email ():

Created scaffolding for new application in 'T:\QCrBox_location\services\applications\cellcheckcsd'.

Understanding the Generated Scaffolding#

Navigate to the application's folder to see the files generated by the boilerplate CLI. You'll find:

  • docker-compose.cellcheckcsd.*.yml: Docker Compose files, typically unchanged for non-GUI applications.
  • sample_cmd.sh: An example bash file for CLI applications. This can be deleted.
  • Dockerfile: Contains instructions for the docker executable to build the container.
  • config_cellcheckcsd.yaml: Future versions will use this to define exposed functions. Currently, it requests CIF keywords.
  • configure_cellcheckcsd.py: Here, we'll register our function. In future this will no longer be necessary.

Next, download the bash script that we will use as the executable, cif2cellcheckcsd.sh, and copy it into the cellcheckcsd folder. Copy the downloaded version of CellCheckCSD into the same folder. If you plan to develop and contribute to the QCrBox repository, it is also a good idea to add the CellCheckCSD file to .gitignore to prevent the accidental commit of copyrighted material.
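
Adding the entry to .gitignore can be done by hand in any editor. For a repeatable setup, a small helper can make sure the entry exists. This is a minimal sketch under the assumption that a plain filename pattern is sufficient; the helper name ensure_ignored is hypothetical and not part of QCrBox.

```python
from pathlib import Path


def ensure_ignored(gitignore_path, pattern):
    """Add `pattern` to the .gitignore file unless an identical line already exists.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(gitignore_path)
    lines = path.read_text().splitlines() if path.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    # Rewrite the file so that the new entry always ends up on its own line.
    lines.append(pattern)
    path.write_text("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    # Filename as used in this tutorial; adapt it to the installer you downloaded.
    ensure_ignored(".gitignore", "CellCheckCSD-1.2.14-linux-x64-installer.run")
```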

Adding the Command to config_cellcheckcsd.yaml#

We want a command that produces the XML output of CellCheckCSD for a given CIF file. As a first step, we adapt the configuration in config_cellcheckcsd.yaml to provide this functionality. First, change the name of the first command to cell_check_csd and delete the second entry. You should also remove the example boilerplate for cif entries and cif_entry_sets at the end of the file. A look into the bash file shows which command line arguments are needed:

  1. input_cif_path (QCrBox.input_cif): Identifies the CIF file used to locate similar structures. The type of this argument is special in that it requires the specification of CIF entries. We will add a placeholder to be filled in the next section.
  2. dimension_tolerance (float): Specifies the maximum permissible deviation for unit cell length dimensions between the CSD structures and our target structure.
  3. angle_tolerance (float): Determines the maximum allowable deviation for angles.
  4. maximum_hits (int): Caps the number of search results.
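
To get a feeling for the two tolerance values, it helps to work out what a percentage means for a concrete cell. The sketch below assumes that a tolerance is applied as a symmetric percentage of the target value, which is how the parameter descriptions in the configuration read. How CellCheckCSD compares cells internally is not covered by this guide, so treat the functions as an illustration only; their names are hypothetical.

```python
def tolerance_window(value, tolerance_percent):
    """Return the (low, high) interval around `value` for a symmetric percentage tolerance."""
    delta = abs(value) * tolerance_percent / 100.0
    return value - delta, value + delta


def within_tolerance(candidate, target, tolerance_percent):
    """Return True if `candidate` lies inside the tolerance window around `target`."""
    low, high = tolerance_window(target, tolerance_percent)
    return low <= candidate <= high


if __name__ == "__main__":
    # A cell length of 10.0 with a tolerance of 1.5 % spans roughly 9.85 to 10.15.
    print(tolerance_window(10.0, 1.5))
    print(within_tolerance(10.1, 10.0, 1.5))  # True
    print(within_tolerance(10.2, 10.0, 1.5))  # False
```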

Here is how you should structure the command in the YAML file:

commands:
  - name: "cell_check_csd"
    implemented_as: "cli_command"
    call_pattern: cif2cellcheckcsd.sh {input_cif_path} {dimension_tolerance} {angle_tolerance} {maximum_hits}
    description: "Look up the number of entries with a similar cell within the CSD"
    parameters:
      - name: "input_cif_path"
        dtype: "QCrBox.input_cif"
        description: "Path to CIF file to refine"
        required_entries: [...]
        required: True
      - name: "dimension_tolerance"
        dtype: "float"
        description: "Maximum deviation in cell length parameters (a, b, c) in %"
        default_value: 1.5
        required: true
      - name: "angle_tolerance"
        dtype: "float"
        description: "Maximum deviation in cell angles in %"
        default_value: 1.5
        required: true
      - name: "maximum_hits"
        dtype: "int"
        description: "Maximum number of saved hits"
        default_value: 200
        required: true
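
Every placeholder in call_pattern has to correspond to exactly one parameter name, and a misspelling or a stray space inside the braces is easy to overlook. The following sketch shows one way to cross-check the two before building the container. It assumes that the placeholders follow Python's brace syntax, which matches how they look in the configuration; how QCrBox substitutes them internally is not described here. The function names are hypothetical, and the example uses a shortened pattern.

```python
from string import Formatter


def placeholder_names(call_pattern):
    """Return the placeholder names of a call pattern in order of appearance."""
    return [
        field
        for _, field, _, _ in Formatter().parse(call_pattern)
        if field is not None
    ]


def check_call_pattern(call_pattern, parameter_names):
    """Return (unknown_placeholders, unused_parameters) for a call pattern."""
    placeholders = placeholder_names(call_pattern)
    unknown = [name for name in placeholders if name not in parameter_names]
    unused = [name for name in parameter_names if name not in placeholders]
    return unknown, unused


if __name__ == "__main__":
    names = ["input_cif_path", "angle_tolerance", "maximum_hits"]
    good = "cif2cellcheckcsd.sh {input_cif_path} {angle_tolerance} {maximum_hits}"
    bad = "cif2cellcheckcsd.sh {input_cif_path} {angle_tolerance} { maximum_hits}"
    print(check_call_pattern(good, names))  # ([], [])
    print(check_call_pattern(bad, names))   # ([' maximum_hits'], ['maximum_hits'])
```

Whitespace inside the braces is kept as part of the placeholder name, which is why the second pattern is reported as a mismatch.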

Specifying CIF Entries#

Next, we must identify which CIF entries are essential for our command to function and therefore have to be contained in the provided input_cif. You can take another look into the .sh file. As you can see, we need the CIF entries for the cell parameters.

Therefore, we add the cell parameters to the required entries. The first parameter entry for our cell_check_csd command should look as follows:

      - name: "input_cif_path"
        dtype: "QCrBox.input_cif"
        description: "Path to CIF file to refine"
        required_entries: [
          "_cell.length_a", "_cell.length_b", "_cell.length_c", "_cell.angle_alpha",
          "_cell.angle_beta", "_cell.angle_gamma"
        ]
        required: True

The lattice centring can be derived from either of two CIF entries. If neither is provided, the script defaults to "P". We can use the one_of keyword to indicate that either of the two entries is sufficient, and place it in optional_entries to indicate that the command also works without them. Our parameter definition should now look as follows:

      - name: "input_cif_path"
        dtype: "QCrBox.input_cif"
        description: "Path to CIF file to refine"
        required_entries: [
          "_cell.length_a", "_cell.length_b", "_cell.length_c", "_cell.angle_alpha",
          "_cell.angle_beta", "_cell.angle_gamma"
        ]
        optional_entries: [one_of: ["_space_group.centring_type", "_space_group.name_h-m_alt"]]
        required: True
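
The meaning of required_entries, optional_entries, and one_of can be spelled out in a few lines of Python. This is a sketch of the semantics as described in this section, not the code QCrBox or the bash script actually runs. It assumes the CIF entries are already available as a dictionary of strings and that the centring letter is the first character of the Hermann-Mauguin symbol. All function names are hypothetical.

```python
REQUIRED_ENTRIES = [
    "_cell.length_a", "_cell.length_b", "_cell.length_c",
    "_cell.angle_alpha", "_cell.angle_beta", "_cell.angle_gamma",
]
# one_of: the first entry that is present wins.
CENTRING_ENTRIES = ["_space_group.centring_type", "_space_group.name_h-m_alt"]


def missing_required(cif_entries, required=REQUIRED_ENTRIES):
    """Return the required entry names that are absent from `cif_entries`."""
    return [name for name in required if name not in cif_entries]


def lattice_centring(cif_entries, default="P"):
    """Derive the centring letter from the optional entries, falling back to `default`."""
    for name in CENTRING_ENTRIES:
        value = cif_entries.get(name, "").strip()
        if not value:
            continue
        if name == "_space_group.centring_type":
            return value
        # A Hermann-Mauguin symbol such as "C 2/c" starts with the centring letter.
        return value[0]
    return default


if __name__ == "__main__":
    entries = {"_cell.length_a": "10.0", "_space_group.name_h-m_alt": "C 2/c"}
    print(missing_required(entries))  # the five cell entries that are not provided
    print(lattice_centring(entries))  # C
    print(lattice_centring({}))       # P
```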

For further information about CIF entry handling, consult the YAML section of the CIF HowTo. Add the required unit cell entries and the optional lattice centring entries to your YAML file as shown above. Take care that required_entries and optional_entries are indented to the same level as the parameter's name and dtype keys.

Adapting the Dockerfile for CellCheckCSD Integration#

We will now define the container's environment and ensure that all necessary components for our application are included by editing the Dockerfile. Let us go through the file line by line. You need to add any lines that are missing.

  1. Establishing the Base Image: The Dockerfile begins by specifying the base image. For our purposes, we use qcrbox/base-application as the foundation. The tag is not hard-coded; it is supplied through the QCRBOX_DOCKER_TAG build argument, which keeps the application consistent with the base image version in use.

    ARG QCRBOX_DOCKER_TAG
    FROM qcrbox/base-application:${QCRBOX_DOCKER_TAG}
    

  2. Changing the user to root: Some of the following installation steps need administrator privileges. Therefore we execute them as the root user.

    USER root
    

  3. Configuring the Shell Environment: It's essential to define the shell environment for executing future commands. Here, we specify using /bin/bash.

    SHELL ["/bin/bash", "-c"]
    

  4. Copying Configuration Files: We copy both the Python configuration script and the YAML settings file into the container. These are the two files we have edited previously.

    COPY configure_cellcheckcsd.py ./
    COPY config_cellcheckcsd.yaml ./
    

  5. Dependency Installation: The library libglib2.0 is necessary for the proper functioning of CellCheckCSD. We can install the library using the package manager apt-get.

    RUN apt-get update -y && \
        apt-get install -y --fix-missing --no-install-recommends libglib2.0-0
    

    Note: For reference, the QCrBox base image is built from the Debian bookworm image.

  6. Script and Module Integration: We now copy the custom bash script to run CellCheckCSD on CIF files. Additionally, we run the CellCheckCSD installer.

  7. First, copy the bash script responsible for interfacing with CellCheckCSD:
    COPY cif2cellcheckcsd.sh /opt/cellcheckcsd/bin/
    
  8. Then, execute the CellCheckCSD installer, followed by clean-up procedures to maintain a lean container. Make sure that the filename of the CellCheckCSD installer you download matches the filename here and adapt it if it does not match:

    COPY CellCheckCSD-1.2.14-linux-x64-installer.run ./
    RUN chmod +x ./CellCheckCSD-1.2.14-linux-x64-installer.run && \
        ./CellCheckCSD-1.2.14-linux-x64-installer.run --mode unattended --prefix /opt/CCDC/CellCheckCSD && \
        rm CellCheckCSD-1.2.14-linux-x64-installer.run && \
        chmod +x /opt/cellcheckcsd/bin/cif2cellcheckcsd.sh
    
    Ideally, execution and clean-up should happen within the same RUN command to limit the size of the resulting image, because each instruction adds a layer to the image. Be especially wary of chmod commands on (filled) directories, since changing permissions copies the affected files into a new layer.

    The last command makes our script executable.

  9. Changing the user to the QCrBox user: The application should not run as root at runtime. The elevated privileges are not necessary, and some programs run into problems when executed as the root user. So at the end we switch to the QCrBox base user defined in our environment variables.

    USER ${QCRBOX_USER}
    

  10. Environment Path Adjustment: Finally, we adjust the PATH environment variable to include the directories containing the CellCheckCSD executable and our custom script. This modification ensures that these components are readily accessible for execution within the container.

    ENV PATH="$PATH:/opt/cellcheckcsd/bin/:/opt/CCDC/CellCheckCSD/bin"
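
The PATH adjustment is what allows call_pattern to refer to cif2cellcheckcsd.sh without a directory. The lookup can be reproduced with Python's shutil.which, which searches a list of directories in order, much like a shell does. The sketch below is an illustration that you could run inside the container as a sanity check; the helper name resolve_on_path is hypothetical.

```python
import os
import shutil


def resolve_on_path(command, directories):
    """Return the full path of `command` found in `directories`, or None if it is not found."""
    search_path = os.pathsep.join(directories)
    return shutil.which(command, path=search_path)


if __name__ == "__main__":
    # Directories added to PATH in the Dockerfile of this tutorial.
    container_dirs = ["/opt/cellcheckcsd/bin/", "/opt/CCDC/CellCheckCSD/bin"]
    location = resolve_on_path("cif2cellcheckcsd.sh", container_dirs)
    if location is None:
        print("cif2cellcheckcsd.sh was not found or is not executable")
    else:
        print("found at", location)
```

By default shutil.which only reports files that are executable, which is one reason the Dockerfile runs chmod +x on the script.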
    

Note: If your application is compiled within the container you can reduce the size of the resulting containers by using multi-stage builds.

Building the container with the first command exposed#

To create a QCrBox image for our application, we'll execute a specific build command using the application slug defined earlier. Open your terminal and input the following command to start the build process:

qcb build cellcheckcsd

Important Note: By default, qcb build without additional arguments performs a full rebuild of all dependencies to ensure everything is up-to-date. If you have recently completed a build and wish to save time, you can opt for the --no-deps argument. This option focuses solely on building the QCrBox image without updating the dependencies.

After completing the build process, you can launch your newly created QCrBox image with the following command:

qcb up cellcheckcsd --no-rebuild-deps

This command starts the container without recompiling the image or its dependencies, assuming they were recently built. If you aim to update both dependencies and the image before launching, simply omit the --no-rebuild-deps flag. This ensures that your QCrBox image and all related components are fully up-to-date.