There are two approaches to configuring your HiCognition instance:

The most convenient way to set your environment variables is a `.env` file that specifies them as key-value pairs. If this file is in the same directory, `docker-compose` will pick it up and inject the variables during start-up. Never commit a `.env` file for your production server to version control!

Alternatively, you can inject the environment variables in your deploy pipeline. This is arguably the cleanest approach and can easily be done with, for example, GitHub Actions.
We expose a wide array of environment variables; some you do not need to change, whereas others we highly recommend changing.

The following environment variables should be redefined whenever you deploy a new HiCognition instance:
| Variable | Description |
| --- | --- |
| `SECRET_KEY` | Secret key of the flask app that is used to sign the generated tokens |
| `DATABASE_URI` | Connection string to the HiCognition MySQL database (including username and password) |
| `MYSQL_PASSWORD` | Password for the MySQL database |
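As a minimal sketch, a `.env` file covering these variables might look like the following. All values are placeholders, and the connection-string format assumes SQLAlchemy's MySQL dialect with a database host named `mysql`, which may differ from your setup:

```ini
# Placeholder values only; never commit real credentials!
SECRET_KEY=replace-with-a-long-random-string
DATABASE_URI=mysql+pymysql://hicognition:replace-me@mysql:3306/hicognition
MYSQL_PASSWORD=replace-me
```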
### `.env` file

The following environment variables can likely be taken from our example `.env` file and probably only need changing if you are deploying a custom setup.
| Variable | Description |
| --- | --- |
| `UPLOAD_DIR` | Directory used to store uploaded datasets. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `CHROM_SIZES` | Path to the chromosome-sizes file on the server filesystem, needed by the `hicognition` module. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `CHROM_ARMS` | Path to the file with the genomic locations of chromosomal arms on the server filesystem, needed for pileups in `tasks.py`, the file containing the different background tasks. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `REDIS_URL` | URL of the Redis server |
| `DIR_FRONT_END` | Relative path to the front-end files |
| `DIR_STATIC` | Static directory for the Nginx server |
| `MYSQL_DATA_DIR` | Relative directory to which the MySQL database is persisted |
| `INTEGRATION_TESTS` | Relative path to the integration-test directory |
| `DOC_PATH` | Relative path to the documentation files |
| `REPOSITORIES` | Defines the list of ENCODE repositories to use. Default: 4D Nucleome |

If you are thinking about setting up a development instance of HiCognition, be sure to check out the development section.
In the development instance, the entire HiCognition backend repository is mounted into the flask server container to facilitate hot-reload. You can specify the location of the repository using the following environment variable:

| Variable | Description |
| --- | --- |
| `HICOGNITION_DIR` | Location of the backend part of the HiCognition repository |

This variable can be taken from our example `.env` file.
If you need to dive deeper into the configuration of HiCognition, you can edit the config file of the flask app (located at `back_end/app/config.py` relative to the HiCognition repository). This config file defines data classes for the different configuration states of HiCognition:
| Class | Description |
| --- | --- |
| `Config` | Main config class that defines configuration settings common to all config classes |
| `DevelopmentConfig` | Config class that defines settings for flask in development mode |
| `TestingConfig` | Config class for flask unittests |
| `End2EndConfig` | Config class for integration tests |

Be sure to rebuild the HiCognition container after you change this config file!
Note that some of the configuration variables are initialized from environment variables, so if you change them here, make sure that the environment variables do not override your changes!

The following configuration variables relate to how the flask server is set up and are most likely not candidates for tuning, since they refer to simple paths, flags, or secrets:
| Variable | Description |
| --- | --- |
| `SECRET_KEY` | Secret key of the flask app that is used to sign the generated tokens |
| `SQLALCHEMY_TRACK_MODIFICATIONS` | Flag that specifies whether database modifications should be tracked |
| `UPLOAD_DIR` | Directory used to store uploaded datasets. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `CHROM_SIZES` | Path to the chromosome-sizes file on the server filesystem, needed by the `hicognition` module. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `CHROM_ARMS` | Path to the file with the genomic locations of chromosomal arms on the server filesystem, needed for pileups in `tasks.py`, the file containing the different background tasks. This filepath is used inside the Docker container and should therefore be relative to the mounted folder with code and data. |
| `REDIS_URL` | URL of the Redis server |
| `TESTING` | Whether the server is in unittesting mode |
| `END2END` | Whether the server is in integration-testing mode (this is needed to set up test users in the database) |

The following configuration variables determine how preprocessing is performed and, depending on the types of tasks at hand, might need adjustment.
### PREPROCESSING_MAP

The `PREPROCESSING_MAP` configuration variable defines which types of datasets should be processed with which binsizes for a given windowsize. The general structure is the following:
```
{
    windowsize: {
        datatype: [
            binsize,
            ...
        ],
        ...
    },
    ...
}
```
The default `PREPROCESSING_MAP` has been chosen to provide a balance between resolution and preprocessing time. If your application requires finer resolution or different binsizes, keep in mind that small binsizes at large windowsizes can cause heavy load on your server, so adjustments should be made with caution.
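To make the structure concrete, a hypothetical map might look like the sketch below; the windowsizes, datatype keys, and binsizes here are made-up illustrations, not the shipped defaults:

```python
# Hypothetical PREPROCESSING_MAP; the real defaults live in back_end/app/config.py.
PREPROCESSING_MAP = {
    100000: {                      # windowsize in bp
        "cooler": [5000, 10000],   # binsizes to precompute for Hi-C matrices
        "bigwig": [5000],          # binsizes for coverage tracks
    },
    1000000: {
        "cooler": [20000, 50000],
        "bigwig": [20000],
    },
}
```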
### VARIABLE_SIZE_EXPANSION_FACTOR

This floating-point number is used when preprocessing regions that have been added as interval features. It specifies how far the snippets should expand to the left and right, as a fraction of the size of the specific region:

```
 Left Expansion               Region               Right Expansion
---------------|---------------------------------|---------------
```
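As a minimal sketch of the arithmetic (the function name and factor value are assumptions for illustration, not HiCognition's actual implementation):

```python
VARIABLE_SIZE_EXPANSION_FACTOR = 0.2  # assumed value for illustration

def expand_region(start: int, end: int,
                  factor: float = VARIABLE_SIZE_EXPANSION_FACTOR) -> tuple:
    """Expand a genomic interval by `factor` times its size on each side."""
    pad = int((end - start) * factor)
    return max(0, start - pad), end + pad

# A 100 kb region expands by 20 kb on each side:
print(expand_region(1000000, 1100000))  # (980000, 1120000)
```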
### PIPELINE_NAMES

This configuration variable defines the preprocessing pipeline to be used for a given datatype. The structure of these values is:

```
{
    datatype: (pipeline_function, pipeline_description),
    ...
}
```
The `pipeline_function` refers to a function in the file `back_end/app/tasks.py`.
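For illustration only, a mapping of this shape might look as follows; the pipeline functions here are hypothetical stand-ins, while the real ones are defined in `back_end/app/tasks.py`:

```python
# Hypothetical pipeline functions; the real ones live in back_end/app/tasks.py.
def pipeline_bed(dataset_id):
    ...  # preprocess an uploaded bedfile

def pipeline_cooler(dataset_id):
    ...  # preprocess an uploaded cooler file

PIPELINE_NAMES = {
    "bedfile": (pipeline_bed, "run bedfile preprocessing"),
    "cooler": (pipeline_cooler, "run Hi-C preprocessing"),
}
```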
### PIPELINE_QUEUES

This configuration variable holds a mapping between each datatype and the queue it should be processed on. Currently, we define three different queues:

- `short`
- `medium`
- `long`

The `short` queue is reserved for small tasks that are not related to preprocessing, the `medium` queue is for tasks that can typically be completed in less than 30 s, and the `long` queue is for tasks that run longer than 30 s. The structure of the mapping is:
```
{
    datatype: queue,
    ...
}
```
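A hypothetical instance of this mapping, with made-up datatype assignments, might look like:

```python
# Hypothetical queue assignments; adjust to your workloads.
PIPELINE_QUEUES = {
    "bedfile": "medium",  # interval preprocessing usually finishes in under 30 s
    "cooler": "long",     # Hi-C pileups typically run longer than 30 s
}
```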
### CLUSTER_NUMBER_LARGE and CLUSTER_NUMBER_SMALL

These variables define the number of clusters to use when grouping regions in the embedding widgets (the 1d-feature embedding widget and the 2d-feature embedding widget). `CLUSTER_NUMBER_LARGE` is used to define small neighborhoods, and `CLUSTER_NUMBER_SMALL` is used to define large neighborhoods: a large number of clusters means that the neighborhood each cluster represents is smaller.
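For example (the numbers below are assumptions for illustration, not the shipped defaults):

```python
# More clusters -> each cluster covers a smaller neighborhood.
CLUSTER_NUMBER_LARGE = 20  # fine-grained grouping (small neighborhoods)
CLUSTER_NUMBER_SMALL = 5   # coarse grouping (large neighborhoods)
```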
### DATASET_OPTION_MAPPING

This configuration variable defines the metadata options for the different data types. The general structure is:
```
{
    "DatasetType": {
        "bedfile": {
            "ValueType": {
                "ValueType1": {
                    "Option1": ["Possible", "Values"],
                    ...
                },
                ...
            }
        },
        "bigwig": {
            "ValueType": {
                "ValueType1": {
                    "Option1": ["Possible", "Values"],
                    ...
                },
                ...
            }
        },
        "cooler": {
            "ValueType": {
                "ValueType1": {
                    "Option1": ["Possible", "Values"],
                    ...
                },
                ...
            }
        },
    }
}
```
Here, each dataset type can define multiple value types, each of which can have multiple custom options. Note that changing option values only requires changing this configuration variable, whereas adding value types or new types of options requires changing the `Dataset` database model.
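A concrete, hypothetical excerpt (the value types and options below are invented for illustration):

```python
# Hypothetical excerpt of DATASET_OPTION_MAPPING.
DATASET_OPTION_MAPPING = {
    "DatasetType": {
        "bedfile": {
            "ValueType": {
                "Peak": {
                    "Protein": ["CTCF", "H3K27ac"],  # selectable metadata values
                },
            }
        },
    }
}
```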
Temporarily, a `FILETYPES` variable has been defined that allows the definition of optional metadata fields, which are used to dynamically create the file upload forms for feature and region sets.
### STACKUP_THRESHOLD

This configuration variable defines the number of rows above which intervals are downsampled for display in the stacked lineprofile widget.
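The following sketch shows the idea, assuming random subsampling; the threshold value and the function are illustrative, not HiCognition's actual implementation:

```python
import numpy as np

STACKUP_THRESHOLD = 500  # assumed value for illustration

def maybe_downsample(rows: np.ndarray) -> np.ndarray:
    """Randomly subsample interval rows if there are more than STACKUP_THRESHOLD."""
    if len(rows) <= STACKUP_THRESHOLD:
        return rows
    idx = np.random.choice(len(rows), size=STACKUP_THRESHOLD, replace=False)
    return rows[np.sort(idx)]  # keep original ordering of the sampled rows
```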
### OBS_EXP_PROCESSES and PILEUP_PROCESSES

These variables define the number of processes each worker uses to calculate observed/expected matrices and pileups, respectively.
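For example (assumed values; tune them to the cores available per worker):

```python
OBS_EXP_PROCESSES = 4  # processes for observed/expected matrix calculation
PILEUP_PROCESSES = 4   # processes for pileup calculation
```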
There are four different docker-compose files that allow starting HiCognition in different modes:

| File | Description |
| --- | --- |
| `docker-compose.yml` | Starts HiCognition in "production mode" |
| `docker_dev.yml` | Starts HiCognition in "development mode" |
| `docker_integration_tests.yml` | Runs the integration tests (see the testing section for more details) |
| `docker_dev_hicog_lib.yml` | Starts HiCognition in "development mode" and allows placement of `hicognition_lib` and `ngs_lib` in the root of the directory |
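To select a mode, pass the corresponding file to `docker-compose` with `-f` (standard docker-compose usage; the file names are those from the table above):

```bash
docker-compose up -d                  # production mode (uses docker-compose.yml)
docker-compose -f docker_dev.yml up   # development mode with hot-reload
```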