HiCognition provides a number of concrete preprocessing tasks that are related to the abstract tasks defined in the concepts section. They fall into two groups: tasks that aggregate a single genomic feature on a given region set, and tasks that aggregate collections of features. Each preprocessing task is associated with specific visual exploration widgets, and some tasks produce data for several widgets. Detailed explanations of the algorithms can be found in the widgets section.
These tasks aggregate a single genomic feature on a genomic region set and thus make this feature available for exploration.
`feature ---(is aggregated)---> region-set`
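To make "aggregating a feature on a region set" concrete, here is a minimal sketch for a 1D-feature. It assumes the feature is a per-base-pair signal array for one chromosome and that regions are given by their midpoints; each region contributes one row of binned values (a stack-up), and the rows are averaged into a profile. The function name and input format are hypothetical and not HiCognition's implementation; the algorithm HiCognition actually uses is described in the widgets section.

```python
import numpy as np


def aggregate_1d(signal, centers, window, binsize):
    """Stack and average a 1D signal in fixed windows around region centers.

    signal:  1D array with one value per base pair (assumed input format)
    centers: region midpoints in base pairs
    window:  half-width of the window around each center, in base pairs
    binsize: number of base pairs averaged into one output bin
    """
    if (2 * window) % binsize != 0:
        raise ValueError("2 * window must be a multiple of binsize")
    n_bins = (2 * window) // binsize
    rows = []
    for center in centers:
        start, end = center - window, center + window
        if start < 0 or end > len(signal):
            continue  # skip regions whose window runs off the chromosome
        # split the window into equal bins and average within each bin
        rows.append(signal[start:end].reshape(n_bins, binsize).mean(axis=1))
    if not rows:
        raise ValueError("no region window fits inside the signal")
    stack = np.vstack(rows)  # one row per region
    return stack, np.nanmean(stack, axis=0)  # per-region rows and average profile


signal = np.arange(100, dtype=float)
stack, profile = aggregate_1d(signal, centers=[20, 50], window=10, binsize=5)
print(profile)
```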
These tasks aggregate a collection of genomic features on a genomic region set and make this collection available for exploration.
`feature1, feature2, …, featureN ---> Collection ---(is aggregated)---> region-set`
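One simple way to picture aggregating a collection is to summarize every member feature on every region, which yields a region-by-feature matrix. The sketch below assumes each feature is a per-base-pair signal array for a single chromosome and uses the mean value within each region as the summary; the function name, feature names, and choice of summary statistic are illustrative assumptions, not HiCognition's documented behavior.

```python
import numpy as np


def aggregate_collection(features, regions):
    """Summarize every feature of a collection on every region of a region set.

    features: dict mapping feature name -> 1D per-base-pair signal array
              (assumed input format, single chromosome)
    regions:  list of (start, end) tuples in base pairs
    Returns the sorted feature names and an (n_regions, n_features) matrix
    holding the mean value of each feature within each region.
    """
    names = sorted(features)
    matrix = np.empty((len(regions), len(names)))
    for j, name in enumerate(names):
        signal = features[name]
        for i, (start, end) in enumerate(regions):
            matrix[i, j] = np.nanmean(signal[start:end])
    return names, matrix


# Hypothetical example collection with two features
features = {"feature_a": np.ones(100), "feature_b": np.arange(100, dtype=float)}
names, matrix = aggregate_collection(features, [(0, 10), (90, 100)])
print(names)
print(matrix)
```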
The 1D-feature embedding task produces data for the 1D-feature embedding widget. It embeds the genomic region set into a 2D space based on the values of the 1D-features in a collection, so that each region is represented as a point in 2D.
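To illustrate what "embedding into a 2D space" means, the sketch below takes a region-by-feature matrix (one row per region, one column per 1D-feature) and projects it onto its first two principal components with NumPy's SVD. PCA is used here only as a simple, well-known stand-in; the embedding method HiCognition actually uses is described in the widgets section and may differ. The function name is hypothetical.

```python
import numpy as np


def embed_2d(matrix):
    """Project an (n_regions, n_features) matrix onto its first two principal components.

    Requires at least two regions and two features.
    """
    centered = matrix - matrix.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # constant features stay at zero instead of dividing by zero
    scaled = centered / std  # z-scores, so no single feature dominates by scale
    # the rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return scaled @ vt[:2].T  # one (x, y) point per region


rng = np.random.default_rng(0)
points = embed_2d(rng.normal(size=(50, 5)))
print(points.shape)  # (50, 2)
```

Standardizing first is a design choice: features measured on very different scales would otherwise dominate the projection.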
A visual exploration tool is only useful if its tasks can be completed in a reasonable time. We have therefore worked to minimize the time required for each preprocessing step. With the default configuration and a machine that meets our hardware requirements, no job should take longer than about 3 minutes. The following table gives rough estimates of how long the different preprocessing steps run for common input sizes.
| Number of regions | Preprocessing task | Duration [min] |
|---|---|---|
| 1,000 | Aggregate a 1D-feature at a genomic region set | 0.1 |
| 1,000 | Aggregate a 2D-feature at a genomic region set | 0.6* |
| 1,000 | Aggregate a 1D-feature collection at a genomic region set | 0.5 |
| 1,000 | Aggregate a region collection at a genomic region set | 0.5 |
| 50,000 | Aggregate a 1D-feature at a genomic region set | 0.5 |
| 50,000 | Aggregate a 2D-feature at a genomic region set | 3* |
| 50,000 | Aggregate a 1D-feature collection at a genomic region set | 2.5 |
| 50,000 | Aggregate a region collection at a genomic region set | 2.5 |
\* For the "Aggregate a 2D-feature at a genomic region set" task, the first run can take about twice as long, because the observed/expected values are calculated on that run and then cached for future runs.
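The effect of this caching can be sketched as follows. The sketch assumes the expected value is the mean contact value at each genomic distance (one diagonal of the Hi-C matrix), which is one common definition; HiCognition's own calculation is not specified here. Computing it once and reusing it afterwards is why only the first run pays the extra cost. The cache, its `(dataset_id, binsize)` key, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical in-memory cache keyed by (dataset id, bin size).
_EXPECTED_CACHE = {}


def expected_by_distance(matrix):
    """Mean contact value on each diagonal, i.e. for each genomic distance in bins."""
    n = matrix.shape[0]
    return np.array([np.nanmean(np.diagonal(matrix, offset=d)) for d in range(n)])


def get_expected(dataset_id, binsize, matrix):
    """Return cached expected values, computing them only on the first call."""
    key = (dataset_id, binsize)
    if key not in _EXPECTED_CACHE:
        _EXPECTED_CACHE[key] = expected_by_distance(matrix)
    return _EXPECTED_CACHE[key]


def observed_over_expected(matrix, expected):
    """Divide every pixel by the expected value for its distance |i - j|."""
    n = matrix.shape[0]
    i, j = np.indices((n, n))
    with np.errstate(divide="ignore", invalid="ignore"):
        return matrix / expected[np.abs(i - j)]


hic = np.array([[2.0, 1.0, 0.5],
                [1.0, 4.0, 1.5],
                [0.5, 1.5, 6.0]])
expected = get_expected("dataset_1", 10_000, hic)
print(expected)
print(observed_over_expected(hic, expected)[0])
```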
If you change the configured window sizes and bin sizes, jobs may take considerably longer and require more memory.
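A back-of-the-envelope estimate shows why this matters most for 2D-features. Assuming each region contributes a square snippet with 2 × window / binsize bins per side, stored as 8-byte floats, memory grows with the square of the number of bins, so halving the bin size quadruples it. The function and the settings below are hypothetical (not HiCognition's defaults) and ignore any overhead.

```python
def pileup_memory_bytes(n_regions, window, binsize, bytes_per_value=8):
    """Rough memory for stacked 2D snippets: one bins x bins float array per region.

    window is the half-width around each region center, so each snippet
    has 2 * window / binsize bins per side.
    """
    bins_per_side = (2 * window) // binsize
    return n_regions * bins_per_side ** 2 * bytes_per_value


# Hypothetical settings, not HiCognition's defaults.
print(pileup_memory_bytes(50_000, 200_000, 10_000) / 1e9)  # 0.64
print(pileup_memory_bytes(50_000, 200_000, 5_000) / 1e9)   # 2.56
```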
That said, on-demand preprocessing is only one possible workflow. Because a large part of exploration tasks involve common "dataset ingredients", many preprocessing steps can also be submitted in bulk.
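Bulk submission amounts to enumerating the combinations of shared ingredients up front. A minimal sketch, assuming hypothetical region-set and feature names and a plain list of job descriptions (HiCognition's actual submission interface is not described here), that also skips combinations that were already preprocessed:

```python
from itertools import product


def bulk_jobs(region_sets, features, binsizes, done=frozenset()):
    """List every (region set, feature, bin size) combination not yet preprocessed."""
    return [
        {"region_set": r, "feature": f, "binsize": b}
        for r, f, b in product(region_sets, features, binsizes)
        if (r, f, b) not in done
    ]


jobs = bulk_jobs(["promoters", "enhancers"], ["feature_a", "feature_b", "feature_c"],
                 [10_000, 50_000], done={("promoters", "feature_a", 10_000)})
print(len(jobs))  # 11
```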