Genomic features are an essential part of the region-set focus approach and are defined as 1D- or 2D-continuous data that is mapped to a genomic coordinate system. Examples are ChiP-seq read density, Hi-C interaction values, and GRO-seq tracks. Genomic features are mainly used to understand genomic regions in the context of the region-set focus approach. They can thus be thought of as “independent variables” that are used to explain the “dependent” genomic regions.
Bigwig files should conform to the Bigwig-standard, which translates to being readable by bigwig interface libraries (in our case, this is pybbi). In addition, your bigwig file should contain data that were aligned against the selected genome assembly. But as for BED-files, this is a requirement we can only check to a limited degree.
Always check that your bigwig file was derived from data that were aligned against the selected genome assembly!
HiCognition can load 2D-genomic features from multiresolution cooler files. Here, they same cautionary notes with respect to genome assemblies apply. In addition, mcool files need to contain the resolutions defined in the flask configuration file. For the default setting these are:
Note that we consciously don’t try to search for the nearest fitting resolution as this would compromise comparability between genomic features. If you need a primer on how to create mcool files and what strategies for data normalization are available, see the cooler documentation.
Always check that your cooler file is compatible with the defined resolution restrictions!
To allow efficient filtering and searching as well as an easier display of genomic features, we defined a formal description that is exposed to the user when adding and viewing genomic features.
The structure of the description fields for 1D features can be seen in the following graph:
In this graph, all nodes representing a value that the user can select are red. If the user’s selection can lead to different selections lower in the graph, these options are displayed as black nodes. The first children of Feature1D
are fields that need to be defined for all region sets and have no influence on the later selection hierarchy:
Then, the ValueType
needs to be selected. This describes what kind of genomic feature this dataset are. This can be one of the following:
Selection of a specific ValueType
leads to different required metadata downstream. E.g., if the ValueType
is ProteinBinding
, then the Protein
field needs to be filled out.
The structure of the description fields for 2D features can be seen in the following graph:
In this graph, all nodes representing a value that the user can select are red. If the user’s selection can lead to different selections lower in the graph, these options are displayed as black nodes. The first children of Featur2D
are fields that need to be defined for all region sets and have no influence on the later selection hierarchy:
2D features currently only define a single ValueType
:
Thus, the other fields are determined when uploading a 2D-feature dataset.
The addition of genomic features can be started by clicking on the Add genomic features
button in the data management drawer:
This will open a dialogue that follows the same principles as the dialogue described for addition of genomic regions.
Genomic features are typically much larger in size than genomic regions. This will cause the upload to take longer!
Managing genomic features is equivalent to managing genomic regions, except that the table component is launched from a different menu button: