UMAP representation and Clustering

UMAP representation

To generate a two dimensional representation of the interaction pair attributes, Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) was performed. Specifically, centred and scaled linear quantitations of the different sequencing data modalities from the EpiLC TET WT condition were subjected to UMAP using the uwot package with spectral initialisation (Settings: min_dist = 0.01, n_neibors = 2, scale = TRUE).

EM Clustering

To produce input for clustering, dimensionality reduction to 15 dimensions was applied to the same quantitations of interaction pair attributes (uwot, min_dist = 0.01, n_neibors = 25, scale = TRUE). Clustering was then performed by Expectation Maximisation for 20 clusters using WEKA (Scheme: weka.clusterers.EM -I 100 -N 20 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100 ).

Louvain clustering

As an alternative to EM clustering and to test the robustness of the clustering approach, Louvain clustering after PCA was performed. All values of interaction pair attributes were log normalised and centred by subtracting the average. Levels were also scaled by dividing the centered attribute values by their standard deviation. Then, linear dimensionality reduction was performed by principal component analysis (PCA). The number of dimensions to use for clustering was set to 12 as gauged from the elbow plot (PC vs standard deviation). A KNN graph was constructed based on the euclidean distance in PCA space with edge weights between any two interaction pairs refined based on their shared overlap in their local neighbourhoods (Jaccard similarity) using Seurat functions (FindNeighbors(dims = 1:12), FindClusters(resolution = 1)).

Marks for individual clusters

P marks (EM clusters)

P marks EM clusters

PIR marks (EM clusters)

PIR marks EM clusters

P marks (Louvain clusters)

P marks Louvain clusters

PIR marks (Louvain clusters)

PIR marks Louvain clusters

Grouping clusters

While the number of clusters for Louvain clustering was chosen empirically and set to 20, Louvain clustering with the given settings produced 30 clusters. However, 9 clusters had fewer than 600 interaction pairs (the larger clusters having between 14,000 and 21,000 interaction pairs) and were therefore excluded from further analysis.

We then evaluated chromatin/DNA marks per cluster and grouped them according to presumed biological function.

EM cluster grouping

cluster group	individual clusters
EM activeP - lowmarkedPIR	4, 7, 10, 11, 12, 13, 18, 16
EM pcP - unmarkedPIR	0, 2, 9
EM inactiveP - unmarkedPIR	1, 5, 6, 15
EM inactiveP - CTCFprimedE	19
EM inactiveP - primedE	14
EM bivalentP - poisedE	3
EM activeP - activeP	17
EM activeP - activeE	8

Louvain cluster grouping

cluster group	individual clusters
Louvain activeP - lowmarkedPIR	2, 5, 9, 13, 15, 16
Louvain pcP - unmarkedPIR	0, 12, 17, 30
Louvain inactiveP - unmarkedPIR	3, 4, 7, 9, 10, 14, 21, 22, 23, 26, 27, 28, 29
Louvain inactiveP - CTCFprimedE	1, 19
Louvain inactiveP - primedE	20, 6
Louvain bivalentP - poisedE	18, 25
Louvain activeP - activeP	11, 24
Louvain activeP - activeE	8

EM and Louvain clustering comparison