The data is organized into directories of the form
```
networks/<input-clustering>/<network-id>/<run-id>/
```
where:
- `<input-clustering>`: The method that produced the input clustering, one of:
    - `leiden-cpm-0.1`: Leiden clustering optimizing the CPM with resolution 0.1, i.e., Leiden-CPM(0.1)
    - `leiden-cpm-0.01`: Leiden clustering optimizing the CPM with resolution 0.01, i.e., Leiden-CPM(0.01)
    - `leiden-cpm-0.001`: Leiden clustering optimizing the CPM with resolution 0.001, i.e., Leiden-CPM(0.001)
    - `leiden-mod`: Leiden clustering optimizing modularity, i.e., Leiden-Mod
    - `sbm+cc`: flat SBM computed using graph-tool with the lowest description length, followed by CC, i.e., SBM+CC
    - `leiden-cpm-0.1+cm`: Leiden clustering optimizing the CPM with resolution 0.1, followed by CM, i.e., Leiden-CPM(0.1)+CM
    - `leiden-cpm-0.01+cm`: Leiden clustering optimizing the CPM with resolution 0.01, followed by CM, i.e., Leiden-CPM(0.01)+CM
    - `leiden-cpm-0.001+cm`: Leiden clustering optimizing the CPM with resolution 0.001, followed by CM, i.e., Leiden-CPM(0.001)+CM
    - `leiden-mod+cm`: Leiden clustering optimizing modularity, followed by CM, i.e., Leiden-Mod+CM
    - `sbm+wcc`: flat SBM computed using graph-tool with the lowest description length, followed by WCC, i.e., SBM+WCC
- `<network-id>`: The identifier of the network
    - e.g., `dnc`, `academia_edu`, `hyves`, etc.
- `<run-id>`: The identifier of the run
    - `0`: the only run for each clustered network in this dataset
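
As a rough illustration of the layout above, the following Python sketch builds the path of one run directory and lists the runs found under a dataset root. The helper names `run_dir` and `list_runs` and the `root` argument are hypothetical and not part of the dataset; the sketch only assumes that the `networks/` directory sits directly under `root`.

```python
from pathlib import Path


def run_dir(root, input_clustering, network_id, run_id=0):
    """Return networks/<input-clustering>/<network-id>/<run-id>/ under root."""
    return Path(root) / "networks" / input_clustering / network_id / str(run_id)


def list_runs(root):
    """Yield (input_clustering, network_id, run_id) for each run directory.

    Assumes every directory exactly three levels below networks/ is a run.
    All three components are returned as strings.
    """
    for path in sorted(Path(root).glob("networks/*/*/*")):
        if path.is_dir():
            clustering, network, run = path.parts[-3:]
            yield clustering, network, run
```

For example, `run_dir("data", "leiden-mod+cm", "dnc")` would point at `data/networks/leiden-mod+cm/dnc/0`, where `data` stands in for wherever the dataset was unpacked.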

Each directory contains the following files:
- `edge.tsv`: The edge list of the network with two tab-separated values (`node1`, `node2`) indicating the two nodes connected by the edge. The network is undirected, so each edge is expected to appear only once.
- `com.tsv`: The community assignment of the nodes in the network, with two tab-separated values (`node`, `community`) indicating the community to which each node belongs.
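
The two files can be read with the standard library alone. The sketch below is one possible way to do it, under two assumptions that the description above does not confirm: the files have no header row, and node and community identifiers are kept as strings. The function names are hypothetical. `duplicate_edges` checks the expectation that each undirected edge appears only once by treating `(a, b)` and `(b, a)` as the same edge.

```python
import csv
from collections import Counter


def read_edges(path):
    """Read edge.tsv into a list of (node1, node2) string pairs."""
    with open(path, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f, delimiter="\t") if row]


def read_communities(path):
    """Read com.tsv into a dict mapping node -> community (both strings)."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if row}


def duplicate_edges(edges):
    """Return undirected edges that occur more than once, ignoring direction."""
    counts = Counter(tuple(sorted(edge)) for edge in edges)
    return [edge for edge, n in counts.items() if n > 1]
```

If the files turn out to have a header row, the first element of each result would need to be dropped before use.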