| Field | Value | Date |
|---|---|---|
| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-24 08:40:49 +0000 |
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-24 08:40:49 +0000 |
| commit | 8f63cf9f41bbdb8d55cd4679872d2b4ae2129324 | |
| tree | ab5c95888849e854f2346db856c7edece7c8b8a7 /readme.md | |
EC-SBM community detection analysis: full pipeline and writeup
Implement community detection on 3 EC-SBM networks (polblogs, topology,
internet_as) using 5 methods (Leiden-Mod, Leiden-CPM at 0.1 and 0.01,
Infomap, graph-tool SBM). Compute AMI/ARI/NMI accuracy, cluster statistics,
and generate figures and LaTeX report.
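The accuracy scores named in the message compare a detected clustering against a reference one. As a hedged illustration only (not necessarily how this pipeline computes them), a minimal standard-library sketch of ARI and NMI over two node-to-community maps, such as those read from `com.tsv`, is shown below. Restricting to nodes present in both maps, the arithmetic-mean NMI normalization, and the helper names are assumptions; AMI is omitted because it also needs an expected-mutual-information correction term.

```python
import math
from collections import Counter


def _pairs(n):
    # Number of unordered node pairs among n nodes, C(n, 2).
    return n * (n - 1) / 2


def _contingency(truth, pred):
    # Assumption: only nodes present in both node -> community maps are compared.
    nodes = truth.keys() & pred.keys()
    joint = Counter((truth[v], pred[v]) for v in nodes)  # contingency cells n_ij
    a = Counter(truth[v] for v in nodes)  # reference cluster sizes
    b = Counter(pred[v] for v in nodes)   # detected cluster sizes
    return len(nodes), joint, a, b


def adjusted_rand_index(truth, pred):
    n, joint, a, b = _contingency(truth, pred)
    index = sum(_pairs(c) for c in joint.values())
    sum_a = sum(_pairs(c) for c in a.values())
    sum_b = sum(_pairs(c) for c in b.values())
    expected = sum_a * sum_b / _pairs(n) if n > 1 else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions are all-in-one or all-singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    n, joint, a, b = _contingency(truth, pred)
    # Mutual information: sum over cells of p_ij * log(p_ij / (p_i * p_j)).
    mi = sum((c / n) * math.log(n * c / (a[i] * b[j]))
             for (i, j), c in joint.items())
    h_a = -sum((c / n) * math.log(c / n) for c in a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in b.values())
    denom = (h_a + h_b) / 2  # arithmetic-mean normalization
    return 1.0 if denom == 0 else mi / denom
```

Both scores are invariant to how communities are labeled, so `{1: "a", 2: "a", 3: "b", 4: "b"}` and `{1: 0, 2: 0, 3: 1, 4: 1}` score 1.0 on each. Where scikit-learn is available, `adjusted_rand_score`, `normalized_mutual_info_score`, and `adjusted_mutual_info_score` in `sklearn.metrics` compute these quantities from aligned label lists.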
Diffstat (limited to 'readme.md')

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | readme.md | 24 |

1 file changed, 24 insertions, 0 deletions
````diff
diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..3df5047
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,24 @@
+The data is structured as follows. There will be directories in the format of
+```
+networks/<input-clustering>/<network-id>/<run-id>/
+```
+where:
+- `<input-clustering>`: The input clustering of the input data
+  - `leiden-cpm-0.1`: Leiden clustering optimizing the CPM with resolution 0.1, i.e., Leiden-CPM(0.1)
+  - `leiden-cpm-0.01`: Leiden clustering optimizing the CPM with resolution 0.01, i.e., Leiden-CPM(0.01)
+  - `leiden-cpm-0.001`: Leiden clustering optimizing the CPM with resolution 0.001, i.e., Leiden-CPM(0.001)
+  - `leiden-mod`: Leiden clustering optimizing modularity, i.e., Leiden-Mod
+  - `sbm+cc`: flat SBM computed using graph-tool with the lowest description length, followed by CC, i.e., SBM+CC
+  - `leiden-cpm-0.1+cm`: Leiden clustering optimizing the CPM with resolution 0.1, followed by CM, i.e., Leiden-CPM(0.1)+CM
+  - `leiden-cpm-0.01+cm`: Leiden clustering optimizing the CPM with resolution 0.01, followed by CM, i.e., Leiden-CPM(0.01)+CM
+  - `leiden-cpm-0.001+cm`: Leiden clustering optimizing the CPM with resolution 0.001, followed by CM, i.e., Leiden-CPM(0.001)+CM
+  - `leiden-mod+cm`: Leiden clustering optimizing modularity, followed by CM, i.e., Leiden-Mod+CM
+  - `sbm+wcc`: flat SBM computed using graph-tool with the lowest description length, followed by WCC, i.e., SBM+WCC
+- `<network-id>`: The identifier of the network
+  - e.g., `dnc`, `academia_edu`, `hyves`, etc.
+- `<run-id>`: The identifier of the run (only 1 per clustered network in this dataset)
+  - `0`: the only run in this dataset
+
+Each directory contains the following files:
+- `edge.tsv`: The edge list of the network with two tab-separated values (`node1`, `node2`) indicating the two nodes connected by the edge. The network is undirected, so each edge is expected to appear only once.
+- `com.tsv`: The community assignment of the nodes in the network, with two tab-separated values (`node`, `community`) indicating the community to which each node belongs.
````
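The layout described in this README lends itself to a small loader. The following is a minimal standard-library sketch, not part of the dataset: `iter_runs`, `_read_pairs`, and `load_run` are hypothetical helper names, and it assumes both TSV files are headerless (skip the first line if they turn out to carry a `node1`/`node2` or `node`/`community` header). Node IDs are kept as strings, and the "each edge appears only once" rule is checked by treating `(u, v)` and `(v, u)` as the same undirected edge.

```python
import csv
from pathlib import Path


def iter_runs(root="networks"):
    """Yield (input_clustering, network_id, run_id, run_dir) for each run directory."""
    root = Path(root)
    for run_dir in sorted(root.glob("*/*/*")):
        if run_dir.is_dir():
            input_clustering, network_id, run_id = run_dir.relative_to(root).parts
            yield input_clustering, network_id, run_id, run_dir


def _read_pairs(path):
    # Assumption: headerless tab-separated file; only the first two columns are read.
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if row:  # skip blank lines
                yield row[0], row[1]


def load_run(run_dir):
    """Return (edges, communities) for one run; node IDs stay as strings."""
    run_dir = Path(run_dir)
    edges, seen = [], set()
    for u, v in _read_pairs(run_dir / "edge.tsv"):
        key = frozenset((u, v))  # undirected: (u, v) and (v, u) are the same edge
        if key in seen:
            raise ValueError(f"edge {u}-{v} appears more than once")
        seen.add(key)
        edges.append((u, v))
    communities = dict(_read_pairs(run_dir / "com.tsv"))
    return edges, communities
```

A typical loop would be `for clustering, net, run, d in iter_runs(): edges, coms = load_run(d)`, which visits, for example, `networks/leiden-mod/dnc/0/` once per input clustering and network.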
