author: YurenHao0426 <blackhao0426@gmail.com>, 2026-02-24 08:40:49 +0000
committer: YurenHao0426 <blackhao0426@gmail.com>, 2026-02-24 08:40:49 +0000
commit: 8f63cf9f41bbdb8d55cd4679872d2b4ae2129324
tree: ab5c95888849e854f2346db856c7edece7c8b8a7 /readme.md
EC-SBM community detection analysis: full pipeline and writeup
Implement community detection on 3 EC-SBM networks (polblogs, topology, internet_as) using 5 methods (Leiden-Mod, Leiden-CPM at 0.1 and 0.01, Infomap, graph-tool SBM). Compute AMI/ARI/NMI accuracy, cluster statistics, and generate figures and LaTeX report.
Diffstat (limited to 'readme.md')
-rw-r--r-- readme.md 24
1 files changed, 24 insertions, 0 deletions
diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..3df5047
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,24 @@
+The data is structured as follows. Each clustered network is stored in a directory of the form
+```
+networks/<input-clustering>/<network-id>/<run-id>/
+```
+where:
+- `<input-clustering>`: The method used to produce the input clustering
+ - `leiden-cpm-0.1`: Leiden clustering optimizing the CPM with resolution 0.1, i.e., Leiden-CPM(0.1)
+ - `leiden-cpm-0.01`: Leiden clustering optimizing the CPM with resolution 0.01, i.e., Leiden-CPM(0.01)
+ - `leiden-cpm-0.001`: Leiden clustering optimizing the CPM with resolution 0.001, i.e., Leiden-CPM(0.001)
+ - `leiden-mod`: Leiden clustering optimizing modularity, i.e., Leiden-Mod
+ - `sbm+cc`: flat SBM computed using graph-tool with the lowest description length, followed by CC, i.e., SBM+CC
+ - `leiden-cpm-0.1+cm`: Leiden clustering optimizing the CPM with resolution 0.1, followed by CM, i.e., Leiden-CPM(0.1)+CM
+ - `leiden-cpm-0.01+cm`: Leiden clustering optimizing the CPM with resolution 0.01, followed by CM, i.e., Leiden-CPM(0.01)+CM
+ - `leiden-cpm-0.001+cm`: Leiden clustering optimizing the CPM with resolution 0.001, followed by CM, i.e., Leiden-CPM(0.001)+CM
+ - `leiden-mod+cm`: Leiden clustering optimizing modularity, followed by CM, i.e., Leiden-Mod+CM
+ - `sbm+wcc`: flat SBM computed using graph-tool with the lowest description length, followed by WCC, i.e., SBM+WCC
+- `<network-id>`: The identifier of the network
+ - e.g., `dnc`, `academia_edu`, `hyves`, etc.
+- `<run-id>`: The identifier of the run (only one per clustered network in this dataset)
+ - `0`: the only run in this dataset
+
+Each directory contains the following files:
+- `edge.tsv`: The edge list of the network with two tab-separated values (`node1`, `node2`) indicating the two nodes connected by the edge. The network is undirected, so each edge is expected to appear only once.
+- `com.tsv`: The community assignment of the nodes in the network, with two tab-separated values (`node`, `community`) indicating the community to which each node belongs.
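
The file layout described in the README can be read with a short loader. The following is a minimal sketch, not part of the commit: the function name `load_network` is hypothetical, and it assumes that neither file has a header row and that node and community identifiers can be kept as strings, since the README does not specify either point.

```python
import csv
from pathlib import Path


def load_network(run_dir):
    """Read edge.tsv and com.tsv from one <run-id> directory.

    Assumptions (not stated in the README): no header row, and
    identifiers are kept as plain strings.
    """
    run_dir = Path(run_dir)

    # Undirected edges: store each pair in sorted order so that a
    # reversed duplicate (v, u) collapses onto (u, v).
    edges = set()
    with open(run_dir / "edge.tsv", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            u, v = row
            edges.add((u, v) if u <= v else (v, u))

    # Map each node to its community identifier.
    communities = {}
    with open(run_dir / "com.tsv", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            node, community = row
            communities[node] = community

    return edges, communities
```

A call such as `load_network("networks/leiden-mod/dnc/0")` would follow the `networks/<input-clustering>/<network-id>/<run-id>/` pattern, although whether that particular combination exists in the dataset is not confirmed here. Storing edges as sorted pairs in a set is a defensive choice: the README only says each edge is expected to appear once, so the loader tolerates a repeated or reversed entry instead of double-counting it.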