| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-24 08:40:49 +0000 |
|---|---|---|
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-24 08:40:49 +0000 |
| commit | 8f63cf9f41bbdb8d55cd4679872d2b4ae2129324 (patch) | |
| tree | ab5c95888849e854f2346db856c7edece7c8b8a7 /assignment.html | |
EC-SBM community detection analysis: full pipeline and writeup
Implement community detection on 3 EC-SBM networks (polblogs, topology,
internet_as) using 5 methods (Leiden-Mod, Leiden-CPM at 0.1 and 0.01,
Infomap, graph-tool SBM). Compute AMI/ARI/NMI accuracy, cluster statistics,
and generate figures and LaTeX report.
Diffstat (limited to 'assignment.html')
| -rw-r--r-- | assignment.html | 112 |
1 files changed, 112 insertions, 0 deletions
diff --git a/assignment.html b/assignment.html new file mode 100644 index 0000000..88f2c2c --- /dev/null +++ b/assignment.html @@ -0,0 +1,112 @@ +This homework involves data analysis (running community detection +methods +on synthetic networks) and writing a report about what +you did and what you observed. +You should also relate the results you find to other papers +you have read or heard about (e.g., in class lectures): are they +similar to what you saw or are they different? +What did you learn? +What are you still trying to figure out? + +Here are some basic instructions. +Perform community detection using at least two methods +on at least two EC-SBM networks based on SBM+WCC input parameters. +One of the networks should be relatively small (under 10,000 nodes) +and the others should be at least 30,000 nodes. +The-Anh suggests these: +<ul> +<li> +topology (35K nodes, 171K edges) +<li> +internet_as (23K nodes, 48K edges) +<li> +marker_cafe (69K nodes, 1.6M edges) +</ul> + +Some details are given below: +<ul> + +<li> +The EC-SBM networks are public: +<a href="https://databank.illinois.edu/datasets/IDB-3284069">(link)</a>. +Use the "sbm+wcc" versions for this experiment. +The largest is only around 1.4M nodes. +<li> +For methods, run Leiden optimizing modularity and Leiden optimizing CPM, and any other methods you wish to explore. +For Leiden optimizing CPM, try resolution value 0.1 or 0.01. +Use the leidenalg package <a href="https://leidenalg.readthedocs.io/en/stable/index.html">(link)</a>. +You may also be interested in running Infomap +<a href="https://www.mapequation.org/infomap/">(link)</a> or graph-tool +for SBM <a href="https://graph-tool.skewed.de/static/docs/stable/demos/inference/inference.html">(link)</a>. +<li> +To evaluate accuracy, report AMI, ARI, and NMI, using our +scripts +<a href="http://github.com/illinois-or-research-analytics/network_evaluation">(link)</a>. 
+<li> +Besides accuracy, report the percentage of nodes in non-singleton clusters (what +we refer to as "node coverage"). +Also report statistics about the distributions of the cluster density +and edge connectivity (e.g., perhaps the ratio between the size of the +minimum edge cut and log10(n)). +It is up to you what you report, but report what you find interesting. +You can find scripts for some of these at the same URL given above. +</ul> + +<p> +General recommendations. +<ul> +<li> +You may find it helpful to examine papers that have used these methods to see +what commands were used and how they report their analyses, to understand +reproducibility expectations. +Specifically, examine the supplementary materials documents for the +following papers, as these provide some helpful details. +<ul> +<li> +M. Park et al. "Well-connectedness and community detection". +PLOS Complex Systems, 2024. +<a href="https://doi.org/10.1371/journal.pcsy.0000009">(link)</a> +<li> +T. Vu-Le et al. "Using Stochastic Block Models for Community +Detection". Applied Network Science Vol 11, article 2. +https://doi.org/10.1007/s41109-025-00747-2. +<a href="https://link.springer.com/article/10.1007/s41109-025-00747-2">(link)</a> +</ul> +<li> +If you have trouble with anything in this analysis, +let me know early -- but most likely you can figure it out yourself. +It's best to start this as early as possible (i.e., before Feb 7) to make sure +you know how to do everything. +</ul> +<p> Writing advice +<ul> +<li> +This homework involves not only doing the analyses but writing them up +in a way that reflects the understanding you gain from the experiment, +as well as enabling reproducibility (so that the reader can +repeat your experiment exactly). +Therefore, give yourself at least a few days for writing; don't wait until +the last day to finish experiments! 
+<li> +The grade will be based on reproducibility. +For this project, it's important to write up your work in a way that +allows for reproducibility. +This is not about grammar and spelling, etc., so once again +write this without assistive AI. +</ul> + +<p> +Grading +<ul> +<li> 25% reproducibility +<li> 25% figures or tables showing results +<li> 50% discussion of results +<li> Up to an additional 10 points for extra work beyond the minimum +</ul> +Note: to receive full points, you must do at least the minimum +required (the two community detection methods as indicated above, +analyses of at least two EC-SBM networks based on SBM+WCC parameters, +and reporting AMI, ARI, and NMI accuracy). +Anything beyond this can earn extra credit. + + |
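The assignment defines "node coverage" as the percentage of nodes in non-singleton clusters and also asks for cluster density statistics. The following is a minimal sketch of those two quantities in plain Python, not the course's evaluation scripts. The function names, the `membership` dictionary layout (node to cluster id), and the toy graph are hypothetical, and density is assumed here to mean internal edges divided by the number of possible node pairs in the cluster, for a simple undirected graph with no duplicate edges.

```python
from collections import Counter, defaultdict
from math import comb


def node_coverage(membership, n_nodes=None):
    """Percentage of nodes that sit in clusters of size >= 2.

    membership: dict mapping node -> cluster id (hypothetical layout).
    n_nodes: total node count; defaults to len(membership). Nodes absent
    from membership are treated as unclustered (i.e. singletons).
    """
    sizes = Counter(membership.values())
    covered = sum(1 for c in membership.values() if sizes[c] > 1)
    total = n_nodes if n_nodes is not None else len(membership)
    return 100.0 * covered / total if total else 0.0


def cluster_densities(edges, membership):
    """Density of each non-singleton cluster.

    Assumed definition: (edges with both endpoints in the cluster)
    divided by C(size, 2). Self-loops are ignored; edges are assumed
    to be listed once each.
    """
    sizes = Counter(membership.values())
    internal = defaultdict(int)
    for u, v in edges:
        cu, cv = membership.get(u), membership.get(v)
        if u != v and cu is not None and cu == cv:
            internal[cu] += 1
    return {c: internal[c] / comb(s, 2) for c, s in sizes.items() if s > 1}


# Toy example: a triangle, a 3-node path, and one singleton node.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6)]
membership = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b", 7: "c"}

coverage = node_coverage(membership)
densities = cluster_densities(edges, membership)
print(f"node coverage: {coverage:.2f}%")
print("densities:", densities)
```

For a real run, `membership` would come from the clustering output and `edges` from the network's edge list; the min-cut-based connectivity statistic mentioned in the assignment is not covered by this sketch.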
