Diffstat (limited to 'assignment.html')
-rw-r--r-- assignment.html 112
1 files changed, 112 insertions, 0 deletions
diff --git a/assignment.html b/assignment.html
new file mode 100644
index 0000000..88f2c2c
--- /dev/null
+++ b/assignment.html
@@ -0,0 +1,112 @@
+This homework involves data analysis (running community detection
+methods
+on synthetic networks) and writing a report about what
+you did and what you observed.
+You should also relate the results you find to other papers
+you have read or heard about (e.g., in class lectures): are they
+similar to what you saw or are they different?
+What did you learn?
+What are you still trying to figure out?
+
+Here are some basic instructions.
+Perform community detection using at least two methods
+on at least two EC-SBM networks based on SBM+WCC input parameters.
+One of the networks should be relatively small (under 10,000 nodes)
+and the others should have at least 30,000 nodes.
+The-Anh suggests these:
+<ul>
+<li>
+topology (35K nodes, 171K edges)
+<li>
+internet_as (23K nodes, 48K edges)
+<li>
+marker_cafe (69K nodes, 1.6M edges)
+</ul>
+
+Some details are given below:
+<ul>
+
+<li>
+The EC-SBM networks are public:
+<a href="https://databank.illinois.edu/datasets/IDB-3284069">(link)</a>.
+Use the "sbm+wcc" versions for this experiment.
+The largest is only around 1.4M nodes.
+<li>
+For methods, run Leiden optimizing modularity and Leiden optimizing CPM, and any other methods you wish to explore.
+For Leiden optimizing CPM, try resolution value 0.1 or 0.01.
+Use the leidenalg package <a href="https://leidenalg.readthedocs.io/en/stable/index.html">(link)</a>.
+You may also be interested in running Infomap
+<a href="https://www.mapequation.org/infomap/">(link)</a> or graph-tool
+for SBM <a href="https://graph-tool.skewed.de/static/docs/stable/demos/inference/inference.html">(link)</a>.
+<li>
+To evaluate accuracy, report AMI, ARI, and NMI, using our
+scripts
+<a href="http://github.com/illinois-or-research-analytics/network_evaluation">(link)</a>.
+<li>
+Besides accuracy, report the percentage of nodes in non-singleton clusters (what
+we refer to as "node coverage").
+Also report statistics about the distributions of the cluster density
+and edge connectivity (e.g., the ratio between the size of the
+minimum edge cut and log10(n)).
+It is up to you what you report, but report what you find interesting.
+You can find scripts for some of these at the same URL given above.
+</ul>
+
+<p>
+General recommendations.
+<ul>
+<li>
+You may find it helpful to examine papers that have used these methods to see
+what commands were used and how they report their analyses, to understand
+reproducibility expectations.
+Specifically, examine the supplementary materials documents for the
+following papers, as these provide some helpful details.
+<ul>
+<li>
+M. Park et al. "Well-connectedness and community detection".
+PLOS Complex Systems, 2024.
+<a href="https://doi.org/10.1371/journal.pcsy.0000009">(link)</a>
+<li>
+T. Vu-Le et al. "Using Stochastic Block Models for Community
+Detection". Applied Network Science Vol 11, article 2.
+https://doi.org/10.1007/s41109-025-00747-2.
+<a href="https://link.springer.com/article/10.1007/s41109-025-00747-2">(link)</a>
+</ul>
+<li>
+If you have trouble with anything in this analysis,
+let me know early -- but most likely you can figure it out yourself.
+It's best to start this as early as possible (i.e., before Feb 7) to make sure
+you know how to do everything.
+</ul>
+<p> Writing advice
+<ul>
+<li>
+This homework involves not only doing the analyses but also writing them up
+in a way that reflects the understanding you gain from the experiment
+and that enables reproducibility (so that the reader can
+repeat your experiment exactly).
+Therefore, give yourself at least a few days for writing; don't wait until
+the last day to finish experiments!
+<li>
+The grade will be based on reproducibility.
+For this project, it's important to write up your work in a way that
+allows for reproducibility.
+This is not about grammar and spelling, etc., so once again
+write this without assistive AI.
+</ul>
+
+<p>
+Grading
+<ul>
+<li> 25% reproducibility
+<li> 25% figures or tables showing results
+<li> 50% discussion of results
+<li> Up to an additional 10 points for extra work beyond the minimum
+</ul>
+Note: to receive full points, you must do at least the minimum
+required (the two community detection methods as indicated above,
+analyses of at least two EC-SBM networks based on SBM+WCC parameters,
+and reporting AMI, ARI, and NMI accuracy).
+Anything beyond this can earn extra credit.
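
The node coverage, cluster density, and connectivity statistics named in the assignment text could be computed roughly as in the sketch below. It is a minimal illustration, not the course's evaluation scripts. It assumes a simple undirected graph given as an edge list with each edge listed once, and a clustering given as a node-to-cluster dictionary. The function names are hypothetical, and reading n in log10(n) as the number of nodes in the cluster is an assumption. The minimum edge cut size is taken as an input rather than computed.

```python
import math
from collections import Counter


def node_coverage(membership, num_nodes):
    """Fraction of all nodes that sit in clusters of size >= 2.

    membership maps node id -> cluster id; nodes absent from the mapping
    are treated as unclustered (singletons).
    """
    sizes = Counter(membership.values())
    covered = sum(1 for c in membership.values() if sizes[c] >= 2)
    return covered / num_nodes


def cluster_densities(edges, membership):
    """Map cluster id -> internal edges / (n choose 2), for clusters with n >= 2."""
    sizes = Counter(membership.values())
    internal = Counter()
    for u, v in edges:
        cu, cv = membership.get(u), membership.get(v)
        if cu is not None and cu == cv:
            internal[cu] += 1
    return {c: internal[c] / (n * (n - 1) / 2)
            for c, n in sizes.items() if n >= 2}


def mincut_ratio(min_cut_size, cluster_size):
    """Ratio of a cluster's minimum edge cut size to log10 of its node count.

    Assumption: n is the number of nodes in the cluster.
    """
    if cluster_size < 2:
        raise ValueError("ratio is undefined for clusters with fewer than 2 nodes")
    return min_cut_size / math.log10(cluster_size)


# Toy example: a triangle, a single edge, one bridge between them,
# and an isolated node 6 that forms a singleton cluster.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (3, 4)]
membership = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "c"}

print(node_coverage(membership, num_nodes=6))
print(cluster_densities(edges, membership))
print(mincut_ratio(min_cut_size=2, cluster_size=100))
```

In a write-up, summaries of these per-cluster values (minimum, median, maximum) may be more informative than a single average, since the assignment asks about the distributions.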
+
+