Merge branch 'release-0.5.2'

zhaoweikb · Mar 5, 2015 · 79ab87c
2 parents 9e9f80a + dd4296b
Showing 11 changed files with 160 additions and 53 deletions.
diff --git a/CHANGELOG b/CHANGELOG
@@ -0,0 +1,11 @@
+0.5.2
+- Ensured that random neighbour selection works in O(1) rather than O(k), with k the average number of neighbours.
+- Optimized the calculation of weight from/to community.
+- Included some missing references.
+
+0.5.1
+Corrected some mistakes which prevented it from being properly used on PyPi.
+No serious changes were made.
+
+0.5
+Initial release
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,4 +1,5 @@
 include LICENSE
 include INSTALL
+include CHANGELOG
 include README.md
 recursive-include include *.h
diff --git a/README.md b/README.md
@@ -1,42 +1,44 @@
 INTRODUCTION
 ============
 
-This package implements the louvain algorithm in ``C++`` and exposes it to
+This package implements the louvain algorithm [1] in ``C++`` and exposes it to
 ``python``. It relies on (python-)``igraph`` for it to function. Besides the
 relative flexibility of the implementation, it also scales well, and can be run
 on graphs of millions of nodes (as long as they can fit in memory). The core
 function is ``find_partition`` which finds the optimal partition using the
-louvain algorithm for a number of different methods. The methods currently
-implemented are:
+louvain algorithm for a number of different methods. The original implementation
+is available from https://sites.google.com/site/findcommunities/. The methods
+currently implemented are:
 
 Modularity
   This method compares the actual graph to the expected graph, taking into
-  account the degree of the nodes [1]. The expected graph is based on a
+  account the degree of the nodes [2]. The expected graph is based on a
   configuration null-model.
 
 RBConfiguration
-  This is an extension of modularity which includes a resolution parameter [2].
+  This is an extension of modularity which includes a resolution parameter [3].
   In general, a higher resolution parameter will lead to smaller communities.
 
 RBER
   A variant of the previous method that instead of a configuration null-model
   uses an Erdős-Rényi null-model in which each edge has the same probability of
-  appearing [2].
+  appearing [3].
 
 CPM
   This method compares to a fixed resolution parameter, so that it finds
   communities that have an internal density higher than the resolution
   parameter, and are separated from other communities with a density lower than
-  the resolution parameter [3].
+  the resolution parameter [4].
 
 Significance
   This is a probabilistic method based on the idea of assessing the probability
-  of finding such dense subgraphs in an (ER) random graph [4].
+  of finding such dense subgraphs in an (ER) random graph [5].
 
 Surprise
   Another probabilistic method, but rather than the probability of finding dense
   subgraphs, it focuses on the probability of so many edges within communities
-  [5, 6].
+  [6, 7].
+
 
 INSTALLATION
 ============
@@ -115,7 +117,7 @@ sum over all layers, weighted by some weight. If we denote by ``q_k`` the qualit
 of layer ``k`` and the weight by ``w_k``, the overall quality is then ``q = sum_k
 w_k q_k``.  This can also be useful in case you have negative links. In
 principle, this could also be used to detect temporal communities in a dynamic
-setting, cf. [7].
+setting, cf. [8].
 
 For example, assuming you have a graph with positive weights ``G_positive`` and
 a graph with negative weights ``G_negative``, and you want to use Modularity for
@@ -135,7 +137,7 @@ the partition. Notice that this runs much slower than only considering
 neighbouring communities (which is the default).
 
 Various methods (such as Reichardt and Bornholdt's Potts model, or CPM) support
-a (linear) resolution parameter, which can be effectively bisected, cf. [4]. You
+a (linear) resolution parameter, which can be effectively bisected, cf. [5]. You
 can do this by calling:
 ```python
 res_parts = louvain.bisect(G, method='CPM', resolution_range=[0,1]);
@@ -164,18 +166,20 @@ REFERENCES
 
 Please cite the references appropriately in case they are used.
 
-1. Newman, M. & Girvan, M. Finding and evaluating community structure in networks.
+1. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding
+   of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
+2. Newman, M. & Girvan, M. Finding and evaluating community structure in networks.
    Physical Review E 69, 026113 (2004).
-2. Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary
+3. Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary
    degree distribution. Physical Review E 76, 015102 (2007).
-3. Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free
+4. Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free
    community detection. Physical Review E 84, 016114 (2011).
-4. Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure.
+5. Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure.
    Scientific Reports 3, 2930 (2013).
-5. Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure
+6. Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure
    of complex networks. Scientific reports 3, 1060 (2013).
-6. Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical
+7. Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical
    Surprise. Forthcoming (2015).
-7. Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P.
+8. Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P.
    Community structure in time-dependent, multiscale, and multiplex networks.
    Science 328, 876–8 (2010).
diff --git a/include/GraphHelper.h b/include/GraphHelper.h
@@ -80,6 +80,16 @@ class Graph
       get_neighbour_edges(size_t v, igraph_neimode_t mode);
     vector< size_t >*
       get_neighbours(size_t v, igraph_neimode_t mode);
+    size_t get_random_neighbour(size_t v, igraph_neimode_t mode);
+    inline size_t get_random_node()
+    {
+      return this->get_random_int(0, this->vcount() - 1);
+    };
+
+    inline size_t get_random_int(size_t from, size_t to)
+    {
+      return igraph_rng_get_integer(igraph_rng_default(), from, to);
+    };
 
     inline size_t vcount() { return igraph_vcount(this->_graph); };
     inline size_t ecount() { return igraph_ecount(this->_graph); };
@@ -161,6 +171,7 @@ class Graph
     void set_default_edge_weight();
     void set_default_node_size();
     void set_self_weights();
+
 };
 
 // We need this ugly way to include the MutableVertexPartition

diff --git a/include/Optimiser.h b/include/Optimiser.h
@@ -30,14 +30,14 @@ class Optimiser
     double optimize_partition(MutableVertexPartition* partition);
     template <class T> T* find_partition(Graph* graph);
     template <class T> T* find_partition(Graph* graph, double resolution_parameter);
-    double move_nodes(MutableVertexPartition* partition);
+    double move_nodes(MutableVertexPartition* partition, int consider_comms);
 
     // The multiplex functions that simultaneously optimize multiple graphs and partitions (i.e. methods)
     // Each node will be in the same community in all graphs, and the graphs are expected to have identical nodes
     // Optionally we can loop over all possible communities instead of only the neighbours. In the case of negative
     // layer weights this may be necessary.
     double optimize_partition(vector<MutableVertexPartition*> partitions, vector<double> layer_weights);
-    double move_nodes(vector<MutableVertexPartition*> partitions, vector<double> layer_weights);
+    double move_nodes(vector<MutableVertexPartition*> partitions, vector<double> layer_weights, int consider_comms);
 
     virtual ~Optimiser();
 

diff --git a/setup.py b/setup.py
@@ -579,7 +579,7 @@ def read(fname):
 
 options =  dict(
   name = 'louvain',
-  version = '0.5.1',
+  version = '0.5.2',
   description = 'Louvain is a general algorithm for methods of community detection in large networks.',
   long_description=read('README.md'),
   license = 'GPLv3+',

diff --git a/setup_ms.py b/setup_ms.py
@@ -44,7 +44,7 @@ def read(fname):
 
 options =  dict(
   name = 'louvain',
-  version = '0.5.1',
+  version = '0.5.2',
   description = 'Louvain is a general algorithm for methods of community detection in large networks.',
   long_description=read('README.md'),
   license = 'GPLv3+',

diff --git a/src/GraphHelper.cpp b/src/GraphHelper.cpp
@@ -232,6 +232,7 @@ void Graph::set_self_weights()
 
 void Graph::init_admin()
 {
+
   size_t m = this->ecount();
 
   // Determine total weight in the graph.
@@ -343,14 +344,8 @@ double Graph::weight_tofrom_community(size_t v, size_t comm, vector<size_t>* mem
   igraph_neighbors(this->_graph, &neighbours, v, mode);
   for (size_t i = 0; i < degree; i++)
   {
-    size_t e = VECTOR(incident_edges)[i];
     size_t u = VECTOR(neighbours)[i];
 
-    // Get the weight of the edge
-    double w = this->_edge_weights[e];
-    // Self loops appear twice here if the graph is undirected, so divide by 2.0 in that case.
-    if (u == v && !this->is_directed())
-        w /= 2.0;
     // If it is an edge to the requested community
     #ifdef DEBUG
       size_t u_comm = (*membership)[u];
@@ -360,12 +355,19 @@ double Graph::weight_tofrom_community(size_t v, size_t comm, vector<size_t>* mem
+      size_t e = VECTOR(incident_edges)[i];
+      // Get the weight of the edge
+      double w = this->_edge_weights[e];
+      // Self loops appear twice here if the graph is undirected, so divide by 2.0 in that case.
+      if (u == v && !this->is_directed())
+          w /= 2.0;
+
       #ifdef DEBUG
         cerr << "\t" << "Sum edge (" << v << "-" << u << "), Comm (" << comm << "-" << u_comm << ") weight: " << w << "." << endl;
       #endif
       total_w += w;
     }
     #ifdef DEBUG
     else
     {
-      cerr << "\t" << "Ignore edge (" << v << "-" << u << "), Comm (" << comm << "-" << u_comm << ") weight: " << w << "." << endl;
+      cerr << "\t" << "Ignore edge (" << v << "-" << u << "), Comm (" << comm << ") weight: " << this->_edge_weights[VECTOR(incident_edges)[i]] << "." << endl;
     }
     #endif
   }
@@ -415,6 +417,79 @@ Graph::get_neighbours(size_t v, igraph_neimode_t mode)
   return neighs;
 }
 
+/********************************************************************************
+ * This should return a random neighbour in O(1)
+ ********************************************************************************/
+size_t Graph::get_random_neighbour(size_t v, igraph_neimode_t mode)
+{
+  size_t node=v;
+  size_t rand_neigh = -1;
+
+  if (this->degree(v, mode) <= 0)
+    throw Exception("Cannot select a random neighbour for an isolated node.");
+
+  if (igraph_is_directed(this->_graph) && mode != IGRAPH_ALL)
+  {
+    if (mode == IGRAPH_OUT)
+    {
+      // Get indices of where neighbours are
+      size_t cum_degree_this_node = (size_t) VECTOR(this->_graph->os)[node];
+      size_t cum_degree_next_node = (size_t) VECTOR(this->_graph->os)[node+1];
+      // Get a random index from them
+      size_t rand_neigh_idx = igraph_rng_get_integer(igraph_rng_default(), cum_degree_this_node, cum_degree_next_node - 1);
+      // Return the neighbour at that index
+      #ifdef DEBUG
+        cerr << "Degree: " << this->degree(node, mode) << " diff in cumulative: " << cum_degree_next_node - cum_degree_this_node << endl;
+      #endif
+      rand_neigh = VECTOR(this->_graph->to)[ (size_t)VECTOR(this->_graph->oi)[rand_neigh_idx] ];
+    }
+    else if (mode == IGRAPH_IN)
+    {
+      // Get indices of where neighbours are
+      size_t cum_degree_this_node = (size_t) VECTOR(this->_graph->is)[node];
+      size_t cum_degree_next_node = (size_t) VECTOR(this->_graph->is)[node+1];
+      // Get a random index from them
+      size_t rand_neigh_idx = igraph_rng_get_integer(igraph_rng_default(), cum_degree_this_node, cum_degree_next_node - 1);
+      #ifdef DEBUG
+        cerr << "Degree: " << this->degree(node, mode) << " diff in cumulative: " << cum_degree_next_node - cum_degree_this_node << endl;
+      #endif
+      // Return the neighbour at that index
+      rand_neigh = VECTOR(this->_graph->from)[ (size_t)VECTOR(this->_graph->ii)[rand_neigh_idx] ];
+    }
+  }
+  else
+  {
+    // Undirected graph, or both in- and out-neighbours in a directed graph.
+    size_t cum_outdegree_this_node = (size_t)VECTOR(this->_graph->os)[node];
+    size_t cum_indegree_this_node  = (size_t)VECTOR(this->_graph->is)[node];
+
+    size_t cum_outdegree_next_node = (size_t)VECTOR(this->_graph->os)[node+1];
+    size_t cum_indegree_next_node  = (size_t)VECTOR(this->_graph->is)[node+1];
+
+    size_t total_outdegree = cum_outdegree_next_node - cum_outdegree_this_node;
+    size_t total_indegree = cum_indegree_next_node - cum_indegree_this_node;
+
+    size_t rand_idx = igraph_rng_get_integer(igraph_rng_default(), 0, total_outdegree + total_indegree - 1);
+
+    #ifdef DEBUG
+      cerr << "Degree: " << this->degree(node, mode) << " diff in cumulative: " << total_outdegree + total_indegree << endl;
+    #endif
+    // From among in or out neighbours?
+    if (rand_idx < total_outdegree)
+    { // From among outgoing neighbours
+      size_t rand_neigh_idx = cum_outdegree_this_node + rand_idx;
+      rand_neigh = VECTOR(this->_graph->to)[ (size_t)VECTOR(this->_graph->oi)[rand_neigh_idx] ];
+    }
+    else
+    { // From among incoming neighbours
+      size_t rand_neigh_idx = cum_indegree_this_node + rand_idx - total_outdegree;
+      rand_neigh = VECTOR(this->_graph->from)[ (size_t)VECTOR(this->_graph->ii)[rand_neigh_idx] ];
+    }
+  }
+
+  return rand_neigh;
+}
+
 /****************************************************************************
   Creates a graph with communities as node and links as weights between
   communities.

diff --git a/src/MutableVertexPartition.cpp b/src/MutableVertexPartition.cpp
@@ -246,7 +246,7 @@ void MutableVertexPartition::move_node(size_t v,size_t new_comm)
   size_t node_size = this->graph->node_size(v);
   size_t old_comm = this->_membership[v];
 
-  // Incidentally, this is indepentend of whether we take into account self-loops or not
+  // Incidentally, this is independent of whether we take into account self-loops or not
   // (i.e. whether we count as n_c^2 or as n_c(n_c - 1). Be careful to do this before the
   // adaptation of the community sizes, otherwise the calculations are incorrect.
   _total_possible_edges_in_all_comms += 2.0*node_size*(this->_csize[new_comm] - this->_csize[old_comm] + node_size)/(2.0 - this->graph->is_directed());
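
Note on the O(1) neighbour selection: the new `Graph::get_random_neighbour` in src/GraphHelper.cpp appears to rely on igraph's indexed edge list, where cumulative degree offsets (`os`, `is`) delimit each node's slice of the sorted edge orderings (`oi`, `ii`). A random neighbour can then be drawn with a single random integer instead of materialising the whole neighbour list. The following is a minimal Python sketch of that idea, not the package's implementation; `build_index` and `random_neighbour` are hypothetical names, and the dictionary layout only mirrors the field names used in the diff.

```python
import random


def build_index(n, edges):
    """Build igraph-style offsets and edge orderings for an edge list.

    Hypothetical helper: the keys mirror the igraph_t fields used in the diff
    (from, to, oi, ii, os, is), but this is plain Python, not igraph itself.
    """
    frm = [u for u, _ in edges]
    to = [v for _, v in edges]
    # Edge ids ordered by source node (oi) and by target node (ii).
    oi = sorted(range(len(edges)), key=lambda e: frm[e])
    ii = sorted(range(len(edges)), key=lambda e: to[e])
    # os_[v]:os_[v + 1] is the slice of oi holding v's outgoing edges;
    # is_[v]:is_[v + 1] is the slice of ii holding v's incoming edges.
    os_ = [0] * (n + 1)
    is_ = [0] * (n + 1)
    for u, v in edges:
        os_[u + 1] += 1
        is_[v + 1] += 1
    for v in range(n):
        os_[v + 1] += os_[v]
        is_[v + 1] += is_[v]
    return {"from": frm, "to": to, "oi": oi, "ii": ii, "os": os_, "is": is_}


def random_neighbour(idx, v, mode="all", rng=random):
    """Draw one neighbour of v with a single random integer (O(1) per draw)."""
    out_lo, out_hi = idx["os"][v], idx["os"][v + 1]
    in_lo, in_hi = idx["is"][v], idx["is"][v + 1]
    out_deg = out_hi - out_lo if mode in ("out", "all") else 0
    in_deg = in_hi - in_lo if mode in ("in", "all") else 0
    if out_deg + in_deg == 0:
        raise ValueError("Cannot select a random neighbour for an isolated node.")
    r = rng.randrange(out_deg + in_deg)
    if r < out_deg:
        # Index falls among the outgoing edges: follow the edge to its target.
        return idx["to"][idx["oi"][out_lo + r]]
    # Otherwise it falls among the incoming edges: follow the edge to its source.
    return idx["from"][idx["ii"][in_lo + r - out_deg]]


if __name__ == "__main__":
    idx = build_index(4, [(0, 1), (0, 2), (3, 0)])
    print(random_neighbour(idx, 0, "all"))
```

Building the index costs a sort over the edges, but it is done once; each draw afterwards only reads two offsets and one edge entry, which is what makes the cost independent of the node's degree. As in the diff, a neighbour reachable through several parallel edges is proportionally more likely to be drawn.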