Introduction to Incidentally

Zachary Neal

Table of Contents

  1. Introduction
    1. Welcome
    2. What are incidence matrices?
    3. Loading the package
    4. Package overview
    5. Supported data formats
  2. Fill and marginal constraints
    1. Fill/Density
    2. Marginal sums
    3. Marginal distributions
  3. Generative models
    1. Teams model
    2. Groups model
    3. Blau Space model
  4. Block models

Introduction

Welcome

Thank you for your interest in the incidentally package! The incidentally package is designed to generate random incidence matrices and bipartite graphs under different constraints or using different generative models.

For additional resources on the incidentally package, please see https://www.rbackbone.net/.

If you have questions about the incidentally package or would like an incidentally hex sticker, please contact the maintainer Zachary Neal by email (zpneal@msu.edu) or via Twitter (@zpneal). Please report bugs in the incidentally package at https://github.com/zpneal/incidentally/issues.

What are incidence matrices?

An incidence matrix is a binary \(r \times c\) matrix I that records associations between \(r\) objects represented by rows and \(c\) objects represented by columns. In this matrix, \(I_{ij} = 1\) if the ith row object is associated with the jth column object, and otherwise \(I_{ij} = 0\). An incidence matrix can be used to represent a bipartite, two-mode, or affiliation network/graph, in which the rows represent one type of node, and the columns represent another type of node (e.g., people who author papers, species living in habitats) (Latapy, Magnien, and Del Vecchio 2008). An incidence matrix can also represent a hypergraph, in which each column represents a hyperedge and identifies the nodes that it connects.

For example: \[I = \begin{bmatrix} 1 & 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} \] is a \(3 \times 5\) incidence matrix that represents the associations of the three row objects with the five column objects. If the rows represent people and the columns represent papers they wrote, then \(I_{1,1} = 1\) indicates that person 1 wrote paper 1, while \(I_{1,2} = 0\) indicates that person 1 did not write paper 2. One key property of an incidence matrix is its marginals, or when the matrix represents a bipartite network, its degree sequences. In this example, the row marginals are \(R = \{3,4,2\}\), and the column marginals are \(C = \{1,2,2,2,2\}\).
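As a quick check, the example matrix and its marginals can be reproduced in base R. This is only an illustrative sketch: the object name I_example is ours, and nothing here relies on the incidentally package.

```r
# Build the 3 x 5 example incidence matrix row by row
I_example <- matrix(c(1, 0, 1, 0, 1,
                      0, 1, 1, 1, 1,
                      0, 1, 0, 1, 0),
                    nrow = 3, byrow = TRUE)

rowSums(I_example)  # Row marginals: 3 4 2
colSums(I_example)  # Column marginals: 1 2 2 2 2
```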

Loading the package

The incidentally package can be loaded in the usual way:

set.seed(5)
library(incidentally)
#> O  O  O  incidentally v0.9.0
#> |\ | /|  Cite: Neal, Z. P. (2021). incidentally: An R package for generating incidence
#> |  |  |        matrices and bipartite graphs.
#> |/ | \|  Help: type vignette("incidentally"); email zpneal@msu.edu; github zpneal/incidentally
#> X  X  X  Beta: type devtools::install_github("zpneal/incidentally", ref = "devel")

Upon successful loading, a startup message will display that shows the version number, citation, ways to get help, and ways to contact me. Here, we also call set.seed(5) to ensure that the examples below are reproducible.

Package overview

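The incidentally package offers multiple incidence matrix-generating functions that differ in how the resulting incidence matrix is constrained. These functions are described in detail below, but briefly: incidence.from.probability() fixes the fill rate or density; incidence.from.vector() fixes the row and column marginals; incidence.from.distribution() fixes the distributions of the row and column marginals; and incidence.from.adjacency() generates an incidence matrix from a social network using one of three generative models.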

Once an incidence matrix is generated using one of these functions, the add.blocks() function can be used to add a block structure or planted partition.

Supported data formats

The incidentally package can return incidence matrices in several data formats that are useful for subsequent analysis in R, including the base R matrices and igraph bipartite networks used in the examples below.

Fill and marginal constraints

Fill/Density

The incidence.from.probability() function generates an incidence matrix with a given probability \(p\) that \(I_{ij} = 1\), and thus an overall fill rate or density of approximately \(p\). We can use it to generate a \(10 \times 10\) incidence matrix in which \(Pr(I_{ij} = 1) = .2\):

I <- incidence.from.probability(10, 10, .2)
I
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    0    0    1    0    0    0    0    1    0     0
#>  [2,]    0    0    0    0    0    1    0    1    0     0
#>  [3,]    0    0    0    0    0    0    1    1    0     0
#>  [4,]    0    0    0    0    0    0    0    0    1     1
#>  [5,]    1    1    0    1    1    0    0    0    0     0
#>  [6,]    0    0    0    1    0    0    1    0    0     0
#>  [7,]    0    0    0    0    0    0    0    0    1     0
#>  [8,]    0    0    0    0    0    0    0    0    0     1
#>  [9,]    0    0    0    0    0    0    0    1    0     0
#> [10,]    0    0    0    0    0    0    1    0    0     0
mean(I)  #Fill rate/Density
#> [1] 0.18

By default, incidence.from.probability() only generates incidence matrices in which no rows or columns are completely empty or full. We can relax this constraint, allowing some rows/columns to contain all 0s or all 1s by specifying constrain = FALSE:

I <- incidence.from.probability(10, 10, .2, constrain = FALSE)
I
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    0    0    0    1    0    0    0    0    0     0
#>  [2,]    0    0    0    1    0    0    1    0    0     0
#>  [3,]    0    0    0    0    0    0    0    1    0     1
#>  [4,]    0    0    0    0    1    0    0    1    0     0
#>  [5,]    0    0    1    0    0    0    0    0    0     1
#>  [6,]    0    0    0    0    0    0    1    0    1     0
#>  [7,]    0    0    0    0    1    0    1    1    0     0
#>  [8,]    0    0    0    0    1    0    0    0    0     0
#>  [9,]    0    0    0    0    1    0    0    0    0     0
#> [10,]    1    0    0    0    1    1    1    0    0     0
mean(I)  #Fill rate/Density
#> [1] 0.2
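
To make the constraint concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the helper names (has_degenerate_margins, bernoulli_incidence) and the simple redraw-until-valid strategy are assumptions made for illustration.

```r
# TRUE if any row or column is entirely 0s or entirely 1s
has_degenerate_margins <- function(I) {
  any(rowSums(I) %in% c(0, ncol(I))) || any(colSums(I) %in% c(0, nrow(I)))
}

# Draw each cell as an independent Bernoulli(p) trial; when constrain = TRUE,
# redraw until no row or column is completely empty or full
bernoulli_incidence <- function(n_rows, n_cols, p, constrain = TRUE,
                                max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    I <- matrix(rbinom(n_rows * n_cols, size = 1, prob = p),
                nrow = n_rows, ncol = n_cols)
    if (!constrain || !has_degenerate_margins(I)) return(I)
  }
  stop("No valid matrix found; try a less extreme p or more tries")
}

set.seed(1)
I_sketch <- bernoulli_incidence(10, 10, .2)
mean(I_sketch)                    # Fill rate, roughly .2
has_degenerate_margins(I_sketch)  # FALSE by construction
```

Redrawing is simple but becomes slow when \(p\) is very close to 0 or 1, which is why the sketch caps the number of attempts.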

Marginal sums

The incidence.from.vector() function generates an incidence matrix with given row and column marginals. The generated incidence matrix represents a random draw from the space of all such matrices. We can use it to generate a random incidence matrix with \(R = \{4,3,2\}\) and \(C = \{1,2,2,2,2\}\):

I <- incidence.from.vector(c(4,3,2), c(1,2,2,2,2))
I
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    1    0    1
#> [2,]    0    0    1    1    1
#> [3,]    0    1    0    1    0
rowSums(I)  #Row marginals
#> [1] 4 3 2
colSums(I)  #Column marginals
#> [1] 1 2 2 2 2
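
Not every pair of marginal vectors can be realized by a binary matrix. A quick feasibility check before generating is sketched below in base R using the Gale-Ryser condition. The function name marginals_feasible is hypothetical and is not part of the incidentally package.

```r
# Gale-Ryser check: can a 0/1 matrix have row sums R and column sums C?
marginals_feasible <- function(R, C) {
  if (sum(R) != sum(C)) return(FALSE)  # Totals must agree
  R <- sort(R, decreasing = TRUE)
  # For each k, the k largest row sums cannot exceed what the columns can
  # supply when each column contributes at most k ones
  all(vapply(seq_along(R),
             function(k) sum(R[seq_len(k)]) <= sum(pmin(C, k)),
             logical(1)))
}

marginals_feasible(c(4, 3, 2), c(1, 2, 2, 2, 2))  # TRUE
marginals_feasible(c(3, 3), c(3, 3))              # FALSE: only 2 columns per row
```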

Marginal distributions

The incidence.from.distribution() function generates an incidence matrix in which the row marginals and the column marginals each approximately follow a given Beta distribution, described by two shape parameters. Beta distributions are used because they can flexibly capture many different distributional shapes:

A \(100 \times 100\) incidence matrix with uniformly distributed row and column marginals:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(1,1), coldist = c(1,1))
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")

A \(100 \times 100\) incidence matrix with right-tail distributed row and column marginals:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(1,10), coldist = c(1,10))
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")

A \(100 \times 100\) incidence matrix with left-tail distributed row and column marginals:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(10,1), coldist = c(10,1))
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")

A \(100 \times 100\) incidence matrix with approximately normally distributed row and column marginals:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(10,10), coldist = c(10,10))
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")

A \(100 \times 100\) incidence matrix with constant row and column marginals:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(10000,10000), coldist = c(10000,10000))
rowSums(I)
#>   [1] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [26] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [51] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [76] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
colSums(I)
#>   [1] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [26] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [51] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#>  [76] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

Of course, different types of Beta distributions can be combined. For example, we can generate a \(100 \times 100\) incidence matrix in which the row marginals are right-tailed, but the column marginals are left-tailed:

I <- incidence.from.distribution(R = 100, C = 100, P = 0.2,
  rowdist = c(1,10), coldist = c(10,1))
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
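
The shape labels used above follow from the Beta distribution's moments. As a rough guide, the base R sketch below computes the mean, standard deviation, and skewness implied by each pair of shape parameters. beta_summary is a hypothetical helper, and how the package translates a Beta distribution into marginals is not shown here.

```r
# Mean, standard deviation, and skewness of Beta(a, b) from closed-form moments
beta_summary <- function(a, b) {
  c(mean = a / (a + b),
    sd = sqrt(a * b / ((a + b)^2 * (a + b + 1))),
    skewness = 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b)))
}

shapes <- list(uniform = c(1, 1), right_tail = c(1, 10),
               left_tail = c(10, 1), normal_like = c(10, 10),
               constant = c(10000, 10000))
round(t(sapply(shapes, function(s) beta_summary(s[1], s[2]))), 3)
```

Positive skewness corresponds to a long right tail and negative skewness to a long left tail. The near-zero standard deviation for c(10000, 10000) is why those marginals are effectively constant.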

Generative models

Focus theory suggests that social networks form, in part, because individuals share foci such as shared activities that create opportunity for interaction (Feld 1981). Individuals’ memberships in foci can be represented by an incidence matrix or bipartite network. The social network that may emerge from these foci memberships can be obtained via bipartite projection, which yields an adjacency matrix or unipartite network in which people are connected by shared foci (Breiger 1974; Neal 2014).
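
As a small illustration of bipartite projection, multiplying an incidence matrix by its transpose counts the foci that each pair of row objects shares. The base R sketch below applies this to the \(3 \times 5\) example matrix from the introduction. The object names are ours.

```r
I_example <- matrix(c(1, 0, 1, 0, 1,
                      0, 1, 1, 1, 1,
                      0, 1, 0, 1, 0),
                    nrow = 3, byrow = TRUE)

# Entry [i, j] counts the column objects shared by row objects i and j;
# the diagonal holds each row object's own marginal
P <- I_example %*% t(I_example)
P
#>      [,1] [,2] [,3]
#> [1,]    3    2    0
#> [2,]    2    4    2
#> [3,]    0    2    2

# Binarize and drop the diagonal to obtain an adjacency matrix
A <- (P > 0) * 1
diag(A) <- 0
```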

Focus theory therefore explains how incidence/bipartite \(\rightarrow\) adjacency/unipartite. However, it is also possible that individuals’ interactions in a social network can lead to the formation of new foci. That is, it is possible that adjacency/unipartite \(\rightarrow\) incidence/bipartite. The incidence.from.adjacency() function implements three generative models (model = c("team", "group", "blau")) that reflect different ways that this might occur.

Teams model

The teams model mirrors a team formation process (Guimera et al. 2005) that depends on the structure of a given network in which cliques represent prior teams. Each column in the incidence matrix records the members of a new team that is formed from the incumbents of a randomly selected prior team (with probability \(p\)) and newcomers (with probability \(1-p\)).

Given an initial social network among 15 people, we can simulate their formation of three (k = 3) new teams, where there is a p = 0.75 probability that a prior team member joins a new team:

library(igraph)  #Provides erdos.renyi.game(), V(), and the layout functions used below
G <- erdos.renyi.game(15, .5)  #A random social network of 15 people, as igraph
I <- incidence.from.adjacency(G, k = 3, p = .75, model = "team")  #Teams model
class(I)  #Incidence matrix returned as igraph object
#> [1] "igraph"
V(I)$shape <- ifelse(V(I)$type, "square", "circle")  #Add shapes
plot(G, main="Social Network")
plot(I, layout = layout_as_bipartite(I), main="New Teams")

Notice that because the social network G is supplied as an igraph object, the generated incidence matrix I is returned as an igraph bipartite network, which facilitates subsequent plotting and analysis. In this example, team 16 is formed by 1, 8, 9, and 10. This team may have emerged from the prior 4-member team of 1, 6, 8, 10 (they are a clique in the social network). In this case, three positions on the new team are filled by incumbents from the original team (1, 8, and 10), while the final position is filled by a newcomer (9).

Groups model

The groups model mirrors a social group formation process (Backstrom et al. 2006) in which current group members try to recruit their friends. To ensure a minimum level of group cohesion, potential recruits join the group only if doing so would yield a new group in which the members’ social ties have a density of at least \(p\). Each column in the incidence matrix records the members of a new social group.

Given an initial social network among 15 people, we can simulate their formation of three (k = 3) new groups, where each group has a minimum density of p = 0.75:

G <- erdos.renyi.game(15, .33)  #A random social network of 15 people, as igraph
I <- incidence.from.adjacency(G, k = 3, p = .75, model = "group")  #Groups model
V(I)$shape <- ifelse(V(I)$type, "square", "circle")  #Add shapes
plot(G, main="Social Network")
plot(I, layout = layout_as_bipartite(I), main="New Groups")

In this example, group 18 is joined by 2, 4, 6, and 13. This group may have formed when the initial dyad of 2 & 4 attempted to recruit their friend 6. Person 6 would join because doing so would create a new group with a density of 1 (because 2, 4, and 6 are all connected), which is greater than 0.75. Next, 6 recruits 13. Person 13 would join because doing so would create a new group with a density of 0.83, which is greater than 0.75. Next, 13 recruits 9. Person 9 would not join because doing so would create a new group with a density of 0.6, which is less than 0.75.
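
The density arithmetic in this narrative can be checked with a short base R sketch. The edge list below is hypothetical: it is chosen only so that the densities match those quoted above, and is not read from the plotted network. subgraph_density is likewise an illustrative helper.

```r
# Density of the subgraph induced by `members` in adjacency matrix A
subgraph_density <- function(A, members) {
  S <- A[members, members, drop = FALSE]
  sum(S[upper.tri(S)]) / choose(length(members), 2)
}

# Hypothetical ties among five of the people in the narrative
people <- c("2", "4", "6", "9", "13")
A <- matrix(0, 5, 5, dimnames = list(people, people))
ties <- rbind(c("2", "4"), c("2", "6"), c("4", "6"),
              c("4", "13"), c("6", "13"), c("9", "13"))
A[ties] <- 1         # Fill one triangle
A[ties[, 2:1]] <- 1  # Mirror it so the matrix is symmetric

subgraph_density(A, c("2", "4", "6"))             # 3/3 = 1
subgraph_density(A, c("2", "4", "6", "13"))       # 5/6, about 0.83
subgraph_density(A, c("2", "4", "6", "13", "9"))  # 6/10 = 0.6
```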

Blau Space model

The Blau Space model mirrors an organizational recruitment process (McPherson 1983). The given social network is embedded in a \(d\) dimensional social space in which the dimensions are assumed to represent meaningful social distinctions, such that socially similar people are positioned nearby. Organizations recruit members from this space, recruiting people inside their niche with probability \(p\), and outside their niche with probability \(1-p\). Each column in the incidence matrix records the members of a new organization.

Given a social network among 15 people, we can simulate their recruitment by three (k = 3) new organizations, where there is a p = 0.9 probability that an individual inside an organization’s two-dimensional (d = 2) niche becomes a member:

G <- erdos.renyi.game(15, .33)  #A random social network of 15 people, as igraph
I <- incidence.from.adjacency(G, k = 3, d = 2, p = .90, model = "blau")  #Blau Space model
V(I)$shape <- ifelse(V(I)$type, "square", "circle")  #Add shapes
plot(G, layout = layout_with_mds(G), main="Social Network")
plot(I, layout = layout_as_bipartite(I), main="New Organizations")

The social network is plotted using a Multidimensional Scaling layout, and therefore shows the nodes’ positions in the abstract Blau Space from which organizations recruit members. In this example, organization 16 recruits 1, 3, 5, and 7 as members. This organization’s niche is located near the top of the space. People 1, 3, and 5 were likely inside its niche and therefore readily recruited (here, with a 90% probability). Although person 7 is outside its niche (i.e., quite different from the organization’s typical member), they were also recruited.

Block models

The add.blocks() function shuffles an incidence matrix to have a block structure or planted partition while preserving the row and column marginals. For example, after generating an incidence matrix with a density of .3, we can plant a two-group (blocks = 2) partition in which the within-group density = 0.8:

I <- incidence.from.probability(10, 10, .3)
I
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    1    1    1    0    1    1    0    1     0
#>  [2,]    0    0    0    0    0    0    0    1    0     0
#>  [3,]    0    1    1    0    1    1    0    1    1     0
#>  [4,]    0    0    1    1    0    1    0    0    1     1
#>  [5,]    1    0    0    1    0    0    0    0    0     0
#>  [6,]    0    0    1    1    0    1    1    0    1     0
#>  [7,]    1    0    0    0    1    0    1    0    0     0
#>  [8,]    0    0    0    1    0    0    0    1    0     0
#>  [9,]    0    0    1    1    1    0    0    0    0     0
#> [10,]    0    0    0    0    0    0    0    1    1     0
rowSums(I)
#>  [1] 7 1 6 5 2 5 3 2 3 2
colSums(I)
#>  [1] 3 2 5 6 3 4 3 4 5 1
I <- add.blocks(I, blocks = 2, density = .8)
I
#>     A1 A10 A6 A7 A8 A9 B2 B3 B4 B5
#> A10  0   0  0  0  1  1  0  0  0  0
#> A4   0   1  1  0  1  1  0  0  1  0
#> A6   1   0  1  1  1  1  0  0  0  0
#> A9   1   0  0  0  1  1  0  0  0  0
#> B1   1   0  1  1  0  0  1  1  1  1
#> B2   0   0  0  0  0  0  0  0  1  0
#> B3   0   0  1  0  0  1  1  1  1  1
#> B5   0   0  0  0  0  0  0  1  1  0
#> B7   0   0  0  1  0  0  0  1  0  1
#> B8   0   0  0  0  0  0  0  1  1  0
rowSums(I)
#> A10  A4  A6  A9  B1  B2  B3  B5  B7  B8 
#>   2   5   5   3   7   1   6   2   3   2
colSums(I)
#>  A1 A10  A6  A7  A8  A9  B2  B3  B4  B5 
#>   3   1   4   3   4   5   2   5   6   3

In this example, the row objects are partitioned into two randomly-sized groups labeled A and B, and similarly the column objects are partitioned into two randomly-sized groups labeled A and B. There is a 0.8 probability of a 1 occurring between a row and column object belonging to the same group. Note that the row and column marginals are preserved.
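
One way to inspect the planted partition is to compute the share of 1s that fall in cells whose row and column belong to the same block. The base R sketch below re-enters the shuffled matrix printed above and relies on the A/B prefixes in its row and column names. within_block_share is a hypothetical helper, not a package function.

```r
rows <- c("A10", "A4", "A6", "A9", "B1", "B2", "B3", "B5", "B7", "B8")
cols <- c("A1", "A10", "A6", "A7", "A8", "A9", "B2", "B3", "B4", "B5")
I_blocked <- matrix(c(0,0,0,0,1,1,0,0,0,0,
                      0,1,1,0,1,1,0,0,1,0,
                      1,0,1,1,1,1,0,0,0,0,
                      1,0,0,0,1,1,0,0,0,0,
                      1,0,1,1,0,0,1,1,1,1,
                      0,0,0,0,0,0,0,0,1,0,
                      0,0,1,0,0,1,1,1,1,1,
                      0,0,0,0,0,0,0,1,1,0,
                      0,0,0,1,0,0,0,1,0,1,
                      0,0,0,0,0,0,0,1,1,0),
                    nrow = 10, byrow = TRUE, dimnames = list(rows, cols))

# Share of all 1s located in same-block cells (block = first letter of name)
within_block_share <- function(I) {
  same_block <- outer(substr(rownames(I), 1, 1),
                      substr(colnames(I), 1, 1), "==")
  sum(I[same_block]) / sum(I)
}

within_block_share(I_blocked)  # 29/36, about 0.81
```

In this draw, the share is close to the density = .8 requested in add.blocks().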

References

Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. “Group Formation in Large Social Networks: Membership, Growth, and Evolution.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 44–54. https://doi.org/10.1145/1150402.1150412.
Breiger, Ronald L. 1974. “The Duality of Persons and Groups.” Social Forces 53 (2): 181–90. https://doi.org/10.1093/sf/53.2.181.
Feld, Scott L. 1981. “The Focused Organization of Social Ties.” American Journal of Sociology 86 (5): 1015–35. https://doi.org/10.1086/227352.
Guimera, Roger, Brian Uzzi, Jarrett Spiro, and Luis A Nunes Amaral. 2005. “Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance.” Science 308 (5722): 697–702. https://doi.org/10.1126/science.1106340.
Latapy, Matthieu, Clémence Magnien, and Nathalie Del Vecchio. 2008. “Basic Notions for the Analysis of Large Two-Mode Networks.” Social Networks 30 (1): 31–48. https://doi.org/10.1016/j.socnet.2007.04.006.
McPherson, Miller. 1983. “An Ecology of Affiliation.” American Sociological Review, 519–32. https://doi.org/10.2307/2117719.
Neal, Zachary P. 2014. “The Backbone of Bipartite Projections: Inferring Relationships from Co-Authorship, Co-Sponsorship, Co-Attendance and Other Co-Behaviors.” Social Networks 39 (October): 84–97. https://doi.org/10.1016/j.socnet.2014.06.001.