bigergm
\[ \mathbb{P}_{\beta}(\mathbf{x}| \mathbf{y}) = \exp\left(\beta^\top \mathbf{s}(\mathbf{x}, \mathbf{y})\right)/ c(\beta, \mathbf{y}) \] where
\[ \begin{split} \mathbb{P}_\theta(\mathbf{x} | \mathbf{y}, \mathbf{z}) = &\left[\prod_{k \neq l}^K \mathbb{P}_{B}(\mathbf{x}_{k,l} | \mathbf{y}, \mathbf{z})\right] \left[\prod_{k = 1}^K \mathbb{P}_{W}(\mathbf{x}_{k,k} | \mathbf{y},\mathbf{z}) \right] \end{split} \] where
\[ \mathbb{P}_{W}(\mathbf{x}_{k,k} | \mathbf{y}, \mathbf{z}) = \exp\left(\alpha^\top \mathbf{s}_W(\mathbf{x}_{k,k}, \mathbf{y})\right)/ c_W(\alpha, \mathbf{y}, \mathbf{z}), \]
where
\[ \mathbb{P}_{B}(\mathbf{x}_{k,l} | \mathbf{y}, \mathbf{z}) = \prod_{(i,j) \text{; } z_{ik} = 1 \text{, } z_{jl} = 1} \mathbb{P}_{\beta}( x_{i,j} \mid \mathbf{y}, \mathbf{z}), \] where \[ \mathbb{P}_{\beta}( x_{i,j} | \mathbf{y},\mathbf{z}) = (\pi_{k,l}(\beta, \mathbf{y}))^{x_{i,j}} (1 - \pi_{k,l}(\beta, \mathbf{y}))^{1 - x_{i,j}} \] e.g.
\[ \pi_{k,l}(\beta, \mathbf{y}) = \text{logit}^{-1}\left(\beta_0 + \sum_{p = 1}^P \beta_p \, \mathbb{I}\left(y_{i,p} = y_{j,p}\right) \right) \]
\[ \mathbb{P}_{B}(\mathbf{x}_{k,l} | \mathbf{y}, \mathbf{z}) = \exp\left(\beta^\top \mathbf{s}_B(\mathbf{x}_{k,l}, \mathbf{y})\right)/ c_B(\beta, \mathbf{y}, \mathbf{z}), \] where \[ \mathbf{s}_B(\mathbf{x}_{k,l}, \mathbf{y}) = \left(\sum_{i<j} x_{i,j},\sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, ..., \sum_{i<j} \mathbb{I}(y_{P,i} = y_{P,j})\, x_{i,j}\right) \]
Remarks:
Tools
menuCopy to Clipboard
buttonbigergm
bigergm
: Hierarchical exponential-family random graph models for big networksbigergm
: Hierarchical exponential-family random graph models for big networksThe block memberships saved as a vertex attribute ‘block’:
lhs_network
:
network
object where the simulation should be started.network_term(s)
: ergmTerms
specifying the network model.
Within-block model: \[ \mathbf{s}_W(\mathbf{x}_{k,k}, \mathbf{y}) = \left(\sum_{i<j} x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}\right) \] Between-block model: \[ \mathbf{s}_B(\mathbf{x}_{k,l}, \mathbf{y}) = \left(\sum_{i<j}x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}\right) \]
Within-block model: \[ \begin{split} \mathbf{s}_W(\mathbf{x}_{k,k}, \mathbf{y}) = \Bigg(&\sum_{i<j} x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, \\ &\sum_{i<j} x_{i,j}\,\mathbb{I} \Bigg(\sum_{h \neq i,j} x_{i,h}\, x_{h,j}>0 \Bigg)\Bigg) \end{split} \]
Between-block model: \[ \mathbf{s}_B(\mathbf{x}_{k,l}, \mathbf{y}) = \left(\sum_{i<j}x_{i,j}, \sum_{i<j} \mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}, \sum_{i<j}\mathbb{I}(y_{1,i} = y_{1,j})\, x_{i,j}\right) \]
# Load the required packages
library(bigergm)
library(network)
# Ensure that all results are reproducible
set.seed(123)
# Specify base network (which has 200 nodes and is undirected)
network_tmp <- network::network.initialize(n = 200, directed = FALSE)
# Assign a block membership to each node
network_tmp%v% "block" <- sample(1:4, 200, replace = TRUE)
# Assign a covariate 'x' and 'y' to each node
network_tmp%v% "x" <- sample(1:5, 200, replace = TRUE)
network_tmp%v% "y" <- sample(c("A","B","C"), 200, replace = TRUE)
sim_net <- simulate_bigergm(
# Model specification
formula = model_formula,
# The coefficients for the between connections
coef_between = c(-4.7,0.8, 0.4),
# The coefficients for the within connections
coef_within = c(-2.5,1,1,0.5),
# Number of simulations
nsim = 1,
# Control argument to guide simulation
control_within = ergm::control.simulate.formula(MCMC.burnin = 50000,
MCMC.interval = 10000),
# Ensure that all results are reproducible
seed = 123
)
# Update the formula to refer to the simulated network
model_formula <- update(model_formula, sim_net~.)
res <-bigergm(
# The model you would like to estimate
object = model_formula,
# The number of blocks
n_blocks = 4,
# The maximum number of MM steps
n_MM_step_max = 100,
# The tolarence for the MM algorithm
tol_MM_step = 1e-6,
# Indicate whether clustering should take into account nodematch on x and y
clustering_with_features = TRUE,
# Keep track of block memberships at each iteration
check_block_membership = TRUE
)
Call:
.main()
Found 4 clusters of relative sizes:
0.295 0.275 0.225 0.205
Results of within-cluster estimation:
Maximum Pseudolikelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -2.70189 0.08884 0 -30.41 <1e-04 ***
nodematch.x 0.93662 0.08120 0 11.54 <1e-04 ***
nodematch.y 0.91773 0.07333 0 12.52 <1e-04 ***
transitiveties 0.65987 0.05540 0 11.91 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results of between-cluster estimation:
Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.5848 0.1062 0 -43.184 < 1e-04 ***
nodematch.x 0.5452 0.1615 0 3.375 0.000738 ***
nodematch.y 0.1856 0.1529 0 1.214 0.224796
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC: 6709.794 BIC: 6736.106
Ground truth coefficients:
res_known <-bigergm(
# The model you would like to estimate
object = model_formula,
# Specify the block structure
blocks = sim_net %v% "block"
)
summary(res_known)
Call:
.main()
Found 4 clusters of relative sizes:
0.225 0.275 0.29 0.21
Results of within-cluster estimation:
Maximum Pseudolikelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -2.67073 0.08859 0 -30.15 <1e-04 ***
nodematch.x 0.93762 0.08116 0 11.55 <1e-04 ***
nodematch.y 0.91850 0.07332 0 12.53 <1e-04 ***
transitiveties 0.63566 0.05512 0 11.53 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results of between-cluster estimation:
Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.5891 0.1061 0 -43.234 < 1e-04 ***
nodematch.x 0.5562 0.1604 0 3.468 0.000524 ***
nodematch.y 0.2138 0.1516 0 1.410 0.158430
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC: 6726.304 BIC: 6752.603
tmp <- gof(res_known,
# Control arguments for within-block simulations
control_within = ergm::control.simulate.formula(MCMC.burnin = 50000,
MCMC.interval = 10000),
# How many simulations should be performed?
nsim = 100,
# Either 'full' or 'within' to say if gof should be based on only
# within-connections or all
type = "full",
# Ensure that all results are reproducible
seed = 123,
# Should the simulation start from the observed network?
start_from_observed = TRUE,
# Should all geodesic distances be computed for the simulated networks?
compute_geodesic_distance = TRUE)
We are indebted Bruce Desmarais for sharing the Twitter (X) data and allowing us to include them in the package:
Gopal, Kim, Nakka, Boehmke, Harden, Desmarais. The National Network of U.S. State Legislators on Twitter. Political Science Research & Methods, Forthcoming.
Kim, Nakka, Gopal, Desmarais, Mancinelli, Harden, Ko, and Boehmke (2022). Attention to the COVID-19 pandemic on Twitter: Partisan differences among U.S. state legislators. Legislative Studies Quarterly 47, 1023–1041.
Network attributes:
vertices = 2191
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 33321
missing edges= 0
non-missing edges= 33321
Vertex attribute names:
gender party race state vertex.names
Edge attribute names not shown
ergm.multi
: Size-dependent parametrizationsnodematch
statisticbigergm
function assuming there are 50 blocks (use the covariate information for the first step)
bigergm
function assuming there are 50 blocks (use the covariate information for the first step)
twitter_sim <- simulate(twitter_tmp, output = "stats", nsim = 10,
control_within = control.simulate.formula(MCMC.interval = 20000))
plot(twitter_sim$within_network$edges,
xlab = "Index", ylab = "Within-block Edges", type = "l")
plot(twitter_sim$between_network$edges,
xlab = "Index", ylab = "Within-block Edges", type = "l")
ergm.multi
: Size-dependent parametrizations!twitter_res <- bigergm(
object = model_formula,
# The number of blocks
n_MM_step_max = 1000,tol_MM_step = 0.0001,
# The maximum number of MM steps
estimate_parameters = TRUE,
# Perform parameter estimation after the block recovery step
clustering_with_features = TRUE,
# Indicate wether clustering should take into account
# nodematch on characteristics
check_block_membership = TRUE,
# How many blocks should be estimated?
n_blocks = 50,
# Which estimation for the within-block model should be used?
method_within = "MLE",
# Control arguments for within-block simulations
control_within = control.ergm(parallel=5, parallel.type="PSOCK",
MCMC.burnin = 10000,
MCMC.interval = 20000))
Call:
.main()
Found 50 clusters of relative sizes:
0.02418987 0.02008215 0.02784117 0.01780009 0.0109539 0.03651301 0.02647193 0.01643085 0.01916933 0.03833866 0.03012323 0.03194888 0.005933364 0.01688727 0.005476951 0.01186673 0.01962574 0.01141031 0.03423094 0.04792332 0.02008215 0.04198996 0.009128252 0.03103606 0.01414879 0.007302602 0.01506162 0.02555911 0.0182565 0.007302602 0.01916933 0.01186673 0.02555911 0.02510269 0.02921041 0.007302602 0.02099498 0.01551803 0.005476951 0.01141031 0.03423094 0.02555911 0.02373346 0.04518485 0.01597444 0.007302602 0.006389776 0.009128252 0.007302602 0.01049749
Results of within-cluster estimation:
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -5.43754 0.28367 0 -19.169 <1e-04 ***
nodematch.party 0.94633 0.01371 0 69.034 <1e-04 ***
nodematch.race -0.09171 0.01188 0 -7.720 <1e-04 ***
nodematch.gender 0.04617 0.01155 0 3.999 <1e-04 ***
N(1)~transitiveties 2.06586 0.24291 0 8.505 <1e-04 ***
N(log(n))~transitiveties -0.42498 0.06314 0 -6.731 <1e-04 ***
N(1)~mutual 3.42629 0.23246 0 14.739 <1e-04 ***
N(log(n))~mutual -0.29690 0.05560 0 -5.340 <1e-04 ***
N(log(n))~edges 0.62174 0.07280 0 8.540 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results of between-cluster estimation:
Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -8.03344 0.04456 0 -180.263 <1e-04 ***
nodematch.party 1.57107 0.04033 0 38.956 <1e-04 ***
nodematch.race -0.16491 0.03039 0 -5.427 <1e-04 ***
nodematch.gender 0.19144 0.03030 0 6.319 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC: 184693.6 BIC: 184784.2
est_within
is an ergm
object
Note: MCMC diagnostics shown here are from the last round of
simulation, prior to computation of final parameter estimates.
Because the final estimates are refinements of those used for this
simulation run, these diagnostics may understate model performance.
To directly assess the performance of the final model on in-model
statistics, please use the GOF command: gof(ergmFitObject,
GOF=~model).
simulated <- mapply(FUN = function(x) {
tmp_x <- get_within_networks(x, x%v%"block")
summary(tmp_x ~transitiveties + triangle + edges + mutual)
}, x = sim_networks,
SIMPLIFY = FALSE)
simulated <- do.call(rbind, simulated)
state_twitter_block <- get_within_networks(state_twitter, twitter_res$block)
observed_statistics <- summary(state_twitter_block ~ transitiveties + triangle + edges + mutual)
plot(density(simulated[,1]), xlab = "Transitivities", ylab = "Density", main = "")
abline(v = observed_statistics[1], col = "red")
# Assess fit
twitter_gof <- gof(twitter_res,
# How many simulations should be performed?
n_sim = 100,
# Ensure that all results are reproducible
seed = 2,
# Either 'full' or 'within' to say if gof should be based on only
# within-connections or all
type = "within",
# Should the geodesic distance be computed?
compute_geodesic_distance = TRUE,
# Control arguments for within-block simulations
control_within = control.simulate.formula(
MCMC.burnin = 10000,
MCMC.interval = 30000))
bigergm
is a powerful tool for estimating, simulating, and diagnosing HERGMs for big networks