BigSEM for network data
We will show how to use BigSEM to analyze network data in the SEM framework.
- SEM with networks - background
- Example datasets
- Node based analysis with network statistics
- Node based analysis with latent space model
- Edge based analysis with edge values
- Edge based analysis with latent space model
- Use of Web App for SEM with Networks
SEM with networks - background
Network data can be integrated into the SEM framework in different ways. We focus on two main approaches here. The first approach extracts the information from a network based on each participant and then use that information as variable(s) in a SEM model. In this method, each participant (node) in the network is the basic unit for analysis. The second approach extracts information from a network based on each relationship present. In this method, each pair of participants or nodes are used as the basic unit for analysis.
In our software, we propose and implement four types of models.
Network nodes as analysis units
In this method, each participant is treated as the basic unit of analysis. Therefore, the sample size is equal the sample size $n$. We use two approaches here: (1) we extract information as network statistics from a network, and (2) we extract information through a latent space model.
Use network statistics
We denote a network through a square adjacency matrix $\mathbf{M}=[m_{ij}]$ with each $m_{ij}$ denoting the connection between subject $i$ and subject $j$. Based on the adjacency matrix, many node-based network statistics can be defined. For example, the statistic degree is a centrality measure that simply counts how many subjects a subject connects to in the network. The statistic betweenness measures the extent to which a subject lies on the paths between other subjects. Subjects with high betweenness influence how the information flows in the network. Both degree and betweenness quantify the importance of a subject in a network. For example, for our friendship network, if a student has a larger degree, he or she is more popular in the network. From a network, we can derive a vector of network statistics for each subject $i$ as $\mathbf{t}_{i}(\mathbf{M})$ .
Because the network statistics are node based, the dimension of the resulting network statistics data will match the non-network data, and they can be combined to be used in SEM as any regular SEM analysis.
Use latent space model
In this approach, each subject assumes a position in a Euclidean space. The distance of two subjects in the latent space is assumed to be related to how likely they are connected in the network. The idea of latent space modeling is similar to that of factor analysis
with a latent factor space and factor scores. Let $\mathbf{z}_{i}$ be a vector of latent positions of subject $i$ in the latent space. For subjects $i$ and $j$, the Euclidean distance between them is:
\begin{equation}
d_{ij}({\bf z}_{i},{\bf z}_{j})=\sqrt{({\bf z}_{i}-{\bf z}_{j})^{t}({\bf z}_{i}-{\bf z}_{j})}=\sqrt{\sum_{d=1}^{D}({z}_{i,d}-{z}_{j,d})^{2}}
\label{eq:distance}
\end{equation}
where $(\cdot)^{t}$ is the transpose of a matrix or vector, $D$ is the dimension of the Euclidean latent space, $\mathbf{z}_{i}=(z_{i,1},z_{i,2},\cdots,z_{i,D})^{t}$ and $\mathbf{z}_{j}=(z_{j,1},z_{j,2},\cdots,z_{j,D})^{t}$ are the latent positions of subjects $i$ and $j$, respectively. With the distance, the latent space model can be written as
\begin{equation}
\begin{cases}
m_{ij} & \sim\text{Bernoulli}(p_{ij})\\
\text{logit}[p(m_{ij})] & =\alpha+\boldsymbol{\beta}'{\bf h}_{ij}-\kappa\times d_{ij}({\bf z}_{i},{\bf z}_{j})
\end{cases}\label{eq:LSM}
\end{equation}
where $\alpha$ is an intercept, ${\bf h}_{ij}$ is a vector of covariates and $\boldsymbol{\beta}$ contains the coefficients of the covariates. Note that the network is assumed to be unweighted here. In our software, following the tradition in network analysis, the coefficient $\kappa$ for $d_{ij}$ is fixed as 1 because $\kappa$ can be rescaled together with the distance (Hoff et al., 2002). Therefore, the closer of two subjects are in the latent space, the higher the probability is for them to be connected after controlling the covariates in the model.
Here, we adapt and extend the latent space model to have the form shown below:
\begin{equation}
\begin{cases}
E(m_{ij}) & =\mu_{ij}\\
g(\mu_{ij}) & =\alpha-d_{ij}({\bf z}_{i},{\bf z}_{j})
\end{cases}\label{eq:SEM-LSM}
\end{equation}
where $g$ is a link function. First, we assume the connection between two subjects is solely explained by the latent space. Second, we relax the requirement of the Bernoulli distribution to use any exponential family of distributions. Using this model, we can extract information from a network. The idea is similar to principal component analysis. In our model, the latent positions will be used along with non-network variables in the SEM framework.
Network edges as analysis units
Another approach we take is to use edges as the unit of interest. In this case, non-network data are reformatted for analysis to be based on pairs of individuals. In this case, given a non-network covariate $c$, we define $c_{ij} = f(c_i, c_j)$, where $c_i$ and $c_j$ are the covariate values for individual $i$ and individual $j$. The function $f$ can be chosen according to the purpose of the analysis. For example, $c_{ij}$ can be the average of $c_i$ and $c_j$, or it can be the difference. Then, these pairwise non-network variables can be used as either endogenous or exogenous variables.
Use network statistics
Similar as in the node-based framework, in the edge-based framework, network statistics that can be obtained free from assuming underlying models to the social network can be used in SEM. The network statistics are constructed based on each pairs of subjects. For example, the shortest path length between each pair of nodes can be used as the edge-based network statistics.
Use latent space model
The latent space modeling approach can also be used when using a pair of subjects as the unit of analysis. In this case, the latent distance between two subjects $d_{ij}(z_i, z_j)$ can be used in SEM instead of the latent positions $z_i$ and $z_j$.
Example datasets
We will use several datasets to illustrate the use of our software.
Friendship Network Data
In this dataset, information on friendship network, alcohol use, smoking, the big five personality traits, and academic performance among college students is collected for three years in 2017, 2018, and 2019. The participants were undergraduate students and the sample size is $N = 165$. There were about an equal number of male and female students (45% vs. 55%) in the sample. The average age of the students was 21.64 ($SD$ = 0.85). The average GPA of the students was about 3.273 ($SD$ = 0.53) out of 5.
Information on two social networks was collected. First, each student was presented a list of all the students in the study and was asked to report his/her acquaintanceship with everyone else on the list, on a Likert scale of 0 to 4. Second, each student was asked to report whether the students on the list were their WeChat friends or not (WeChat is a popular social network platform in China). Therefore, there are two friendship networks: the first one is a real-life weighted acquaintanceship network (referred to as the acquaintance network) and the second one is a virtual unweighted social media network (referred to as the WeChat network). The two networks together can be viewed as a multiplex network. Data on personality, happiness, depression, and loneliness were also collected.
Attorney Network Data
The second dataset includes the cowork and advice network dataset from 71 attorneys from a law firm called SG&R in 1988. The dataset is available from the SIENA website. The first wave of network data will be used in the analysis in the current tutorial. The cowork information is collected by asking the company employees to select people who have worked on the same case with them. Additionally, information on an advice network is collected via asking respondents who they seek advice from at work. Several non-network attributes are collected alongside with the networks. From those, the office one works at (i.e., Boston, Hartford, and Providence) and years with the firm will be used for analysis.
Florentine Marriage Data
The dataset is from Breiger and Pattison (1986), where the social network indicates marriage alliances, and the non-network variables include (1) wealth, each family’s net wealth in 1427 (in thousands of lira); (2) priorates, the number of priorates (seats on the civic council) held between 1282- 1344; and (3) totalties, the total number of business or marriage ties in the total dataset of 116 families.
Node based analysis with network statistics
The function sem.net
can be used to fit a SEM model with network data using node statistics as variables. User-specified network statistics will be calculated and used as variables instead of the networks themselves in the SEM.
The following choices of network statistics can be used:
degree
: Degree is a centrality measure that counts actors/nodes a specific node is connected to.betweenness
: Betweenness is a centrality measure that counts how many shortest path an actor is crossed by through a random choice. It measures how much an individual control the spread of information.closeness
: Closeness is a measure of how efficiently a node spreads information and can be calculated by the average inverse distance from a node to all other nodes.evcent
: The eigenvector centrality is a measure of transitive influence of each node, meaning that a node with high eigenvector centrality tends to connect with other nodes with high eigenvector centrality (Ruhnau, 2000).stresscent
: Stress centrality is similar to betweenness centrality as it also measures the control of spread. However, while betweenness centrality measures through a random fraction of shortest paths, stress centrality takes into account all shortest paths (Szczepanski et al., 2012).infocent
: Information centrality is defined as the reduction in network efficiency if a target node is removed. It is a measure of node effectiveness in spreading information (Latora and Marchiori, 2007).ivi
: Integrated value of influence is a measure that combines different centrality measures (Salavaty et al., 2020a)hubeness.score
: Hubeness score is a component of IVI and measures a node’s influence in its surrounding environment.spreading.score
: Spreading score is another component of IVI and measures a node’s spreading potential.clusterRank
: Cluster rank is a measure of clustering that takes into account a node, its neighbors, and their clustering coefficients.
Simulated Data Example
To begin with, a random simulated dataset can be used to demonstrate the usage of the node-based network statistics approach. The code below generate a simulated network net
with four non-network covariates x1 - x4
which loads on two latent variables lv1, lv2
.
set.seed(100)
nsamp = 100 # sample size
net <- ifelse(matrix(rnorm(nsamp^2), nsamp, nsamp) > 1, 1, 0) # simulate network
mean(net) # density of simulated network
# simulate non-network variables
lv1 <- rnorm(nsamp)
lv2 <- rnorm(nsamp)
nonnet <- data.frame(x1 = lv1*0.5 + rnorm(nsamp),
x2 = lv1*0.8 + rnorm(nsamp),
x3 = lv2*0.5 + rnorm(nsamp),
x4 = lv2*0.8 + rnorm(nsamp))
With the simulated data, we can define a model
string with lavaan syntax that specifies the measurement model as well as the relationship between the network and the non-network variables. In this case, we are using net
as a mediator between the two latent variables. Since data are generated randomly, the effects should be small overall.
model <-'
lv1 =~ x1 + x2
lv2 =~ x3 + x4
net ~ lv2
lv1 ~ net + lv2
'
Arguments passed to the sem.net
function includes the model, the dataset, and the network statistics of interest. Note that data
here should be a list with two elements, one being the named list of all network variables and one being the dataframe containing non-network variables. A summary
function can be used to look at the output, and the function path.networksem
can be used to look at mediation effects.
data = list(network = list(net = net), nonnetwork = nonnet)
set.seed(100)
res <- sem.net(model = model, data = data, netstats = c('degree'))
summary(res)
path.networksem(res, "lv2", c("net.degree"), "lv1")
The output of should look like the following.
> summary(res)
The SEM output:
lavaan 0.6.15 ended normally after 54 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 12
Number of observations 100
Model Test User Model:
Test statistic 1.230
Degrees of freedom 3
P-value (Chi-square) 0.746
Model Test Baseline Model:
Test statistic 24.987
Degrees of freedom 10
P-value 0.005
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.394
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -913.294
Loglikelihood unrestricted model (H1) -912.679
Akaike (AIC) 1850.588
Bayesian (BIC) 1881.850
Sample-size adjusted Bayesian (SABIC) 1843.951
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.118
P-value H_0: RMSEA <= 0.050 0.810
P-value H_0: RMSEA >= 0.080 0.120
Standardized Root Mean Square Residual:
SRMR 0.026
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
lv2 =~
x4 1.000
x3 2.035 2.162 0.941 0.347
lv1 =~
x2 1.000
x1 1.056 0.789 1.338 0.181
Regressions:
Estimate Std.Err z-value P(>|z|)
lv1 ~
lv2 -0.441 0.300 -1.470 0.142
net.degree ~
lv2 -0.934 1.163 -0.804 0.422
lv1 ~
net.degree -0.011 0.020 -0.569 0.569
Variances:
Estimate Std.Err z-value P(>|z|)
.x4 1.350 0.293 4.603 0.000
.x3 0.215 0.923 0.233 0.816
.x2 1.002 0.299 3.357 0.001
.x1 1.047 0.328 3.190 0.001
.net.degree 22.292 3.164 7.046 0.000
lv2 0.214 0.249 0.860 0.390
.lv1 0.302 0.264 1.142 0.253
> path.networksem(res, "lv2", c("net.degree"), "lv1")
predictor mediator outcome apath bpath indirect indirect_se indirect_z
1 lv2 net.degree lv1 -0.934393 -0.01126621 0.01052707 1.086552 0.009688509
Empirical Data Example
Using the friendship network data, a model with 5 personality traits and two networks' effect on happiness can be fitted using the code below. In this case, degree, betweenness, closeness are used as network statistics.
# load data
load("data/cf_data_book.RData") ## load the list cf_data
## data - non-network variables
non_network <- as.data.frame(cf_data$cf_nodal_cov)
dim(non_network)
## network - network variables (friends network and wechat network)
## note that the names of the networks are used in model specification
network <- list()
network$friends <- cf_data$cf_friend_network
network$wechat <- cf_data$cf_wetchat_network
model <-'
Extroversion =~ personality1 + personality6
+ personality11 + personality16
Conscientiousness =~ personality2 + personality7
+ personality12 + personality17
Neuroticism =~ personality3 + personality8
+ personality13 + personality18
Openness =~ personality4 + personality9
+ personality14 + personality19
Agreeableness =~ personality5 + personality10 +
personality15 + personality20
Happiness =~ happy1 + happy2 + happy3 + happy4
friends ~ Extroversion + Conscientiousness + Neuroticism +
Openness + Agreeableness
Happiness ~ friends + wechat
'
## run sem.net
data = list(
nonnetwork = non_network,
network = network
)
set.seed(100)
res <- sem.net(model=model, data=data,
netstats=c("degree", "betweenness", "closeness"),
netstats.rescale = T,
netstats.options=list("degree"=list("cmode"="freeman")))
## results
summary(res)
The output of the analysis is given below:
lavaan 0.6-18 ended normally after 453 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 82
Number of observations 165
Model Test User Model:
Test statistic 844.769
Degrees of freedom 377
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 1795.826
Degrees of freedom 432
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.657
Tucker-Lewis Index (TLI) 0.607
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -6286.542
Loglikelihood unrestricted model (H1) -5864.157
Akaike (AIC) 12737.084
Bayesian (BIC) 12991.771
Sample-size adjusted Bayesian (SABIC) 12732.159
Root Mean Square Error of Approximation:
RMSEA 0.087
90 Percent confidence interval - lower 0.079
90 Percent confidence interval - upper 0.095
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 0.922
Standardized Root Mean Square Residual:
SRMR 0.116
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Happiness =~
happy4 1.000
happy3 -4.283 3.684 -1.162 0.245
happy2 -6.682 5.698 -1.173 0.241
happy1 -6.955 5.932 -1.172 0.241
Agreeableness =~
personality20 1.000
personality15 -1.200 0.905 -1.326 0.185
personality10 -4.293 2.506 -1.713 0.087
personality5 -4.462 2.606 -1.712 0.087
Openness =~
personality19 1.000
personality14 0.784 0.165 4.748 0.000
personality9 -0.224 0.106 -2.110 0.035
personality4 -0.097 0.108 -0.898 0.369
Neuroticism =~
personality18 1.000
personality13 -0.532 0.148 -3.603 0.000
personality8 -0.808 0.176 -4.602 0.000
personality3 -0.378 0.136 -2.778 0.005
Conscientiousness =~
personality17 1.000
personality12 -0.693 0.214 -3.235 0.001
personality7 -0.508 0.219 -2.319 0.020
personality2 1.108 0.265 4.187 0.000
Extroversion =~
personality16 1.000
personality11 0.609 0.136 4.493 0.000
personality6 -0.508 0.123 -4.116 0.000
personality1 -0.521 0.119 -4.377 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
friends.degree ~
Extroversion 2.355 1.126 2.091 0.037
friends.betweenness ~
Extroversion 2.119 1.048 2.023 0.043
friends.closeness ~
Extroversion 2.175 1.026 2.119 0.034
friends.degree ~
Conscientisnss -8.447 5.060 -1.670 0.095
friends.betweenness ~
Conscientisnss -7.827 4.706 -1.663 0.096
friends.closeness ~
Conscientisnss -7.720 4.609 -1.675 0.094
friends.degree ~
Neuroticism -1.282 1.364 -0.940 0.347
friends.betweenness ~
Neuroticism -1.252 1.272 -0.985 0.325
friends.closeness ~
Neuroticism -1.324 1.248 -1.061 0.289
friends.degree ~
Openness -1.355 1.483 -0.914 0.361
friends.betweenness ~
Openness -1.204 1.377 -0.875 0.382
friends.closeness ~
Openness -1.162 1.348 -0.862 0.389
friends.degree ~
Agreeableness -16.541 15.253 -1.084 0.278
friends.betweenness ~
Agreeableness -15.697 14.299 -1.098 0.272
friends.closeness ~
Agreeableness -14.400 13.668 -1.054 0.292
Happiness ~
friends.degree -0.047 0.051 -0.931 0.352
frinds.btwnnss 0.007 0.025 0.292 0.771
friends.clsnss 0.062 0.059 1.045 0.296
wechat.degree 0.013 0.037 0.351 0.725
wechat.btwnnss 0.050 0.049 1.027 0.305
wechat.closnss -0.064 0.060 -1.063 0.288
Covariances:
Estimate Std.Err z-value P(>|z|)
Agreeableness ~~
Openness 0.015 0.018 0.866 0.386
Neuroticism 0.043 0.029 1.479 0.139
Conscientisnss -0.072 0.044 -1.643 0.100
Extroversion -0.011 0.020 -0.554 0.579
Openness ~~
Neuroticism 0.330 0.074 4.446 0.000
Conscientisnss -0.166 0.059 -2.806 0.005
Extroversion 0.089 0.080 1.111 0.266
Neuroticism ~~
Conscientisnss -0.153 0.058 -2.648 0.008
Extroversion 0.212 0.082 2.588 0.010
Conscientiousness ~~
Extroversion 0.174 0.070 2.490 0.013
Variances:
Estimate Std.Err z-value P(>|z|)
.happy4 2.702 0.298 9.066 0.000
.happy3 1.226 0.147 8.353 0.000
.happy2 0.577 0.139 4.146 0.000
.happy1 0.507 0.145 3.496 0.000
.personality20 1.107 0.123 8.979 0.000
.personality15 1.195 0.134 8.945 0.000
.personality10 0.617 0.115 5.359 0.000
.personality5 0.742 0.130 5.705 0.000
.personality19 0.244 0.125 1.948 0.051
.personality14 0.680 0.107 6.372 0.000
.personality9 0.854 0.095 8.982 0.000
.personality4 0.963 0.106 9.067 0.000
.personality18 0.498 0.104 4.790 0.000
.personality13 0.920 0.109 8.469 0.000
.personality8 0.965 0.125 7.694 0.000
.personality3 0.893 0.102 8.768 0.000
.personality17 0.707 0.088 8.051 0.000
.personality12 1.042 0.119 8.753 0.000
.personality7 1.286 0.144 8.940 0.000
.personality2 1.193 0.143 8.337 0.000
.personality16 0.595 0.152 3.917 0.000
.personality11 1.125 0.140 8.023 0.000
.personality6 1.043 0.126 8.305 0.000
.personality1 0.902 0.111 8.122 0.000
.friends.degree 0.074 0.026 2.872 0.004
.frinds.btwnnss 0.236 0.034 6.912 0.000
.friends.clsnss 0.170 0.029 5.849 0.000
.Happiness 0.024 0.040 0.587 0.557
Agreeableness 0.030 0.034 0.874 0.382
Openness 0.652 0.155 4.209 0.000
Neuroticism 0.495 0.129 3.822 0.000
Conscientisnss 0.248 0.082 3.038 0.002
Extroversion 0.843 0.199 4.240 0.000
The multiple mediation from Agreeableness to friendship network to Happiness can be calculated using the following code.
> path.networksem(res, 'Agreeableness',
c('friends.degree', 'friends.betweenness', 'friends.closeness'),
'Happiness')
predictor mediator outcome apath bpath indirect
1 Agreeableness friends.degree Happiness -16.54130 -0.047133471 0.7796491
2 Agreeableness friends.betweenness Happiness -15.69767 0.007403778 -0.1162220
3 Agreeableness friends.closeness Happiness -14.40081 0.061957757 -0.8922416
indirect_se indirect_z
1 252.3110 0.0030900323
2 224.4727 -0.0005177557
3 196.8378 -0.0045328765
The model used here is shown in the diagram below. The model has the following features:
- We use two networks - friendship and WeChat networks.
- Three network statistics are used - degree, closeness, and betweenness.
- Friendship network is used as mediators.
Node based analysis with latent space model
The node-based latent space model approach calculates latent positions of the networks, and use them in the SEM analysis along with non-network variables.
Simulated Data Example
To begin with, a random simulated dataset can be used to demonstrate the usage of the node-based network statistics approach. The code below generate a simulated network net
with four non-network covariates x1 - x4
which loads on two latent variables lv1, lv2
.
set.seed(10)
nsamp = 50
net <- ifelse(matrix(rnorm(nsamp^2), nsamp, nsamp) > 1, 1, 0)
mean(net) # density of simulated network
lv1 <- rnorm(nsamp)
lv2 <- rnorm(nsamp)
nonnet <- data.frame(x1 = lv1*0.5 + rnorm(nsamp),
x2 = lv1*0.8 + rnorm(nsamp),
x3 = lv2*0.5 + rnorm(nsamp),
x4 = lv2*0.8 + rnorm(nsamp))
With the simulated data, we can define a model
string with lavaan syntax that specifies the measurement model as well as the relationship between the network and the non-network variables. In this case, we are using net
as a mediator between the two latent variables. Since data are generated randomly, the effects should be small overall.
model <-'
lv1 =~ x1 + x2
lv2 =~ x3 + x4
net ~ lv2
lv1 ~ net + lv2
'
Arguments passed to the sem.net.lsm
function includes the model, the dataset, and the number of latent dimensions. Note that data
here should be a list with two elements, one being the named list of all network variables and one being the dataframe containing non-network variables. A summary
function can be used to look at the output, and the function path.networksem
can be used to look at mediation effects across the two latent dimensions.
data = list(network = list(net = net), nonnetwork = nonnet)
set.seed(100)
res <- sem.net.lsm(model = model, data = data, latent.dim = 2)
summary(res)
path.networksem(res, 'lv2', c('net.Z1', 'net.Z2'), 'lv1')
The output looks like the following.
> summary(res)
Model Fit InformationSEM Test statistics: 3.771276 on 6 df with p-value: 0.7075962
NOTE: It is not certain whether it is appropriate to use latentnet's BIC to select latent space dimension, whether or not to include actor-specific random effects, and to compare clustered models with the unclustered model.
network 1 LSM BIC: 2242.696
========================================
========================================
The SEM output:
lavaan 0.6.15 ended normally after 117 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 15
Number of observations 50
Model Test User Model:
Test statistic 3.771
Degrees of freedom 6
P-value (Chi-square) 0.708
Model Test Baseline Model:
Test statistic 34.438
Degrees of freedom 15
P-value 0.003
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.287
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -434.447
Loglikelihood unrestricted model (H1) -432.561
Akaike (AIC) 898.893
Bayesian (BIC) 927.574
Sample-size adjusted Bayesian (SABIC) 880.491
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.138
P-value H_0: RMSEA <= 0.050 0.765
P-value H_0: RMSEA >= 0.080 0.165
Standardized Root Mean Square Residual:
SRMR 0.062
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
lv2 =~
x4 1.000
x3 4.622 6.418 0.720 0.471
lv1 =~
x2 1.000
x1 -0.088 0.271 -0.326 0.744
Regressions:
Estimate Std.Err z-value P(>|z|)
lv1 ~
lv2 -0.984 0.432 -2.279 0.023
net.Z1 ~
lv2 -0.159 0.207 -0.765 0.444
net.Z2 ~
lv2 0.208 0.257 0.809 0.418
lv1 ~
net.Z1 -0.215 0.169 -1.277 0.202
net.Z2 0.255 0.138 1.850 0.064
Variances:
Estimate Std.Err z-value P(>|z|)
.x4 1.947 0.425 4.581 0.000
.x3 -1.587 3.655 -0.434 0.664
.x2 2.927 6.822 0.429 0.668
.x1 1.345 0.274 4.906 0.000
.net.Z1 0.624 0.124 5.012 0.000
.net.Z2 0.950 0.189 5.013 0.000
lv2 0.139 0.227 0.612 0.541
.lv1 -1.984 6.796 -0.292 0.770
The LSM output:
==========================
Summary of model fit
==========================
Formula: network::network(data$network[[latent.network[i]]]) ~ euclidean(d = latent.dim)
<environment: 0x7fc43202a550>
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) -0.18777 -0.42332 0.05 0.1175
Overall BIC: 2242.696
Likelihood BIC: 2107.714
Latent space/clustering BIC: 134.9814
Covariate coefficients MKL:
Estimate
(Intercept) -0.8639125
> path.networksem(res, 'lv2', c('net.Z1', 'net.Z2'), 'lv1')
predictor mediator outcome apath bpath indirect
1 lv2 net.Z1 lv1 -0.1587188 -0.2154100 0.03418961
2 lv2 net.Z2 lv1 0.2081154 0.2547222 0.05301162
indirect_se indirect_z
1 0.04030792 0.8482108
2 0.05368411 0.9874733
Empirical Data Example
We fit the same model on the friendship and WeChat networks from the network statistics approach using the LSM approach. Under this approach, the latent positions take the roles of the network statistics but the model string can stay the same.
model <-'
Extroversion =~ personality1 + personality6
+ personality11 + personality16
Conscientiousness =~ personality2 + personality7
+ personality12 + personality17
Neuroticism =~ personality3 + personality8
+ personality13 + personality18
Openness =~ personality4 + personality9
+ personality14 + personality19
Agreeableness =~ personality5 + personality10 +
personality15 + personality20
Happiness =~ happy1 + happy2 + happy3 + happy4
friends ~ Extroversion + Conscientiousness + Neuroticism +
Openness + Agreeableness
Happiness ~ friends + wechat
'
To fit the model, the sem.net.lsm()
function is used. The argument latent.dim
should be used to denote the number of latent dimensions to be used in estimating the LSM. A random seed can be set to ensure reproduction of the results, and the argument data.scale = T
is used so the scale of the latent positions and the non-network variables are not too different.
data = list(network=network, nonnetwork=non_network)
set.seed(100)
res <- sem.net.lsm(model=model,data=data, latent.dim = 2, data.rescale = T)
For SEM with latent positions, the estimation is again a two-stage process. First, a latent space model with no covariates is used to estimate latent positions through the latentnet
R package. The resulting latent positions are then be extracted and compiled into the same dataset as the non-network variables such as the Big Five personality items and the happiness score items, which are then inputted into lavaan
to be estimated in the SEM framework. We could again use res$data to access the restructured data with latent positions, and res$model to access the modified model string. The output of sem.net.lsm() has two components in res$estimates - res$estimates$sem.es for lavaan SEM results and res$estimates$lsm.es for latentnet LSM results.
The output of the analysis is given below:
> summary(res)
Model Fit InformationSEM Test statistics: 947.953 on 329 df with p-value: 0
network 1 LSM BIC: 15760.02
network 2 LSM BIC: 15517.77
========================================
========================================
The SEM output:
lavaan 0.6.15 ended normally after 147 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 74
Number of observations 165
Model Test User Model:
Test statistic 947.953
Degrees of freedom 329
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 1448.277
Degrees of freedom 377
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.422
Tucker-Lewis Index (TLI) 0.338
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5824.045
Loglikelihood unrestricted model (H1) -5350.068
Akaike (AIC) 11796.089
Bayesian (BIC) 12025.929
Sample-size adjusted Bayesian (SABIC) 11791.645
Root Mean Square Error of Approximation:
RMSEA 0.107
90 Percent confidence interval - lower 0.099
90 Percent confidence interval - upper 0.115
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.119
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
Happiness =~
happy4 1.000
happy3 -5.462 4.485 -1.218 0.223
happy2 -8.435 6.866 -1.229 0.219
happy1 -8.634 7.029 -1.228 0.219
Agreeableness =~
personality20 1.000
personality15 -0.915 0.722 -1.267 0.205
personality10 -4.359 2.395 -1.820 0.069
personality5 -3.726 2.043 -1.824 0.068
Openness =~
personality19 1.000
personality14 0.658 0.144 4.571 0.000
personality9 -0.201 0.100 -2.004 0.045
personality4 -0.085 0.097 -0.873 0.383
Neuroticism =~
personality18 1.000
personality13 -0.492 0.139 -3.529 0.000
personality8 -0.701 0.151 -4.651 0.000
personality3 -0.359 0.135 -2.664 0.008
Conscientiousness =~
personality17 1.000
personality12 -0.475 0.163 -2.911 0.004
personality7 -0.383 0.159 -2.412 0.016
personality2 0.843 0.193 4.378 0.000
Extroversion =~
personality16 1.000
personality11 0.632 0.151 4.181 0.000
personality6 -0.597 0.148 -4.038 0.000
personality1 -0.629 0.151 -4.170 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
friends.Z1 ~
Extroversion -0.150 0.179 -0.838 0.402
friends.Z2 ~
Extroversion -0.238 0.199 -1.192 0.233
friends.Z1 ~
Conscientisnss -0.047 0.327 -0.144 0.885
friends.Z2 ~
Conscientisnss 0.166 0.347 0.480 0.631
friends.Z1 ~
Neuroticism -0.001 0.234 -0.006 0.995
friends.Z2 ~
Neuroticism 0.600 0.303 1.982 0.048
friends.Z1 ~
Openness 0.109 0.144 0.756 0.450
friends.Z2 ~
Openness -0.321 0.179 -1.794 0.073
friends.Z1 ~
Agreeableness 0.335 1.023 0.328 0.743
friends.Z2 ~
Agreeableness -0.957 1.176 -0.814 0.416
Happiness ~
friends.Z1 -0.029 0.025 -1.165 0.244
friends.Z2 -0.003 0.009 -0.394 0.693
wechat.Z1 0.027 0.024 1.146 0.252
wechat.Z2 -0.002 0.009 -0.192 0.848
Covariances:
Estimate Std.Err z-value P(>|z|)
Agreeableness ~~
Openness 0.018 0.019 0.965 0.334
Neuroticism 0.041 0.027 1.538 0.124
Conscientisnss -0.072 0.041 -1.727 0.084
Extroversion -0.009 0.015 -0.553 0.580
Openness ~~
Neuroticism 0.365 0.079 4.596 0.000
Conscientisnss -0.152 0.068 -2.233 0.026
Extroversion 0.074 0.070 1.063 0.288
Neuroticism ~~
Conscientisnss -0.153 0.064 -2.391 0.017
Extroversion 0.177 0.068 2.605 0.009
Conscientiousness ~~
Extroversion 0.130 0.063 2.073 0.038
Variances:
Estimate Std.Err z-value P(>|z|)
.happy4 0.985 0.109 9.065 0.000
.happy3 0.716 0.086 8.332 0.000
.happy2 0.332 0.080 4.141 0.000
.happy1 0.300 0.082 3.678 0.000
.personality20 0.965 0.108 8.968 0.000
.personality15 0.969 0.108 8.987 0.000
.personality10 0.436 0.116 3.773 0.000
.personality5 0.586 0.101 5.806 0.000
.personality19 0.205 0.154 1.326 0.185
.personality14 0.652 0.098 6.662 0.000
.personality9 0.962 0.107 9.013 0.000
.personality4 0.988 0.109 9.072 0.000
.personality18 0.485 0.105 4.635 0.000
.personality13 0.871 0.102 8.529 0.000
.personality8 0.744 0.096 7.720 0.000
.personality3 0.928 0.105 8.809 0.000
.personality17 0.591 0.106 5.555 0.000
.personality12 0.903 0.105 8.600 0.000
.personality7 0.935 0.106 8.781 0.000
.personality2 0.708 0.100 7.046 0.000
.personality16 0.443 0.116 3.831 0.000
.personality11 0.774 0.099 7.796 0.000
.personality6 0.797 0.100 7.983 0.000
.personality1 0.776 0.099 7.813 0.000
.friends.Z1 0.963 0.107 8.984 0.000
.friends.Z2 0.881 0.118 7.497 0.000
.Happiness 0.009 0.015 0.615 0.539
Agreeableness 0.029 0.031 0.934 0.350
Openness 0.789 0.186 4.234 0.000
Neuroticism 0.509 0.131 3.880 0.000
Conscientisnss 0.403 0.122 3.310 0.001
Extroversion 0.551 0.143 3.842 0.000
The LSM output:
==========================
Summary of model fit
==========================
Formula: network::network(data$network[[latent.network[i]]]) ~ euclidean(d = latent.dim)
<environment: 0x7fc412d34470>
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 2.6130 2.5054 2.7225 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Overall BIC: 15760.02
Likelihood BIC: 14056.24
Latent space/clustering BIC: 1703.784
Covariate coefficients MKL:
Estimate
(Intercept) 2.426421
==========================
Summary of model fit
==========================
Formula: network::network(data$network[[latent.network[i]]]) ~ euclidean(d = latent.dim)
<environment: 0x7fc412d34470>
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 1.1886 1.0938 1.2828 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Overall BIC: 15517.77
Likelihood BIC: 13970.87
Latent space/clustering BIC: 1546.901
Covariate coefficients MKL:
Estimate
(Intercept) 0.967353
The indirect effect from Agreeableness to the latent network positions then to Happiness is given below.
> path.networksem(res,
'Agreeableness',
c('friends.Z1', 'friends.Z2'),
'Happiness')
predictor mediator outcome apath bpath
1 Agreeableness friends.Z1 Happiness 0.3354827 -0.028993008
2 Agreeableness friends.Z2 Happiness -0.9573035 -0.003419798
indirect indirect_se indirect_z
1 -0.009726651 0.343095 -0.028349729
2 0.003273785 1.125696 0.002908231
The path diagram is shown as the following.
Edge based analysis with edge values
The edge based analysis can be conducted using the function sem.net.edge
. The idea behind this method is that the edge values can be the unit of analysis if we transform non-network covariates into pair-based values.
Simulated Data Example
To begin with, a random simulated dataset can be used to demonstrate the usage of the node-based network statistics approach. The code below generate a simulated network net
with four non-network covariates x1 - x4
which loads on two latent variables lv1, lv2
.
set.seed(100)
nsamp = 100
net <- data.frame(ifelse(matrix(rnorm(nsamp^2), nsamp, nsamp) > 1, 1, 0))
mean(net) # density of simulated network
lv1 <- rnorm(nsamp)
lv2 <- rnorm(nsamp)
nonnet <- data.frame(x1 = lv1*0.5 + rnorm(nsamp),
x2 = lv1*0.8 + rnorm(nsamp),
x3 = lv2*0.5 + rnorm(nsamp),
x4 = lv2*0.8 + rnorm(nsamp))
With the simulated data, we can define a model
string with lavaan syntax that specifies the measurement model as well as the relationship between the network and the non-network variables. In this case, we are using net
as a mediator between the two latent variables. Since data are generated randomly, the effects should be small overall.
model <-'
lv1 =~ x1 + x2
lv2 =~ x3 + x4
lv1 ~ net
lv2 ~ lv1
'
Arguments passed to the sem.net.edge
function includes the model and the dataset. Note that data
here should be a list with two elements, one being the named list of all network variables and one being the dataframe containing non-network variables. A summary
function can be used to look at the output, and the function path.networksem
can be used to look at mediation effects.
data = list(network = list(net = net), nonnetwork = nonnet)
set.seed(100)
res <- sem.net.edge(model = model, data = data, type = 'difference')
summary(res)
path.networksem(res, "net", "lv1", "lv2")
The output is shown below.
> summary(res)
The SEM output:
lavaan 0.6.15 ended normally after 58 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 10
Number of observations 10000
Model Test User Model:
Test statistic 1.584
Degrees of freedom 4
P-value (Chi-square) 0.812
Model Test Baseline Model:
Test statistic 2296.506
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.003
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -75480.300
Loglikelihood unrestricted model (H1) -75479.508
Akaike (AIC) 150980.601
Bayesian (BIC) 151052.704
Sample-size adjusted Bayesian (SABIC) 151020.925
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.009
P-value H_0: RMSEA <= 0.050 1.000
P-value H_0: RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.003
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
lv1 =~
x1 1.000
x2 0.810 0.063 12.894 0.000
lv2 =~
x3 1.000
x4 0.302 0.056 5.377 0.000
Regressions:
Estimate Std.Err z-value P(>|z|)
lv1 ~
net 0.053 0.039 1.371 0.170
lv2 ~
lv1 -0.482 0.035 -13.683 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 1.964 0.076 25.814 0.000
.x2 2.104 0.055 38.145 0.000
.x3 -0.681 0.527 -1.293 0.196
.x4 2.865 0.063 45.557 0.000
.lv1 0.898 0.077 11.708 0.000
.lv2 2.678 0.529 5.061 0.000
> path.networksem(res, "net", "lv1", "lv2")
predictor mediator outcome apath bpath indirect
1 net lv1 lv2 0.05287153 -0.4823857 -0.02550447
indirect_se indirect_z
1 0.01705778 -1.495181
Empirical Data Example
As an empirical example, we analyze the the attorney cowork and advice networks. In this example, the advice network is predicted by gender and years in practice, and the cowork network is predicted by the advice network, gender, and years in practice all together. In this case, the advice network acts as a mediator, while gender and years in practice exert indirect effect onto the cowork network through the advice network in addition to having direct effects. The model specification is given below.
non_network <- read.table("data/attorney/ELattr.dat")[,c(3,5)]
colnames(non_network) <- c('gender', 'years')
non_network$gender <- non_network$gender - 1
network <- list()
network$advice <- read.table("data/attorney/ELadv.dat")
network$cowork <- read.table("data/attorney/ELwork.dat")
model <-'
advice ~ gender + years
cowork ~ advice + gender + years
'
To use the function sem.net.edge(), we need to specify whether the covariate values to be run with the social network edge values in SEM should be calculated as the ”difference
” across two individuals or the ”average
” across two individuals. Here, the argument ordered = c("cowork", "advice")
is used to tell lavaan that the outcome variables cowork and advice are binary.
set.seed(100)
res <- sem.net.edge(model = model, data = data,
network = network, type = "difference", ordered = c("cowork", "advice"))
The output is shown as below.
lavaan 0.6.15 ended normally after 19 iterations
Estimator DWLS
Optimization method NLMINB
Number of model parameters 7
Number of observations 5041
Model Test User Model:
Standard Scaled
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Model Test Baseline Model:
Test statistic 1343.292 1343.292
Degrees of freedom 1 1
P-value 0.000 0.000
Scaling correction factor 1.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000 1.000
Tucker-Lewis Index (TLI) 1.000 1.000
Robust Comparative Fit Index (CFI) NA
Robust Tucker-Lewis Index (TLI) NA
Root Mean Square Error of Approximation:
RMSEA 0.000 0.000
90 Percent confidence interval - lower 0.000 0.000
90 Percent confidence interval - upper 0.000 0.000
P-value H_0: RMSEA <= 0.050 NA NA
P-value H_0: RMSEA >= 0.080 NA NA
Robust RMSEA NA
90 Percent confidence interval - lower NA
90 Percent confidence interval - upper NA
P-value H_0: Robust RMSEA <= 0.050 NA
P-value H_0: Robust RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.000 0.000
Parameter Estimates:
Standard errors Robust.sem
Information Expected
Information saturated (h1) model Unstructured
Regressions:
Estimate Std.Err z-value P(>|z|)
advice ~
gender -0.019 0.040 -0.463 0.643
years -0.018 0.002 -9.354 0.000
cowork ~
advice 0.691 0.019 36.651 0.000
gender 0.013 0.040 0.323 0.747
years 0.013 0.002 7.248 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.advice 0.000
.cowork 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|)
advice|t1 0.956 0.022 43.812 0.000
cowork|t1 1.037 0.022 48.049 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.advice 1.000
.cowork 0.523
Scales y*:
Estimate Std.Err z-value P(>|z|)
advice 1.000
cowork 1.000
The indirect effects can be calculated as below.
> path.networksem(res, "gender", "advice", "cowork")
predictor mediator outcome apath bpath indirect
1 gender advice cowork -0.01856161 0.6909742 -0.01282559
indirect_se indirect_z
1 0.01304666 -0.9830558
The model is shown in the graph below.
Edge based analysis with latent space model
The R function sem.net.edge.lsm
can be used to conduct edge based analysis with latent space model. In this case, the latent distance between each pair of individuals is used along with the transformed non-network covariates in SEM.
Simulated Data Example
To begin with, a random simulated dataset can be used to demonstrate the usage of the node-based network statistics approach. The code below generate a simulated network net
with four non-network covariates x1 - x4
which loads on two latent variables lv1, lv2
.
set.seed(10)
nsamp = 50
lv1 <- rnorm(nsamp)
net <- ifelse(matrix(rnorm(nsamp^2) , nsamp, nsamp) > 1, 1, 0)
lv2 <- rnorm(nsamp)
nonnet <- data.frame(x1 = lv1*0.5 + rnorm(nsamp),
x2 = lv1*0.8 + rnorm(nsamp),
x3 = lv2*0.5 + rnorm(nsamp),
x4 = lv2*0.8 + rnorm(nsamp))
With the simulated data, we can define a model
string with lavaan syntax that specifies the measurement model as well as the relationship between the network and the non-network variables. In this case, we are using net
as a mediator between the two latent variables. Since data are generated randomly, the effects should be small overall.
model <-'
lv1 =~ x1 + x2
lv2 =~ x3 + x4
net ~ lv1
lv2 ~ net
'
Arguments passed to the sem.net.edge.lsm
function includes the model, the dataset, and the latent dimensions. Note that data
here should be a list with two elements, one being the named list of all network variables and one being the dataframe containing non-network variables. A summary
function can be used to look at the output.
data = list(network = list(net = net), nonnetwork = nonnet)
set.seed(100)
res <- sem.net.edge.lsm(model = model, data = data, latent.dim = 1)
summary(res)
path.networksem(res, 'lv2', c('net.dists'), 'lv1')
The output is shown below:
Model Fit InformationSEM Test statistics: 492.628 on 4 df with p-value: 0
network 1 LSM BIC: 2244.546
========================================
========================================
The SEM output:
lavaan 0.6.15 ended normally after 29 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 11
Number of observations 2500
Model Test User Model:
Test statistic 492.628
Degrees of freedom 4
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 958.550
Degrees of freedom 10
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.485
Tucker-Lewis Index (TLI) -0.288
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -22209.465
Loglikelihood unrestricted model (H1) NA
Akaike (AIC) 44440.930
Bayesian (BIC) 44504.994
Sample-size adjusted Bayesian (SABIC) 44470.045
Root Mean Square Error of Approximation:
RMSEA 0.221
90 Percent confidence interval - lower 0.205
90 Percent confidence interval - upper 0.238
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.109
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
lv2 =~
x4 1.000
x3 0.976 NA
lv1 =~
x2 1.000
x1 0.642 NA
Regressions:
Estimate Std.Err z-value P(>|z|)
net.dists ~
lv1 -0.000 NA
lv2 ~
net.dists -0.000 NA
Variances:
Estimate Std.Err z-value P(>|z|)
.x4 2.856 NA
.x3 1.501 NA
.x2 1.722 NA
.x1 2.490 NA
.net.dists 0.553 NA
.lv2 1.315 NA
lv1 0.715 NA
The LSM output:
==========================
Summary of model fit
==========================
Formula: network::network(data$network[[latent.network[i]]]) ~ euclidean(d = latent.dim)
<environment: 0x7fc473af4960>
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) -0.67923 -0.83587 -0.5504 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Overall BIC: 2244.546
Likelihood BIC: 2184.507
Latent space/clustering BIC: 60.03918
Covariate coefficients MKL:
Estimate
(Intercept) -1.117408
Empirical Data Example
When embedding the LSM into the edge-based approach, one thing that needs to be considered is whether to model covariates predicting the social networks in the LSM framework or in the SEM framework. This is only a concern in the edge-based model since covariates need to be edge-based as well if using the LSM method, and it defies the purpose of simplicity if we consider the LSM in the actor-based approach. In this example, we will accommodate the covariates in the LSM framework within the edge-based approach. The dataset used in this example is the Florentine marriage dataset. The model is quite simple as shown below. Essentially, the observed marriage network is hypothesized to be based not only on the latent positions, but also on the non-network variable of wealth. Additionally, priorates is viewed as a predictor of the distance between latent positrons of the marriage networks.
load("data/flomarriage.RData")
network <- list()
network$flo <- flomarriage.network
nonnetwork <- flomarriage.nonnetwork
model <- '
flo ~ wealth
priorates ~ flo + wealth
'
When fitting the model using the sem.net.edge.lsm
function, the argument type
and latent.dim
are needed. Here, although the marriage network contains binary edges, the ordered argument is not needed since only the continuous latent distances will be used in the SEM.
data = list(network=network, nonnetwork=nonnetwork)
set.seed(100)
res <- sem.net.edge.lsm(model=model,data=data, type = "difference", latent.dim = 2, netstats.rescale = T, data.rescale = T)
## results
summary(res)
In this model, the latentnet
package is first used to estimate the LSM with the covariate of wealth. Then, the resulting latent positions of the marriage network, taking apart the effect of wealth, is hypothesized to be influenced by priorates and the effect is estimated through lavaan
. Thus, the latent distances of the marriage network acts like a mediator between priorates and the observed network. The resulting estimates from both the SEM component and the LSM component are shown below.
Model Fit InformationSEM Test statistics: 0 on 0 df with p-value: NA
network 1 LSM BIC: 259.7975
========================================
========================================
The SEM output:
lavaan 0.6.15 ended normally after 6 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 256
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 50.126
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -700.431
Loglikelihood unrestricted model (H1) -700.431
Akaike (AIC) 1410.863
Bayesian (BIC) 1428.589
Sample-size adjusted Bayesian (SABIC) 1412.737
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 NA
P-value H_0: RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
priorates ~
wealth 0.422 0.057 7.441 0.000
flo.dists ~
wealth 0.000 0.063 0.000 1.000
priorates ~
flo.dists -0.000 0.057 -0.000 1.000
Variances:
Estimate Std.Err z-value P(>|z|)
.priorates 0.819 0.072 11.314 0.000
.flo.dists 0.996 0.088 11.314 0.000
The LSM output:
==========================
Summary of model fit
==========================
Formula: network::network(data$network[[latent.network[i]]]) ~ euclidean(d = latent.dim)
<environment: 0x7fc434ed5160>
Attribute: edges
Model: Bernoulli
MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
Covariate coefficients posterior means:
Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
(Intercept) 5.0133 2.5627 7.9665 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Overall BIC: 259.7975
Likelihood BIC: 85.53086
Latent space/clustering BIC: 174.2666
Covariate coefficients MKL:
Estimate
(Intercept) 2.861026
To look at indirect effects, the following code can be used.
> path.networksem(res, "wealth","flo.dists", "priorates")
predictor mediator outcome apath bpath indirect
1 wealth flo.dists priorates 2.976241e-21 -4.047181e-22 -1.204539e-42
indirect_se indirect_z
1 1.874237e-22 -6.42682e-21
The model is shown in this diagram below.
Use of Web App for SEM with Networks
The network data analysis can also be conducted using our online app available at: https://bigsem.psychstat.org/app . To use the app, one need to register as a user to protect the data of the users. Once logging in, a user with work with an interface like below:
Organizing data
Organizing the data for analysis is the first step for using the app or R package. In R, the data are provided as a list with a non-network component and a network component. To conveniently organize the data online, we developed a simple app.
To use the app, one first upload the non-network data and network data sets as separate files. Then, in the app, one selects the corresponding data files. An example is given below with two networks - friendship and WeChat networks. Note that the new data set will be saved as R data with the provided name, i.e., mynetworkdata.RData
in this example.
Conducting the analysis
We use a simple example to illustrate the use of the online app. To conduct the analysis, we need to first draw the path diagram of the model. Here, we create a latent happiness factor (happy.f) from the 4-item measure of global subjective happiness. We then use the friendship network to predict the happiness factor.
For the network analysis, one needs to choose the software to use, here "NetworkSEM". Then, one selects the Data File "network.RData".
For the network statistics based method, one need to choose what statistics to use. Here, one can specify them in the "Control" field. In this example, we use netstats = degree, betweenness, closeness
to allow the use of the three network statistics.
To run the analysis, one clicks on the green triangle in the left panel. The output of the analysis is given below. The output has several parts:
- The basic information, particularly, the user and the analysis id
7cf61d4792351966add082d56368301d
. - The descriptive statistics for numerical variables in the non-network data set.
- The information on the networks.
- The basic model information
- The results from fitting the model.
BigSEM started at 15:36:50 on Oct 22, 2024.
=====================================
Please refresh your browser for complete output of complex data analysis.
The current analysis was conducted by the BigSEM user johnny.
To contact us, make sure to include the ticket # for this analysis 7cf61d4792351966add082d56368301d
Descriptive statistics (N=165, p=59)
Mean sd Min Max Skewness Kurtosis gender 0.55152 0.49885 0.000 1.0000 -0.2071631 1.0429 gpa 3.27293 0.48805 1.173 4.2200 -0.6399076 4.2619 age 21.64242 0.85505 18.000 24.0000 -0.1255522 4.5903 weight 62.29091 14.16756 37.000 110.0000 0.9021334 3.2265 height 169.54545 8.15808 155.000 188.0000 0.3186553 1.9660 smoke 0.26061 0.44030 0.000 1.0000 1.0907192 2.1897 drink 0.41212 0.49372 0.000 1.0000 0.3570735 1.1275 wechat 157.32927 180.36548 0.000 1000.0000 2.9199355 11.9943 id 83.00000 47.77552 1.000 165.0000 0.0000000 1.7999 personality1 2.81818 1.06652 1.000 5.0000 -0.0869982 2.4384 personality2 2.61818 1.22710 1.000 5.0000 0.3212422 2.0339 personality3 2.45455 0.98436 1.000 5.0000 0.4540597 2.8503 personality4 2.64242 0.98743 1.000 5.0000 0.1910639 2.5725 personality5 3.03636 1.15764 1.000 5.0000 -0.0235915 2.2242 personality6 3.07879 1.12612 1.000 5.0000 0.1017642 2.3871 personality7 3.27273 1.16537 1.000 5.0000 -0.1954555 2.1881 personality8 2.36970 1.13816 1.000 5.0000 0.5103888 2.4850 personality9 2.75758 0.94451 1.000 5.0000 0.3684034 3.1224 personality10 3.01212 1.08194 1.000 5.0000 0.0049198 2.5241 personality11 2.89697 1.20276 1.000 5.0000 0.0931915 2.2009 personality12 3.78788 1.08081 1.000 5.0000 -0.4433181 2.2537 personality13 2.61818 1.03283 1.000 5.0000 0.3473757 2.9438 personality14 3.80000 1.04298 1.000 5.0000 -0.5964333 2.8276 personality15 3.42424 1.11613 1.000 5.0000 -0.3898210 2.5711 personality16 2.65455 1.20292 1.000 5.0000 0.2450516 2.2534 personality17 2.31515 0.98033 1.000 5.0000 0.3493841 2.6210 personality18 3.59394 0.99937 1.000 5.0000 -0.1128832 2.1067 personality19 3.82424 0.94966 1.000 5.0000 -0.5435870 3.1673 personality20 3.12121 1.06946 1.000 5.0000 0.0874853 2.4055 depress1 0.98788 0.55202 0.000 3.0000 0.6478164 5.7357 depress2 0.61818 0.58926 0.000 3.0000 0.5205043 3.3723 depress3 0.76364 0.78002 0.000 3.0000 0.8239322 3.2396 depress4 0.91515 0.59884 0.000 3.0000 0.3722678 4.0971 depress5 0.70303 0.67376 0.000 3.0000 0.6728525 3.3429 depress6 0.80606 0.69753 0.000 3.0000 0.7141707 3.7965 depress7 0.66667 0.70998 0.000 3.0000 0.8848909 3.5949 lone1 1.04848 0.77935 0.000 3.0000 0.2260045 2.3813 lone2 1.26667 0.88437 0.000 3.0000 0.1437581 2.2374 lone3 1.03030 0.87251 0.000 3.0000 0.2729773 2.0401 lone4 1.29091 0.90404 0.000 3.0000 0.1403947 2.1952 lone5 1.27879 0.88750 0.000 3.0000 0.0558801 2.1521 lone6 0.85455 0.79828 0.000 3.0000 0.5543989 2.5604 lone7 0.98788 0.85531 0.000 3.0000 0.3749858 2.2210 lone8 1.64242 0.89682 0.000 3.0000 -0.2540419 2.3354 lone9 1.00000 0.86954 0.000 3.0000 0.3907138 2.2320 lone10 0.88485 0.76832 0.000 3.0000 0.5218129 2.7655 happy1 5.34545 1.31897 1.000 7.0000 -0.8142547 3.6334 happy2 5.25455 1.30969 1.000 7.0000 -0.7392627 3.2077 happy3 5.24848 1.30387 2.000 7.0000 -0.4342157 2.6097 happy4 3.89091 1.65654 1.000 7.0000 0.1177261 2.2404 lone 1.12848 0.56674 0.000 2.6000 -0.0868936 2.8135 depress 0.78009 0.41754 0.000 1.8571 0.1401042 2.5266 happy 4.93485 0.86774 2.500 7.0000 0.2112938 3.2653 p.e 2.91364 0.78605 1.000 5.0000 0.1731648 3.4108 p.c 3.53182 0.69743 2.000 5.0000 0.2454618 2.4799 p.i 3.53788 0.68721 1.500 5.0000 -0.2099051 2.6462 p.a 3.55606 0.61259 1.750 5.0000 0.0235716 2.8378 p.n 2.87576 0.63835 1.000 4.7500 0.1728206 3.3815 bmi 21.50942 3.84812 15.401 39.5197 1.5035276 6.1558 Missing Rate gender 0.0000000 gpa 0.0000000 age 0.0000000 weight 0.0000000 height 0.0000000 smoke 0.0000000 drink 0.0000000 wechat 0.0060606 id 0.0000000 personality1 0.0000000 personality2 0.0000000 personality3 0.0000000 personality4 0.0000000 personality5 0.0000000 personality6 0.0000000 personality7 0.0000000 personality8 0.0000000 personality9 0.0000000 personality10 0.0000000 personality11 0.0000000 personality12 0.0000000 personality13 0.0000000 personality14 0.0000000 personality15 0.0000000 personality16 0.0000000 personality17 0.0000000 personality18 0.0000000 personality19 0.0000000 personality20 0.0000000 depress1 0.0000000 depress2 0.0000000 depress3 0.0000000 depress4 0.0000000 depress5 0.0000000 depress6 0.0000000 depress7 0.0000000 lone1 0.0000000 lone2 0.0000000 lone3 0.0000000 lone4 0.0000000 lone5 0.0000000 lone6 0.0000000 lone7 0.0000000 lone8 0.0000000 lone9 0.0000000 lone10 0.0000000 happy1 0.0000000 happy2 0.0000000 happy3 0.0000000 happy4 0.0000000 lone 0.0000000 depress 0.0000000 happy 0.0000000 p.e 0.0000000 p.c 0.0000000 p.i 0.0000000 p.a 0.0000000 p.n 0.0000000 bmi 0.0000000
Network data information
#row #col friends 165 165 wechat 165 165
Model information
Observed non-network variables: happy1 happy2 happy3 happy4 .
Observed network variables: friends .
Latent variables: happy.f .
The weight is: 0 .
Results
lavaan 0.6-18 ended normally after 66 iterations Estimator ML Optimization method NLMINB Number of model parameters 11 Number of observations 165 Model Test User Model: Test statistic 14.749 Degrees of freedom 11 P-value (Chi-square) 0.194 Model Test Baseline Model: Test statistic 162.858 Degrees of freedom 18 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 0.974 Tucker-Lewis Index (TLI) 0.958 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -1077.697 Loglikelihood unrestricted model (H1) -1070.322 Akaike (AIC) 2177.394 Bayesian (BIC) 2211.559 Sample-size adjusted Bayesian (SABIC) 2176.733 Root Mean Square Error of Approximation: RMSEA 0.045 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.099 P-value H_0: RMSEA <= 0.050 0.498 P-value H_0: RMSEA >= 0.080 0.170 Standardized Root Mean Square Residual: SRMR 0.039 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Latent Variables: Estimate Std.Err z-value P(>|z|) happy.f =~ happy4 1.000 happy3 -4.933 5.032 -0.980 0.327 happy2 -7.445 7.547 -0.986 0.324 happy1 -8.133 8.251 -0.986 0.324 Regressions: Estimate Std.Err z-value P(>|z|) happy.f ~ friends.degree -0.024 0.037 -0.655 0.513 frinds.btwnnss 0.019 0.029 0.654 0.513 friends.clsnss 0.011 0.027 0.401 0.689 Variances: Estimate Std.Err z-value P(>|z|) .happy4 2.708 0.299 9.070 0.000 .happy3 1.219 0.147 8.306 0.000 .happy2 0.633 0.150 4.207 0.000 .happy1 0.450 0.167 2.701 0.007 .happy.f 0.019 0.039 0.494 0.621
=====================================
BigSEM ended at 15:36:50 on Oct 22, 2024