Ego-centric Network Analysis

Descriptive Statistics and Sociodemographic Distance in ELSOC 2017-2019

R
Networks
Ego networks
Homophily
Author

Roberto Cantillan

Published

August 9, 2023

Introduction

This document provides a detailed analysis of ego-centric networks using data from ELSOC 2017 (w2) and 2019 (w4). We build a database that integrates information about egos (respondents) and their alters (named contacts), and then calculate measures of sociodemographic distance between them.

Figure 1: Egonetwork

Environment Setup

We begin by loading the required libraries. The code below attaches each package with library(), so every package must already be installed:

#| label: setup
#| message: false
#| warning: false

# Direct loading (fast)
library(tidyverse)
library(sjlabelled)
library(sjPlot)
library(texreg)
library(egor)
library(haven)
library(car)
library(stargazer)
library(janitor)
library(gridExtra)
library(ggeffects)
library(kableExtra)
library(questionr)
library(JWileymisc)
library(httr)
library(plotly)

Data Loading

Next, we load the ELSOC data. The code below downloads and loads the file for the 2017 wave (w2). The .RData format preserves the original data structure, including labels and metadata:

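# ELSOC 2017 (wave 2): download the .RData file and load it
url <- "https://github.com/rcantillan/ricantillan.rbind.io/raw/main/dat/ELSOC/ELSOC_W02_v3.00_R.RData"
response <- GET(url)
stop_for_status(response)                # stop early if the download failed
local_path <- "ELSOC_W02_v3.00_R.RData"
writeBin(response$content, local_path)   # save the raw bytes to disk
load(local_path)                         # restores the saved objects, including elsoc_2017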
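The behavior of load() can be checked on a small example. The sketch below is not part of the ELSOC workflow: it uses a hypothetical toy data frame and a temporary file to show that load() returns the names of the restored objects and that attributes such as variable labels survive the round trip.

```r
# Toy data frame with a variable label stored as an attribute (hypothetical values)
toy <- data.frame(idencuesta = 1:3, m0_edad = c(25, 40, 61))
attr(toy$m0_edad, "label") <- "Respondent age"

path <- tempfile(fileext = ".RData")
save(toy, file = path)

# Load into a separate environment so nothing in the workspace is overwritten
env <- new.env()
loaded <- load(path, envir = env)

loaded                            # names of the restored objects
attr(env$toy$m0_edad, "label")    # the label is still attached
```

Loading into a separate environment is a design choice of this sketch; load() writes into the calling environment by default and silently replaces objects with the same name.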

Data Preparation

This step is essential for network analysis. We use rename() to give the respondent identifier a standard name, .egoID, so that the ego and alter datasets derived from it share the same key. The ‘.’ prefix in .egoID follows the conventions of the egor package for identifier variables:

a <- elsoc_2017 %>% rename(.egoID = idencuesta)

Building the Alters Database

We now turn to the most complex part: constructing the alters database. For each alter (contact mentioned by the ego), we need to extract multiple characteristics.

First Alter

The following code extracts information for the first alter mentioned by each ego. The selected variables capture:

  • alter_sexo: Gender of the contact (1 = male, 2 = female)
  • alter_edad: Age in years
  • alter_rel: Type of relationship with the ego (1 = spouse, 2 = child, 3 = relative, 4 = friend, 5 = other)
  • alter_tiempo: Length of the relationship
  • alter_barrio: Whether the alter lives in the same neighborhood as the ego
  • alter_educ: Educational attainment
  • alter_relig: Religious affiliation
  • alter_ideol: Political orientation
alter_1 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_01,     # Rename each variable
                      alter_edad=r13_edad_01,      # to keep names consistent
                      alter_rel=r13_relacion_01,   # The _01 suffix marks
                      alter_tiempo=r13_tiempo_01,  # the variables that belong to
                      alter_barrio=r13_barrio_01,  # the first alter mentioned
                      alter_educ=r13_educ_01,      
                      alter_relig=r13_relig_01,    
                      alter_ideol=r13_ideol_01)

Remaining Alters

We repeat the same process for alters 2 through 5. The only difference is the suffix in the original variable names (_02, _03, etc.). The code is repetitive but necessary to keep the data structure clear:

# Second alter - suffix _02 in the original variables
alter_2 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_02, 
                      alter_edad=r13_edad_02, 
                      alter_rel=r13_relacion_02,
                      alter_tiempo=r13_tiempo_02,
                      alter_barrio=r13_barrio_02, 
                      alter_educ=r13_educ_02, 
                      alter_relig=r13_relig_02, 
                      alter_ideol=r13_ideol_02)

# Third alter - suffix _03
alter_3 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_03, 
                      alter_edad=r13_edad_03, 
                      alter_rel=r13_relacion_03,
                      alter_tiempo=r13_tiempo_03,
                      alter_barrio=r13_barrio_03, 
                      alter_educ=r13_educ_03, 
                      alter_relig=r13_relig_03, 
                      alter_ideol=r13_ideol_03)

# Fourth alter - suffix _04
alter_4 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_04, 
                      alter_edad=r13_edad_04, 
                      alter_rel=r13_relacion_04,
                      alter_tiempo=r13_tiempo_04, 
                      alter_barrio=r13_barrio_04, 
                      alter_educ=r13_educ_04, 
                      alter_relig=r13_relig_04, 
                      alter_ideol=r13_ideol_04)

# Fifth alter - suffix _05
alter_5 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_05, 
                      alter_edad=r13_edad_05, 
                      alter_rel=r13_relacion_05,
                      alter_tiempo=r13_tiempo_05, 
                      alter_barrio=r13_barrio_05, 
                      alter_educ=r13_educ_05, 
                      alter_relig=r13_relig_05, 
                      alter_ideol=r13_ideol_05)
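The five blocks above differ only in the suffix, so the same extraction can be written once as a function. The following is a minimal sketch on hypothetical toy data, not the code used in this analysis. The helper extract_alter() is an illustrative name, and it builds the new names mechanically (r13_sexo_01 becomes alter_sexo), so r13_relacion would become alter_relacion rather than alter_rel.

```r
library(dplyr)

# Toy stand-in for `a`: two egos, two alter slots, two attributes (hypothetical values)
toy <- tibble(
  .egoID = c(101, 102),
  r13_sexo_01 = c(1, 2),  r13_edad_01 = c(34, 51),
  r13_sexo_02 = c(2, NA), r13_edad_02 = c(29, NA)
)

# Hypothetical helper: select the columns of alter slot k, strip the suffix,
# swap the r13_ prefix for alter_, and record the mention order in n
extract_alter <- function(data, k) {
  suffix <- sprintf("_%02d", k)
  data %>%
    select(.egoID, ends_with(suffix)) %>%
    rename_with(
      ~ sub("^r13_", "alter_", sub(paste0(suffix, "$"), "", .x)),
      .cols = -.egoID
    ) %>%
    mutate(n = k)
}

alters_long <- bind_rows(lapply(1:2, extract_alter, data = toy)) %>%
  arrange(.egoID, n)
alters_long
```

With the real data the call would loop over 1:5. The explicit blocks in the text remain easier to audit variable by variable, which is the trade-off the author chose.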

Identifying Alters

To preserve the order in which alters were mentioned, we create a numeric variable that records each alter's position. Later analyses that consider mention order depend on it:

# Assign sequential identifier numbers
alter_1$n <- 1  # First alter mentioned
alter_2$n <- 2  # Second alter
alter_3$n <- 3  # Third alter
alter_4$n <- 4  # Fourth alter
alter_5$n <- 5  # Fifth alter

Building the Long-format Dataset

We then stack all alter information into a single long-format dataset, with one row per ego-alter slot. This step matters because the long format:

  1. Enables more efficient analysis
  2. Makes comparisons across alters easier
  3. Is the preferred structure for many network-analysis functions
# Combine all alter datasets
alteris <- rbind(alter_1, alter_2, alter_3, alter_4, alter_5)

# Order by ego ID to keep the hierarchical structure
alteris <- arrange(alteris, .egoID)

# Create a unique identifier for each alter
alteris <- rowid_to_column(alteris, var = ".altID")

# Convert to tibble for easier manipulation
alteris <- as_tibble(alteris)
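Because every ego contributes one row per alter slot, egos who named fewer alters still have rows whose alter attributes are missing. The sketch below uses hypothetical toy data with three slots per ego to show two quick checks on a long-format table: that each ego has the expected number of rows, and how many alters were actually named. Treating a slot as empty when alter_sexo is missing is an assumption of this sketch.

```r
library(dplyr)

# Toy long-format alters: ego 1 named two alters, ego 2 named one (hypothetical values)
alters_toy <- tibble(
  .egoID     = c(1, 1, 1, 2, 2, 2),
  n          = c(1, 2, 3, 1, 2, 3),
  alter_sexo = c(1, 2, NA, 2, NA, NA)
) %>%
  arrange(.egoID) %>%
  tibble::rowid_to_column(var = ".altID")

# Rows per ego: should equal the number of slots (3 here, 5 in the real data)
slots_per_ego <- count(alters_toy, .egoID, name = "slots")

# Network size: slots with a reported alter
net_size <- alters_toy %>%
  group_by(.egoID) %>%
  summarise(size = sum(!is.na(alter_sexo)))

slots_per_ego
net_size
```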

Variable Recoding

Recoding Alter Attributes

Next, we transform the alters’ categorical variables to make the analysis easier. We use factor() to convert variables into factors and Recode() from the car package to map the original codes to new values. The recoding serves to:

  1. Simplify categories
  2. Handle missing values
  3. Create interpretable groupings
# Recode educational level
# 1 = primary, 2 = secondary, 3 = vocational, 4 = university
alteris$alter_educ <- factor(Recode(alteris$alter_educ,
                                   "1=1;2:3=2;4=3;5=4;-999=NA"))

# Recode religion
# 1-5 represent different religious affiliations
alteris$alter_relig <- factor(Recode(alteris$alter_relig,
                                    "1=1;2=2;3=3;4=4;5=5;-999=NA"))

# Recode political ideology
# 1-6 represent the political spectrum from left to right
alteris$alter_ideol <- factor(Recode(alteris$alter_ideol,
                                    "1=1;2=2;3=3;4=4;5=5;6=6;-999=NA"))

# Recode age into age groups
# 1 = 0-18, 2 = 19-29, 3 = 30-40, 4 = 41-51, 5 = 52-62, 6 = 63+
alteris$alter_edad <- factor(Recode(alteris$alter_edad,
                                   "0:18=1;19:29=2;30:40=3;41:51=4;52:62=5;63:100=6"))

# Recode sex (1 = male, 2 = female)
alteris$alter_sexo <- factor(Recode(alteris$alter_sexo,
                                   "1=1;2=2"))

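The recode strings used above follow the car syntax: pairs of the form old=new separated by semicolons, with lo:hi denoting a range. A minimal sketch on a hypothetical vector of raw education codes shows the effect of the education rule before and after the conversion to a factor.

```r
library(car)

educ_raw <- c(1, 2, 3, 4, 5, -999)   # hypothetical raw codes

# Same rule as for alter_educ: 2 and 3 are merged, -999 becomes missing
educ_rec <- Recode(educ_raw, "1=1;2:3=2;4=3;5=4;-999=NA")
educ_fac <- factor(educ_rec)

educ_rec
levels(educ_fac)
```

Values that match none of the rules are left unchanged by Recode(), so a code outside the listed ranges would pass through and appear as its own factor level.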
Preparing Ego Data

We now create a dataset with information about the egos (respondents). We select variables that mirror the alter attributes so we can make direct comparisons:

# Select and rename the ego variables
egos <- a %>%
       dplyr::select(.egoID,                # Respondent ID
                     ego_sexo = m0_sexo,    # Respondent sex
                     ego_edad = m0_edad,    # Respondent age
                     ego_ideol = c15,       # Political ideology
                     ego_educ = m01,        # Educational attainment
                     ego_relig = m38)       # Religion

# Convert to tibble for easier manipulation
egos <- as_tibble(egos)

Recoding Ego Variables

As with the alters, we recode ego variables to ensure comparability:

# Recode educational level
# Group into four levels: primary, secondary, vocational, university
egos$ego_educ <- factor(Recode(egos$ego_educ,
                              "1:3=1;4:5=2;6:7=3;8:10=4;-999:-888=NA"))

# Recode religion
# Group into five main categories
egos$ego_relig <- factor(Recode(egos$ego_relig,
                               "1=1;2=2;9=3;7:8=4;3:6=5;-999:-888=NA"))

# Recode political ideology
# Six groups in reverse order of the raw scale: 9-10 = 1, 6-8 = 2, 5 = 3, 2-4 = 4, 0-1 = 5; raw codes 11-12 form group 6
egos$ego_ideol <- factor(Recode(egos$ego_ideol,
                               "9:10=1;6:8=2;5=3;2:4=4;0:1=5;11:12=6;-999:-888=NA"))

# Recode age into the same groups used for alters
egos$ego_edad <- factor(Recode(egos$ego_edad,
                              "18=1;19:29=2;30:40=3;41:51=4;52:62=5;63:100=6"))

# Recode sex (1 = male, 2 = female)
egos$ego_sexo <- factor(Recode(egos$ego_sexo,
                              "1=1;2=2"))

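For the age groups, base R's cut() is a possible alternative to a recode string. The sketch below is not the method used in this document; it applies the same boundaries to a hypothetical vector of ages. Unlike the recode string, it also assigns ages below 18 to group 1 and ages above 100 to group 6.

```r
age <- c(18, 19, 29, 30, 40, 41, 62, 63, 100)   # hypothetical ages

# Intervals are closed on the right: (-Inf, 18], (18, 29], ..., (62, Inf)
age_group <- cut(
  age,
  breaks = c(-Inf, 18, 29, 40, 51, 62, Inf),
  labels = 1:6
)

data.frame(age, age_group)
```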
Joining Ego and Alter Data

We combine the ego and alter datasets with a left join, which keeps all alters and their corresponding egos:

# Join the datasets using the ego ID as the key
obs <- left_join(alteris, egos, by = ".egoID")

# Create a case indicator
obs$case <- 1

# Recode missing values
obs[obs == "-999"] <- NA
obs[obs == "-888"] <- NA
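A small example makes the result of the join concrete: each alter row receives a copy of its ego's attributes, so ego values repeat once per alter. The sketch below uses hypothetical toy tables and replaces the missing-value codes column by column with across(), which is one possible alternative to indexing the whole data frame.

```r
library(dplyr)

# Hypothetical toy tables: three alters nested in two egos
alters_toy <- tibble(.egoID = c(1, 1, 2), .altID = 1:3, alter_edad = c(30, -999, 45))
egos_toy   <- tibble(.egoID = c(1, 2), ego_edad = c(28, -888))

dyads <- left_join(alters_toy, egos_toy, by = ".egoID") %>%
  # Turn the missing-value codes into NA in every column except the IDs
  mutate(across(-c(.egoID, .altID), ~ replace(.x, .x %in% c(-999, -888), NA)))

dyads
```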

Descriptive Analysis

Descriptives (alter)

We examine the frequency distribution of the alter attributes.

kbl(freq(obs$alter_educ)) %>% kable_paper()
         n      %   val%
1     1362   11.0   17.8
2     3543   28.7   46.2
3     1042    8.4   13.6
4     1719   13.9   22.4
NA    4699   38.0     NA

kbl(freq(obs$alter_relig)) %>% kable_paper()
         n      %   val%
1     4907   39.7   60.8
2     1407   11.4   17.4
3     1128    9.1   14.0
4      251    2.0    3.1
5      373    3.0    4.6
NA    4299   34.8     NA

kbl(freq(obs$alter_ideol)) %>% kable_paper()
         n      %   val%
1      786    6.4   11.1
2      191    1.5    2.7
3      382    3.1    5.4
4      303    2.5    4.3
5      759    6.1   10.7
6     4644   37.6   65.7
NA    5300   42.9     NA

kbl(freq(obs$alter_edad)) %>% kable_paper()
         n      %   val%
1      349    2.8    4.3
2     1477   11.9   18.3
3     1875   15.2   23.2
4     1713   13.9   21.2
5     1466   11.9   18.2
6     1186    9.6   14.7
NA    4299   34.8     NA

kbl(freq(obs$alter_sexo)) %>% kable_paper()
         n      %   val%
1     3388   27.4     42
2     4678   37.8     58
NA    4299   34.8     NA

Descriptives (ego)

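We review the frequency distribution of the egos’ sociodemographic attributes. Because obs holds one row per ego-alter slot (five per ego), each ego contributes five rows to these counts.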

kbl(freq(obs$ego_educ)) %>% kable_paper()
         n      %   val%
1     2985   24.1   24.1
2     5225   42.3   42.3
3     2010   16.3   16.3
4     2145   17.3   17.3

kbl(freq(obs$ego_relig)) %>% kable_paper()
         n      %   val%
1     6915   55.9   56.1
2     2495   20.2   20.2
3     1055    8.5    8.6
4      485    3.9    3.9
5     1380   11.2   11.2
NA      35    0.3     NA

kbl(freq(obs$ego_ideol)) %>% kable_paper()
         n      %   val%
1      915    7.4    7.5
2     1090    8.8    8.9
3     2350   19.0   19.2
4     1380   11.2   11.3
5     1075    8.7    8.8
6     5400   43.7   44.2
NA     155    1.3     NA

kbl(freq(obs$ego_edad)) %>% kable_paper()
         n      %   val%
1       10    0.1    0.1
2     1865   15.1   15.1
3     2530   20.5   20.5
4     2720   22.0   22.0
5     2870   23.2   23.2
6     2370   19.2   19.2

kbl(freq(obs$ego_sexo)) %>% kable_paper()
         n      %   val%
1     4755   38.5   38.5
2     7610   61.5   61.5

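The tables show the frequency distribution for each sociodemographic characteristic. Each row is a category of the variable (NA marks missing values), and the columns report:

  1. n: the absolute frequency (number of cases)
  2. %: the percentage of all cases, including missing values
  3. val%: the percentage of valid cases, excluding missing values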
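The difference between the two percentage columns is easiest to see on a tiny example. The sketch below uses base R on a hypothetical vector rather than questionr::freq(), and computes the percentage over all cases and the percentage over valid cases separately.

```r
x <- c(1, 1, 1, 2, NA)   # hypothetical coded variable with one missing value

n       <- table(x, useNA = "ifany")   # counts, with NA as its own row
pct     <- 100 * prop.table(n)         # share of all five cases
val_pct <- 100 * prop.table(table(x))  # share of the four valid cases

n
pct
val_pct
```

Here category 1 accounts for 60% of all cases but 75% of valid cases, which is the same gap that separates % from val% in the tables above.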

This information helps us understand the composition of the sample at both the ego and alter levels, and it will be essential for subsequent analyses of homophily and social distance.

Educational Homophily Analysis

We examine educational homophily with a cross-tabulation of the education levels of egos and alters. We first recode the labels to improve interpretation:

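# Recode educational labels
# Level 4 is matched explicitly so that missing values stay NA. With a
# catch-all TRUE branch, alters with missing education would be labelled
# "university"; the outputs shown in this document were produced that way.
obs <- obs %>%
  dplyr::mutate(
    ego_educ = case_when(
      ego_educ == 1 ~ "primary",
      ego_educ == 2 ~ "secondary",
      ego_educ == 3 ~ "vocational",
      ego_educ == 4 ~ "university"
    ),
    alter_educ = case_when(
      alter_educ == 1 ~ "primary",
      alter_educ == 2 ~ "secondary",
      alter_educ == 3 ~ "vocational",
      alter_educ == 4 ~ "university"
    )
  )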

# Create a cross-tabulation with row and column percentages
table_cont <- sjPlot::tab_xtab(
  var.row = obs$ego_educ,
  var.col = obs$alter_educ,
  title = "Educational Homophily in Personal Networks",
  show.row.prc = TRUE,
  show.summary = TRUE,
  show.col.prc = TRUE,
  use.viewer = FALSE
)
table_cont
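Educational Homophily in Personal Networks
Rows: ego_educ. Columns: alter_educ. Each cell shows n (row % / column %).

ego_educ      primary               secondary             university            vocational            Total
primary       570 (19.1 / 41.9)     825 (27.6 / 23.3)     1501 (50.3 / 23.4)    89 (3 / 8.5)          2985 (100 / 24.1)
secondary     622 (11.9 / 45.7)     1881 (36 / 53.1)      2386 (45.7 / 37.2)    336 (6.4 / 32.2)      5225 (100 / 42.3)
university    54 (2.5 / 4)          356 (16.6 / 10)       1507 (70.3 / 23.5)    228 (10.6 / 21.9)     2145 (100 / 17.3)
vocational    116 (5.8 / 8.5)       481 (23.9 / 13.6)     1024 (50.9 / 16)      389 (19.4 / 37.3)     2010 (100 / 16.3)
Total         1362 (11 / 100)       3543 (28.7 / 100)     6418 (51.9 / 100)     1042 (8.4 / 100)      12365 (100 / 100)

χ2=1202.528 · df=9 · Cramer's V=0.180 · p=0.000

Note: the "university" column total (6418) equals the 1719 alters coded as university plus the 4699 alters with missing education in the alter_educ frequencies, because this table was computed with a catch-all recode branch that assigns missing values to "university".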
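The row and column percentages and Cramer's V can be recomputed from the cell counts. The sketch below copies the counts from the cross-tabulation above into a matrix and uses base R only. The helper cramers_v() is an illustrative name, and the formula is the standard one, V = sqrt(χ2 / (N · (min(rows, columns) − 1))).

```r
# Cell counts copied from the cross-tabulation (rows = ego, columns = alter)
lv <- c("primary", "secondary", "university", "vocational")
counts <- matrix(
  c(570,  825, 1501,  89,
    622, 1881, 2386, 336,
     54,  356, 1507, 228,
    116,  481, 1024, 389),
  nrow = 4, byrow = TRUE, dimnames = list(ego = lv, alter = lv)
)

row_pct <- 100 * prop.table(counts, margin = 1)  # each row sums to 100
col_pct <- 100 * prop.table(counts, margin = 2)  # each column sums to 100

# Hypothetical helper implementing the standard Cramer's V formula
cramers_v <- function(chi2, n, k) sqrt(chi2 / (n * (k - 1)))

# Plugging in the reported chi-square statistic
v_reported <- cramers_v(1202.528, n = sum(counts), k = min(dim(counts)))
round(v_reported, 3)

# The same statistic computed directly from the counts
chi2 <- unname(chisq.test(counts)$statistic)
v_from_counts <- cramers_v(chi2, n = sum(counts), k = min(dim(counts)))
```

The diagonal of row_pct gives the share of each ego group's alters who have the same education level, which is the most direct reading of homophily in this table.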

Visualizing Educational Homophily

We create a heatmap to show educational homophily patterns:

# Prepare data for the heatmap
table <- as.data.frame(prop.table(table(obs$ego_educ, obs$alter_educ)))
colnames(table) <- c("Ego_educ", "Alter_educ", "Prop")

# Format the proportions to display percentages with two decimals
table$tooltip_text <- sprintf(
  "Ego: %s<br>Alter: %s<br>Share: %.2f%%",
  table$Ego_educ,
  table$Alter_educ,
  table$Prop * 100
)

# Create the heatmap
p <- ggplot(table, aes(Ego_educ, Alter_educ)) +
  geom_tile(aes(fill = Prop, text = tooltip_text)) +  # Add tooltip text
  scale_fill_gradient(low = "white", high = "black") +
  theme_minimal() +
  labs(
    title = "Educational Homophily Heatmap",
    x = "Ego Educational Level",
    y = "Alter Educational Level",
    fill = "Proportion"
  )

# Convert to plotly and specify which variables to display in the tooltip
ggplotly(p, tooltip = "text")

Relationship Analysis by Type

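We calculate the mean of alter_tiempo (relationship length) for each tie type, and count the alters coded as relatives (alter_rel == 3) who live in the ego's neighborhood (alter_barrio == 1):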

obs %>%
  summarise(
    mean.clo.esp = mean(alter_tiempo[alter_rel == "1"], na.rm = TRUE),
    mean.clo.hijo = mean(alter_tiempo[alter_rel == "2"], na.rm = TRUE),
    mean.clo.pari = mean(alter_tiempo[alter_rel == "3"], na.rm = TRUE),
    mean.clo.amig = mean(alter_tiempo[alter_rel == "4"], na.rm = TRUE),
    mean.clo.otro = mean(alter_tiempo[alter_rel == "5"], na.rm = TRUE),
    count.par.barr = sum((alter_rel == "3" & alter_barrio == "1"), na.rm = TRUE)
  ) %>%
  kbl() %>%
  kable_paper()
mean.clo.esp mean.clo.hijo mean.clo.pari mean.clo.amig mean.clo.otro count.par.barr
4.498406 4.91511 4.877932 4.075097 3.67823 1215
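The same summary can be written with group_by(), which returns one row per tie type instead of one column per tie type. This is a sketch on hypothetical toy data, not the code that produced the table above; the column names mean_tiempo and n_same_barrio are illustrative.

```r
library(dplyr)

# Toy dyads (hypothetical values): tie type, relationship-length score, neighborhood
dyads_toy <- tibble(
  alter_rel    = c(1, 1, 3, 3, 4, NA),
  alter_tiempo = c(5, 4, 5, NA, 3, 2),
  alter_barrio = c(1, 2, 1, 1, 2, 1)
)

by_type <- dyads_toy %>%
  filter(!is.na(alter_rel)) %>%          # drop empty alter slots
  group_by(alter_rel) %>%
  summarise(
    mean_tiempo   = mean(alter_tiempo, na.rm = TRUE),
    n_same_barrio = sum(alter_barrio == 1, na.rm = TRUE)
  )

by_type
```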

Calculating Sociodemographic Distances

Creating Distance Vectors

We compute sociodemographic distance measures between each ego and alter. For each attribute, a value of 1 indicates that ego and alter differ, 0 indicates that they match, and NA results when either value is missing:

# Calculate distances for each dimension
obs$sexo_dist <- ifelse(obs$alter_sexo == obs$ego_sexo, 0, 1)
obs$edad_dist <- ifelse(obs$alter_edad == obs$ego_edad, 0, 1)
obs$educ_dist <- ifelse(obs$alter_educ == obs$ego_educ, 0, 1)
obs$ideol_dist <- ifelse(obs$alter_ideol == obs$ego_ideol, 0, 1)
obs$relig_dist <- ifelse(obs$alter_relig == obs$ego_relig, 0, 1)
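The behavior of these indicators, including how missing values propagate, can be checked on a few hand-made cases. The helper mismatch() below is a hypothetical name used only in this sketch. Comparing through as.character() is an assumption that sidesteps the requirement that two factors share the same set of levels.

```r
# Hypothetical helper: 1 if ego and alter differ, 0 if they match, NA if either is missing
mismatch <- function(ego, alter) {
  as.integer(as.character(ego) != as.character(alter))
}

ego_toy   <- factor(c("1", "2", NA,  "2"))
alter_toy <- factor(c("1", "1", "2", NA))

mismatch(ego_toy, alter_toy)
```

Because dyads with a missing value get NA rather than 0 or 1, they drop out of any mean computed with na.rm = TRUE.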

Distance Summary

Finally, we present a summary of the sociodemographic distances:

# Create a summary table of distances
kbl(egltable(
  c("sexo_dist", "edad_dist", "educ_dist", "ideol_dist", "relig_dist"),
  data = obs,
  strict = TRUE
)) %>%
  kable_paper() 
              M (SD)
sexo_dist     0.36 (0.48)
edad_dist     0.65 (0.48)
educ_dist     0.65 (0.48)
ideol_dist    0.55 (0.50)
relig_dist    0.38 (0.49)

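This final table reports the mean of each 0/1 indicator, which is the proportion of ego–alter dyads that differ on that dimension among the dyads where the indicator is defined. For example, 0.36 for sexo_dist means that 36% of dyads are mixed-sex. Lower values indicate more homophily, which allows us to compare homophily patterns across characteristics.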
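For a 0/1 indicator the standard deviation carries no information beyond the mean, since it equals sqrt(p(1 − p)) when p is the proportion of ones. The sketch below plugs the reported means into that formula. It uses the population formula on rounded means, so it is an approximation of what the summary function computes.

```r
# Means reported in the distance summary
p <- c(sexo = 0.36, edad = 0.65, educ = 0.65, ideol = 0.55, relig = 0.38)

# For a 0/1 variable the standard deviation is sqrt(p * (1 - p))
sd_binary <- sqrt(p * (1 - p))
round(sd_binary, 2)
```

The rounded values agree with the SD column of the table, so the means alone are enough to compare dimensions.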
