Introduction

This document provides a detailed analysis of ego-centric networks using data from ELSOC 2017 (w2) and 2019 (w4). We build a database that integrates information about egos (respondents) and their alters (named contacts), and then calculate measures of sociodemographic distance between them.

Environment Setup

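We begin by loading the required libraries. The following code attaches each package with library(), so all of them must already be installed. Several are used directly below: httr provides GET() for the download, car provides Recode(), questionr provides freq(), sjPlot provides tab_xtab(), JWileymisc provides egltable(), kableExtra provides kbl() and kable_paper(), and plotly provides ggplotly():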

#| label: setup
#| message: false
#| warning: false

# Direct load (fast)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(sjlabelled)


Attaching package: 'sjlabelled'

The following object is masked from 'package:forcats':

    as_factor

The following object is masked from 'package:dplyr':

    as_label

The following object is masked from 'package:ggplot2':

    as_label

library(sjPlot)
library(texreg)

Version:  1.39.4
Date:     2024-07-23
Author:   Philip Leifeld (University of Manchester)

Consider submitting praise using the praise or praise_interactive functions.
Please cite the JSS article in your publications -- see citation("texreg").

Attaching package: 'texreg'

The following object is masked from 'package:tidyr':

    extract

library(egor)
library(haven)


Attaching package: 'haven'

The following objects are masked from 'package:sjlabelled':

    as_factor, read_sas, read_spss, read_stata, write_sas, zap_labels

library(car)

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

library(stargazer)


Please cite as: 

 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

library(gridExtra)


Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

library(ggeffects)
library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

library(questionr)
library(JWileymisc)
library(httr)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:httr':

    config

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Data Loading

Next, we load the ELSOC data. The chunk below downloads the 2017 wave (w2) from GitHub, saves it locally, and loads it into the session. The .RData format preserves the original data structure, including labels and metadata:

# ELSOC 2017 (w2): download the .RData file and load it
url <- "https://github.com/rcantillan/ricantillan.rbind.io/raw/main/dat/ELSOC/ELSOC_W02_v3.00_R.RData"
response <- GET(url)
stop_for_status(response)                # stop early if the download failed
local_path <- "ELSOC_W02_v3.00_R.RData"
writeBin(response$content, local_path)   # write the raw bytes to disk
load(local_path)                         # restores the saved object(s), here elsoc_2017
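To illustrate the claim that .RData files keep labels and metadata, here is a minimal sketch using a toy data frame rather than the ELSOC file. The object name toy and the label text are invented for the example; only save(), load(), and attr() from base R are used.

```r
# Toy data frame with a variable label stored as an attribute
toy <- data.frame(idencuesta = c(101, 102), m0_edad = c(34, 58))
attr(toy$m0_edad, "label") <- "Respondent age"

# Save to a temporary .RData file and load it into a fresh environment
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
env <- new.env()
loaded_names <- load(tmp, envir = env)

loaded_names                      # names of the objects that were restored
attr(env$toy$m0_edad, "label")    # the label attribute survives the round trip
```

Because load() returns the names of the restored objects, capturing its result is a simple way to check which objects a downloaded file actually contains.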

Data Preparation

This step is essential for network analysis. We use rename() to give the respondent identifier idencuesta the standard name .egoID. The ‘.’ prefix follows the conventions of the egor package for identifier variables:

a <- elsoc_2017 %>% rename(.egoID = idencuesta)

Building the Alters Database

We now turn to the most complex part: constructing the alters database. For each alter (contact mentioned by the ego), we need to extract multiple characteristics.

First Alter

The following code extracts information for the first alter mentioned by each ego. The selected variables capture:

alter_sexo: Gender of the contact (1 = male, 2 = female)
alter_edad: Age in years
alter_rel: Type of relationship with the ego (1 = spouse or partner, 2 = child, 3 = other relative, 4 = friend, 5 = other)
alter_tiempo: Length of the relationship
alter_barrio: Whether the alter lives in the same neighborhood as the ego
alter_educ: Educational attainment
alter_relig: Religious affiliation
alter_ideol: Political orientation

alter_1 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_01,     # Rename the variables
                      alter_edad=r13_edad_01,      # to keep them consistent
                      alter_rel=r13_relacion_01,   # The _01 suffixes indicate
                      alter_tiempo=r13_tiempo_01,  # that they belong to the
                      alter_barrio=r13_barrio_01,  # first alter mentioned
                      alter_educ=r13_educ_01,      
                      alter_relig=r13_relig_01,    
                      alter_ideol=r13_ideol_01)

Remaining Alters

We repeat the same process for alters 2 through 5. The only difference is the suffix in the original variable names (_02, _03, etc.). The code is repetitive but necessary to keep the data structure clear:

# Second alter - suffix _02 in the original variables
alter_2 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_02, 
                      alter_edad=r13_edad_02, 
                      alter_rel=r13_relacion_02,
                      alter_tiempo=r13_tiempo_02,
                      alter_barrio=r13_barrio_02, 
                      alter_educ=r13_educ_02, 
                      alter_relig=r13_relig_02, 
                      alter_ideol=r13_ideol_02)

# Third alter - suffix _03
alter_3 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_03, 
                      alter_edad=r13_edad_03, 
                      alter_rel=r13_relacion_03,
                      alter_tiempo=r13_tiempo_03,
                      alter_barrio=r13_barrio_03, 
                      alter_educ=r13_educ_03, 
                      alter_relig=r13_relig_03, 
                      alter_ideol=r13_ideol_03)

# Fourth alter - suffix _04
alter_4 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_04, 
                      alter_edad=r13_edad_04, 
                      alter_rel=r13_relacion_04,
                      alter_tiempo=r13_tiempo_04, 
                      alter_barrio=r13_barrio_04, 
                      alter_educ=r13_educ_04, 
                      alter_relig=r13_relig_04, 
                      alter_ideol=r13_ideol_04)

# Fifth alter - suffix _05
alter_5 <- a %>%
        dplyr::select(.egoID, 
                      alter_sexo=r13_sexo_05, 
                      alter_edad=r13_edad_05, 
                      alter_rel=r13_relacion_05,
                      alter_tiempo=r13_tiempo_05, 
                      alter_barrio=r13_barrio_05, 
                      alter_educ=r13_educ_05, 
                      alter_relig=r13_relig_05, 
                      alter_ideol=r13_ideol_05)
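Since the five blocks differ only in the numeric suffix, the same selection could also be written once as a function. The following is a sketch on toy data, not the code used in this document: extract_alter() and alter_attrs are hypothetical names, and the toy columns only imitate the r13_*_0k naming pattern.

```r
library(dplyr)

# Hypothetical lookup: new name stem -> stem used in the r13_* columns
alter_attrs <- c(sexo = "sexo", edad = "edad", rel = "relacion",
                 tiempo = "tiempo", barrio = "barrio", educ = "educ",
                 relig = "relig", ideol = "ideol")

# Hypothetical helper: select and rename the columns of the k-th alter
extract_alter <- function(data, k, attrs = alter_attrs) {
  old_names <- paste0("r13_", attrs, sprintf("_%02d", k))
  cols <- stats::setNames(old_names, paste0("alter_", names(attrs)))
  # A named vector inside all_of() selects and renames in one step
  dplyr::select(data, .egoID, dplyr::all_of(cols))
}

# Toy data with two egos and two alter slots (values are arbitrary)
toy <- data.frame(.egoID = c(1, 2))
for (k in 1:2) {
  for (v in alter_attrs) {
    toy[[sprintf("r13_%s_%02d", v, k)]] <- c(k, k + 10)
  }
}

alters <- lapply(1:2, function(k) extract_alter(toy, k))
names(alters[[2]])
```

The explicit blocks above are easier to audit line by line, while the function avoids copy-and-paste errors when the list of attributes changes.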

Identifying Alters

To preserve the order in which alters are stored, we create a numeric variable that captures the position in which each alter was mentioned. This is crucial for later analyses that consider mention order:

# Assign sequential identifier numbers
alter_1$n <- 1  # First alter mentioned
alter_2$n <- 2  # Second alter
alter_3$n <- 3  # Third alter
alter_4$n <- 4  # Fourth alter
alter_5$n <- 5  # Fifth alter

Building the Long-format Dataset

We then merge all alter information into a single long-format dataset. This step is crucial because it:

Enables more efficient analysis
Makes comparisons across alters easier
Is the preferred structure for many network-analysis functions

# Combine all alter datasets
alteris <- rbind(alter_1, alter_2, alter_3, alter_4, alter_5)

# Order by ego ID to keep the hierarchical structure
alteris <- arrange(alteris, .egoID)

# Create a unique identifier for each alter
alteris <- rowid_to_column(alteris, var = ".altID")

# Convert to tibble for easier manipulation
alteris <- as_tibble(alteris)
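The stacking step can be traced on a small example. This sketch uses two invented egos and two alter slots; it sorts by both .egoID and n, which is an assumption added here so that the mention order within each ego is explicit rather than left to the sort.

```r
library(dplyr)
library(tibble)

# Toy versions of alter_1 and alter_2 (values are invented)
alter_1 <- tibble(.egoID = c(10, 20), alter_sexo = c(1, 2), n = 1)
alter_2 <- tibble(.egoID = c(10, 20), alter_sexo = c(2, NA), n = 2)

long <- bind_rows(alter_1, alter_2) %>%
  arrange(.egoID, n) %>%              # egos together, alters in mention order
  rowid_to_column(var = ".altID")     # one running ID per alter row

long
```

Each ego contributes one row per alter slot, whether or not the slot was filled, so egos who named fewer alters carry rows of missing values. That is consistent with the 12,365 rows in the frequency tables later in this document, which is five rows for each of 2,473 egos.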

Variable Recoding

Recoding Alter Attributes

Next, we transform the alters’ categorical variables to make the analysis easier. We use factor to convert variables into factors and Recode from the car package to relabel the values. This recoding is crucial to:

Simplify categories
Handle missing values
Create interpretable groupings

# Recode educational level
# 1 = primary, 2 = secondary, 3 = vocational, 4 = university
alteris$alter_educ <- factor(Recode(alteris$alter_educ,
                                   "1=1;2:3=2;4=3;5=4;-999=NA"))

# Recode religion
# 1-5 represent different religious affiliations
alteris$alter_relig <- factor(Recode(alteris$alter_relig,
                                    "1=1;2=2;3=3;4=4;5=5;-999=NA"))

# Recode political ideology
# 1-6 represent the political spectrum from left to right
alteris$alter_ideol <- factor(Recode(alteris$alter_ideol,
                                    "1=1;2=2;3=3;4=4;5=5;6=6;-999=NA"))

# Recode age into age groups
# 1 = 0-18, 2 = 19-29, 3 = 30-40, 4 = 41-51, 5 = 52-62, 6 = 63+
alteris$alter_edad <- factor(Recode(alteris$alter_edad,
                                   "0:18=1;19:29=2;30:40=3;41:51=4;52:62=5;63:100=6"))

# Recode sex (1 = male, 2 = female)
alteris$alter_sexo <- factor(Recode(alteris$alter_sexo,
                                   "1=1;2=2"))
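To make the recode strings easier to read, the sketch below applies two of them to short invented vectors. It shows the three pieces of syntax used above: single values (1=1), ranges (2:3=2), and assignment to NA. Values that match no rule are left unchanged, which is why the age example still contains -999.

```r
library(car)

# Education: codes 2 and 3 collapse into 2, and -999 becomes NA
educ_raw <- c(1, 2, 3, 4, 5, -999)
educ_rec <- Recode(educ_raw, "1=1;2:3=2;4=3;5=4;-999=NA")
educ_rec

# Age groups: this string has no rule for -999, so it passes through as is
age_raw <- c(17, 18, 19, 40, 41, 63, -999)
age_rec <- Recode(age_raw, "0:18=1;19:29=2;30:40=3;41:51=4;52:62=5;63:100=6")
age_rec
```

Wrapping the result in factor(), as the document does, then turns every remaining distinct value into a level, including any code that was not recoded.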

Preparing Ego Data

We now create a dataset with information about the egos (respondents). We select variables that mirror the alter attributes so we can make direct comparisons:

# Select and rename the ego variables
egos <- a %>%
       dplyr::select(.egoID,                # Respondent ID
                     ego_sexo = m0_sexo,    # Respondent sex
                     ego_edad = m0_edad,    # Respondent age
                     ego_ideol = c15,       # Political ideology
                     ego_educ = m01,        # Educational attainment
                     ego_relig = m38)       # Religion

# Convert to tibble for easier manipulation
egos <- as_tibble(egos)

Recoding Ego Variables

As with the alters, we recode ego variables to ensure comparability:

# Recode educational level
# Group into four levels: primary, secondary, vocational, university
egos$ego_educ <- factor(Recode(egos$ego_educ,
                              "1:3=1;4:5=2;6:7=3;8:10=4;-999:-888=NA"))

# Recode religion
# Group into five main categories
egos$ego_relig <- factor(Recode(egos$ego_relig,
                               "1=1;2=2;9=3;7:8=4;3:6=5;-999:-888=NA"))

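# Recode political ideology
# Collapse codes 0-10 into five groups in reversed order (9:10 -> 1, ..., 0:1 -> 5);
# codes 11:12 form a separate category 6, so the result is not a 1-6 left-right scale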
egos$ego_ideol <- factor(Recode(egos$ego_ideol,
                               "9:10=1;6:8=2;5=3;2:4=4;0:1=5;11:12=6;-999:-888=NA"))

# Recode age into the same groups used for alters
egos$ego_edad <- factor(Recode(egos$ego_edad,
                              "18=1;19:29=2;30:40=3;41:51=4;52:62=5;63:100=6"))

# Recode sex (1 = male, 2 = female)
egos$ego_sexo <- factor(Recode(egos$ego_sexo,
                              "1=1;2=2"))

Joining Ego and Alter Data

We combine the ego and alter datasets with a left join, which keeps all alters and their corresponding egos:

# Join the datasets using the ego ID as the key
obs <- left_join(alteris, egos, by = ".egoID")

# Create a case indicator
obs$case <- 1

# Recode missing values
obs[obs == "-999"] <- NA
obs[obs == "-888"] <- NA
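The join can be checked on a toy example. In this sketch the data frames and their values are invented; the point is that a left join from alters to egos returns exactly one row per alter and repeats the ego attributes on each of that ego's rows.

```r
library(dplyr)

# Toy alters (three rows) and egos (two rows); -999 marks a missing answer
alters_toy <- data.frame(.altID = 1:3,
                         .egoID = c(1, 1, 2),
                         alter_sexo = c(1, 2, -999))
egos_toy <- data.frame(.egoID = c(1, 2), ego_sexo = c(2, 1))

obs_toy <- left_join(alters_toy, egos_toy, by = ".egoID")

# Replace the missing-value code across all columns at once
obs_toy[obs_toy == -999] <- NA

obs_toy
```

Comparing nrow() before and after the join is a quick guard against duplicated ego IDs, which would silently multiply alter rows.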

Descriptive Analysis

Descriptives (alter)

We examine the frequency distribution of the alter attributes.

kbl(freq(obs$alter_educ)) %>%kable_paper()

	n	%	val%
1	1362	11.0	17.8
2	3543	28.7	46.2
3	1042	8.4	13.6
4	1719	13.9	22.4
NA	4699	38.0	NA

kbl(freq(obs$alter_relig))%>%kable_paper()

	n	%	val%
1	4907	39.7	60.8
2	1407	11.4	17.4
3	1128	9.1	14.0
4	251	2.0	3.1
5	373	3.0	4.6
NA	4299	34.8	NA

kbl(freq(obs$alter_ideol))%>%kable_paper()

	n	%	val%
1	786	6.4	11.1
2	191	1.5	2.7
3	382	3.1	5.4
4	303	2.5	4.3
5	759	6.1	10.7
6	4644	37.6	65.7
NA	5300	42.9	NA

kbl(freq(obs$alter_edad)) %>%kable_paper()

	n	%	val%
1	349	2.8	4.3
2	1477	11.9	18.3
3	1875	15.2	23.2
4	1713	13.9	21.2
5	1466	11.9	18.2
6	1186	9.6	14.7
NA	4299	34.8	NA

kbl(freq(obs$alter_sexo)) %>%kable_paper()

	n	%	val%
1	3388	27.4	42
2	4678	37.8	58
NA	4299	34.8	NA

Descriptives (ego)

We review the frequency distribution of the egos’ sociodemographic attributes.

kbl(freq(obs$ego_educ)) %>%kable_paper()

	n	%	val%
1	2985	24.1	24.1
2	5225	42.3	42.3
3	2010	16.3	16.3
4	2145	17.3	17.3

kbl(freq(obs$ego_relig))%>%kable_paper()

	n	%	val%
1	6915	55.9	56.1
2	2495	20.2	20.2
3	1055	8.5	8.6
4	485	3.9	3.9
5	1380	11.2	11.2
NA	35	0.3	NA

kbl(freq(obs$ego_ideol))%>%kable_paper()

	n	%	val%
1	915	7.4	7.5
2	1090	8.8	8.9
3	2350	19.0	19.2
4	1380	11.2	11.3
5	1075	8.7	8.8
6	5400	43.7	44.2
NA	155	1.3	NA

kbl(freq(obs$ego_edad)) %>%kable_paper()

	n	%	val%
1	10	0.1	0.1
2	1865	15.1	15.1
3	2530	20.5	20.5
4	2720	22.0	22.0
5	2870	23.2	23.2
6	2370	19.2	19.2

kbl(freq(obs$ego_sexo)) %>%kable_paper()

	n	%	val%
1	4755	38.5	38.5
2	7610	61.5	61.5

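The tables show the frequency distribution for each sociodemographic characteristic. The columns are:

Row label: the variable category (NA for missing values)
n: the absolute frequency (number of cases)
%: the percentage of all cases, including missing values
val%: the percentage of valid cases, excluding missing values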
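As a worked check of how the two percentage columns differ, the sketch below recomputes them by hand from the counts printed in the alter_sexo table above. The labels male, female, and missing are added here for readability.

```r
# Counts copied from the alter_sexo frequency table
counts <- c(male = 3388, female = 4678, missing = 4299)

# %: share of all rows, missing values included
pct <- round(100 * counts / sum(counts), 1)

# val%: share of rows with a valid answer only
valid <- counts[c("male", "female")]
val_pct <- round(100 * valid / sum(valid), 1)

pct
val_pct
```

The gap between the two columns grows with the share of missing values, which is why it is large for alter attributes and negligible for ego attributes.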

This information helps us understand the composition of the sample at both the ego and alter levels, and it will be essential for subsequent analyses of homophily and social distance.

Educational Homophily Analysis

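We examine educational homophily with a cross-tabulation of the education levels of egos and alters. We first replace the numeric codes with labels to make the table easier to read. Note that the catch-all TRUE branch also matches missing values, so alters with missing education (4,699 rows in the frequency table above) are labeled "university" from this point on:

# Recode educational labels
# Caution: TRUE ~ "university" captures every remaining case, including NA
obs <- obs %>%
  dplyr::mutate(
    ego_educ = case_when(
      ego_educ == 1 ~ "primary",
      ego_educ == 2 ~ "secondary",
      ego_educ == 3 ~ "vocational",
      TRUE ~ "university"
    )
  ) %>%
  dplyr::mutate(
    alter_educ = case_when(
      alter_educ == 1 ~ "primary",
      alter_educ == 2 ~ "secondary",
      alter_educ == 3 ~ "vocational",
      TRUE ~ "university"
    )
  )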

# Create the cross-tabulation with row and column percentages
table_cont <- sjPlot::tab_xtab(
  var.row = obs$ego_educ,
  var.col = obs$alter_educ,
  title = "Educational Homophily in Personal Networks",
  show.row.prc = TRUE,
  show.summary = TRUE,
  show.col.prc = TRUE,
  use.viewer = FALSE
)
table_cont

Educational Homophily in Personal Networks
Rows: ego_educ. Columns: alter_educ. Each cell shows the count, the row percentage, and the column percentage.
ego_educ	primary	secondary	university	vocational	Total
primary	570 19.1 % 41.9 %	825 27.6 % 23.3 %	1501 50.3 % 23.4 %	89 3 % 8.5 %	2985 100 % 24.1 %
secondary	622 11.9 % 45.7 %	1881 36 % 53.1 %	2386 45.7 % 37.2 %	336 6.4 % 32.2 %	5225 100 % 42.3 %
university	54 2.5 % 4 %	356 16.6 % 10 %	1507 70.3 % 23.5 %	228 10.6 % 21.9 %	2145 100 % 17.3 %
vocational	116 5.8 % 8.5 %	481 23.9 % 13.6 %	1024 50.9 % 16 %	389 19.4 % 37.3 %	2010 100 % 16.3 %
Total	1362 11 % 100 %	3543 28.7 % 100 %	6418 51.9 % 100 %	1042 8.4 % 100 %	12365 100 % 100 %
χ²=1202.528 · df=9 · Cramer's V=0.180 · p=0.000
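The "university" column total of 6418 equals the 1719 alters coded as university plus the 4699 alters with missing education in the earlier frequency table, because a TRUE fallback in case_when() also matches NA. The sketch below, on an invented five-value factor, contrasts that fallback with an explicit branch for code 4, which leaves missing values as NA. It is an illustration of the behavior, not a rerun of the analysis.

```r
library(dplyr)

# Toy education codes, including one missing value
educ <- factor(c(1, 2, 3, 4, NA))

# Fallback branch: everything not matched earlier, NA included, becomes "university"
with_fallback <- case_when(
  educ == 1 ~ "primary",
  educ == 2 ~ "secondary",
  educ == 3 ~ "vocational",
  TRUE ~ "university"
)

# Explicit branch: unmatched cases (here only NA) stay missing
explicit <- case_when(
  educ == 1 ~ "primary",
  educ == 2 ~ "secondary",
  educ == 3 ~ "vocational",
  educ == 4 ~ "university"
)

data.frame(educ, with_fallback, explicit)
```

With the explicit version, dyads without alter education would drop out of the cross-tabulation instead of inflating the university column.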

Visualizing Educational Homophily

We create a heatmap to show educational homophily patterns:

# Prepare data for the heatmap (joint proportions over all ego-alter pairs)
heat_df <- as.data.frame(prop.table(table(obs$ego_educ, obs$alter_educ)))
colnames(heat_df) <- c("Ego_educ", "Alter_educ", "Prop")

# Format the proportions to display percentages with two decimals
heat_df$tooltip_text <- sprintf(
  "Ego: %s<br>Alter: %s<br>Share: %.2f%%",
  heat_df$Ego_educ,
  heat_df$Alter_educ,
  heat_df$Prop * 100
)

# Create the heatmap
p <- ggplot(heat_df, aes(Ego_educ, Alter_educ)) +
  geom_tile(aes(fill = Prop, text = tooltip_text)) +  # Add tooltip text
  scale_fill_gradient(low = "white", high = "black") +
  theme_minimal() +
  labs(
    title = "Educational Homophily Heatmap",
    x = "Ego Educational Level",
    y = "Alter Educational Level",
    fill = "Proportion"
  )

# Convert to plotly and specify which variables to display in the tooltip
ggplotly(p, tooltip = "text")
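The heatmap uses joint proportions, so large groups dominate the shading. A complementary view, sketched here as an assumption rather than as part of the original analysis, conditions on the ego's level with prop.table(..., margin = 1). The counts are copied from the cross-tabulation printed earlier, and the result reproduces its row percentages.

```r
# Counts copied from the printed cross-tabulation (rows = ego, columns = alter)
levels_educ <- c("primary", "secondary", "university", "vocational")
counts <- matrix(c(570,  825, 1501,  89,
                   622, 1881, 2386, 336,
                    54,  356, 1507, 228,
                   116,  481, 1024, 389),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(ego = levels_educ, alter = levels_educ))

# Share of each alter level within each ego level (rows sum to 1)
row_share <- prop.table(counts, margin = 1)
round(100 * row_share, 1)

# Diagonal: share of same-level alters for each ego level
round(100 * diag(row_share), 1)
```

Passing row_share instead of the joint proportions to the plotting code would shade each row relative to its own total.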

Relationship Analysis by Type

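We calculate the mean of alter_tiempo (length of the relationship) for each tie type, where alter_rel codes 1 to 5 correspond to spouse (esp), child (hijo), relative (pari), friend (amig), and other (otro). The last column counts relatives (alter_rel = 3) whose alter_barrio code is 1: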

obs %>%
  summarise(
    mean.clo.esp = mean(alter_tiempo[alter_rel == "1"], na.rm = TRUE),
    mean.clo.hijo = mean(alter_tiempo[alter_rel == "2"], na.rm = TRUE),
    mean.clo.pari = mean(alter_tiempo[alter_rel == "3"], na.rm = TRUE),
    mean.clo.amig = mean(alter_tiempo[alter_rel == "4"], na.rm = TRUE),
    mean.clo.otro = mean(alter_tiempo[alter_rel == "5"], na.rm = TRUE),
    count.par.barr = sum((alter_rel == "3" & alter_barrio == "1"), na.rm = TRUE)
  ) %>%
  kbl() %>%
  kable_paper()

mean.clo.esp	mean.clo.hijo	mean.clo.pari	mean.clo.amig	mean.clo.otro	count.par.barr
4.498406	4.91511	4.877932	4.075097	3.67823	1215
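The same summary could be produced in long form with group_by(), which avoids writing one line per tie type. This is a sketch on invented rows; obs_toy and the output column names are hypothetical.

```r
library(dplyr)

# Toy dyads: tie type, relationship-length code, neighborhood code
obs_toy <- data.frame(
  alter_rel    = c(1, 1, 3, 3, 4, NA),
  alter_tiempo = c(5, 4, 5, NA, 3, 2),
  alter_barrio = c(2, 1, 1, 1, 2, 1)
)

by_tie <- obs_toy %>%
  dplyr::filter(!is.na(alter_rel)) %>%          # drop empty alter slots
  group_by(alter_rel) %>%
  summarise(
    mean_tiempo = mean(alter_tiempo, na.rm = TRUE),
    n_barrio_1  = sum(alter_barrio == 1, na.rm = TRUE)
  )

by_tie
```

The grouped version returns one row per tie type, which is convenient for plotting, whereas the wide version above returns a single row that is convenient for a compact table.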

Calculating Sociodemographic Distances

Creating Distance Vectors

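We compute sociodemographic distance measures between each ego and alter. For each attribute, a value of 0 indicates that ego and alter fall in the same category, a value of 1 indicates that they differ, and the result is NA when either value is missing: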

# Calculate distances for each dimension
obs$sexo_dist <- ifelse(obs$alter_sexo == obs$ego_sexo, 0, 1)
obs$edad_dist <- ifelse(obs$alter_edad == obs$ego_edad, 0, 1)
obs$educ_dist <- ifelse(obs$alter_educ == obs$ego_educ, 0, 1)
obs$ideol_dist <- ifelse(obs$alter_ideol == obs$ego_ideol, 0, 1)
obs$relig_dist <- ifelse(obs$alter_relig == obs$ego_relig, 0, 1)
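Comparing two factors with == stops with an error when their level sets differ, so a defensive variant converts both sides to character first. The helper dist01() below is a hypothetical name, and the vectors are invented; the sketch only shows the 0/1/NA logic described above.

```r
# Hypothetical helper: 0 if the categories match, 1 if they differ, NA if either is missing
dist01 <- function(alter, ego) {
  a <- as.character(alter)
  e <- as.character(ego)
  ifelse(a == e, 0, 1)
}

# Toy factors whose level sets are deliberately different
alter_edad <- factor(c("1", "3", NA, "6"))
ego_edad   <- factor(c("1", "2", "4", "6"))

edad_dist <- dist01(alter_edad, ego_edad)
edad_dist

# Share of valid dyads that differ
mean(edad_dist, na.rm = TRUE)
```

Because missing values propagate to NA, dyads with an unfilled alter slot drop out of the mean rather than counting as different.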

Distance Summary

Finally, we present a summary of the sociodemographic distances:

# Create a summary table of distances
kbl(egltable(
  c("sexo_dist", "edad_dist", "educ_dist", "ideol_dist", "relig_dist"),
  data = obs,
  strict = TRUE
)) %>%
  kable_paper()

	M (SD)
sexo_dist	0.36 (0.48)
edad_dist	0.65 (0.48)
educ_dist	0.65 (0.48)
ideol_dist	0.55 (0.50)
relig_dist	0.38 (0.49)

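This final table reports the mean and standard deviation of each distance indicator. Because the indicators are coded 0/1, each mean is the proportion of ego–alter dyads with valid data that differ on that dimension (for example, 0.36 for sex means that 36% of dyads are cross-sex), so lower values indicate stronger homophily. The educ_dist mean also includes dyads in which the alter's education was missing, since those alters were labeled "university" in the homophily section.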
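For a 0/1 indicator with mean p, the standard deviation is approximately sqrt(p(1 - p)) when the sample is large, so the SD column carries no information beyond the mean. The sketch below checks this against the values printed in the summary table; the means are copied from that table.

```r
# Means copied from the distance summary table
p <- c(sexo_dist = 0.36, edad_dist = 0.65, educ_dist = 0.65,
       ideol_dist = 0.55, relig_dist = 0.38)

# Large-sample standard deviation of a binary indicator
sd_binary <- round(sqrt(p * (1 - p)), 2)
sd_binary
```

The computed values match the reported standard deviations to two decimals.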
