suppressPackageStartupMessages({
library(kableExtra)
library(data.table)
library(ggplot2)
library(broom)
library(dplyr)
library(scales)
library(lmtest)
library(knitr)
library(patchwork)
library(tidyr)
library(ggrepel)
library(viridis)
library(RColorBrewer)
library(ggtext)
library(brms)
library(fixest) # For fixed effects models
})
# Custom theme for a professional, academic look
theme_paper <- function(base_size = 15, base_family = "sans") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      text = element_text(color = "grey20"),
      plot.title = element_text(size = rel(1.6), face = "bold", hjust = 0.5, margin = margin(b = 15), family = base_family),
      plot.subtitle = element_markdown(size = rel(1.1), color = "grey40", hjust = 0.5, margin = margin(b = 15), lineheight = 1.2),
      plot.caption = element_text(size = rel(0.9), color = "grey50", hjust = 1, margin = margin(t = 12)),
      axis.title = element_text(size = rel(1.25), face = "bold", color = "grey30"),
      axis.text = element_text(size = rel(1.1), color = "grey40"),
      axis.line = element_line(color = "grey80", linewidth = 0.6),
      panel.grid.major = element_line(color = "grey93", linewidth = 0.3),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      legend.title = element_text(size = rel(1.15), face = "bold"),
      legend.text = element_text(size = rel(1.05)),
      legend.background = element_rect(fill = "white", color = "grey90"),
      legend.key = element_rect(fill = "white", color = NA),
      strip.text = element_text(size = rel(1.25), face = "bold", color = "grey20"),
      strip.background = element_rect(fill = "grey97", color = "grey80")
    )
}
theme_set(theme_paper())
# Define a consistent color palette
color_palette <- c("Socio-cognitive" = "#0072B2", "Socio-technical" = "#D55E00")

output_dir <- paste0("Stratified_Diffusion_Analysis_", Sys.Date())
if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)

comma <- function(x) format(x, big.mark = ",", scientific = FALSE)
Introduction: From Employer Decisions to Labor Market Structure
The architecture of inequality in the U.S. labor market is not a static blueprint but an actively reproduced, dynamic process. Its foundations lie in the everyday decisions of employers within organizations, who determine which skill requirements to establish for the occupations they manage. Foundational studies have demonstrated that the skill landscape itself is starkly polarized into two distinct domains—a socio-cognitive cluster associated with high wages and a sensory-physical one with low wages (Alabdulkareem et al. 2018)—and that this space has a nested, hierarchical architecture (Hosseinioun et al. 2025). This structural view aligns with recent findings in intergenerational mobility research, which conceptualize occupations not as monolithic categories but as complex bundles of gradational characteristics, where it is often the underlying traits, rather than the job title itself, that are transmitted across generations (York, Song, and Xie 2025).
Faced with uncertainty about which skill requirements will maximize productivity or prestige, employers often look to the practices of other organizations for guidance. They engage in a process of social learning and imitation, observing how similar organizations define skill requirements for comparable occupations. However, we argue this imitation is not random. It is governed by a powerful, yet poorly understood, mechanism of asymmetric filtering based on the cultural theorization of the skill itself. The nature of a skill—how it is socially interpreted and valued—fundamentally alters the pathways it can travel across the occupational status hierarchy as organizations adopt and adapt these requirements.
The cumulative result of thousands of these guided, micro-level imitation decisions is a macro-level process we term Asymmetric Trajectory Channeling. This process actively sorts skills into divergent mobility paths—an upward “escalator” for cognitive skills and a “containment field” for physical ones—thus deepening labor market stratification from the demand side.
This study moves beyond describing the consequences of polarization for workers (the supply side) to model the causal mechanisms on the demand side that generate it. By focusing on the rules that guide employer imitation between organizations, we explain how structural inequality is not merely a state, but an emergent process reproduced from the ground up through organizational decision-making.
Theoretical Foundations: From Mimicry to Meaning
Early studies of diffusion often focused on the spread of a single innovation through a homogeneous population. More recent work, however, recognizes that diffusion is fundamentally structured by networks and heterogeneity. Practices do not diffuse randomly or uniformly; they follow patterned trajectories shaped by organizational, cultural, and institutional constraints (Strang and Soule 1998).
The Micro-Foundations of Imitation
We extend this insight by arguing that employers imitate skill requirements according to fundamentally different logics depending on the type of skill in question. To understand why, we must first examine the micro-foundations of organizational imitation. The literature suggests three key mechanisms that drive one organization to adopt the practices of another:
Uncertainty and Bounded Rationality: Under conditions of uncertainty about the relationship between means and ends, organizations often imitate others as a decision-making shortcut. Rather than calculating an optimal solution from scratch, imitation offers a viable solution with reduced search costs (Cyert, Cyert, and March 2006; DiMaggio and Powell 1983).
Prestige and Status-Seeking: Imitation is not just a response to uncertainty, but also a strategy to gain legitimacy and status. Organizations do not imitate just anyone; they emulate those they perceive to be more successful or prestigious (Strang and Macy 2001; Bail, Brown, and Wimmer 2019). This process of “adaptive emulation” (Strang and Macy 2001), driven by “success stories,” creates an inherent directional bias in diffusion, where practices flow from high-status to low-status actors.
Proximity and Network Structure: Influence is not global but is channeled through social and structural networks. The likelihood that one organization imitates another is strongly conditioned by proximity, whether geographic, social, or cultural (Hedström 1994; Strang and Tuma 1993). Actors are more influenced by their peers, direct competitors, or those with whom they maintain dense relationships.
A Dual-Process Theory of Skill Diffusion: The Role of Theorization
Our key theoretical innovation is that the content of a skill—how it is culturally theorized (Strang and Meyer 1993)—determines which of these micro-foundations becomes dominant. We argue that organizations filter and evaluate potential practices through two qualitatively different diffusion logics:
Cognitive Skills as Portable Assets: Cognitive skills (analytical, interpersonal, managerial) are theorized as nested capabilities: they are abstract, broadly applicable, and perceived as portable assets associated with growth, learning, and adaptability. Under uncertainty, employers look toward prestigious exemplars (Strang 2010) that signal success and modernity. As a result, the diffusion of cognitive skills is driven primarily by aspirational emulation. They tend to diffuse upward through the occupational status hierarchy as organizations seek to imitate their high-status peers.
Physical Skills as Context-Dependent Competencies: In contrast, physical skills (manual, motor) are theorized as context-dependent competencies: they are tied to specific material settings, bodily execution, and legacy institutional constraints. They are less likely to be read as generalizable templates for upward mobility. Instead, their diffusion is based on functional rather than aspirational considerations. Therefore, the diffusion of physical skills is governed mainly by proximity and functional need, showing less directional bias or even containment effects within their current status segments.
This bifurcation in diffusion logics is what we call Asymmetric Trajectory Channeling. While our model, for analytical clarity, uses a binary distinction, we treat skills as multidimensional bundles of traits, values, and dispositions (York, Song, and Xie 2025). Employers imitate not just individual skills, but pieces of occupational identity.
Piecewise Dual Process Diffusion Model
To formalize these dynamics while avoiding multicollinearity issues, we model the probability (\(P_{i \to j}^{(c)}\)) that an employer, managing a target occupation \(j\), adopts a skill of class \(c\) by imitating a requirement from a source occupation \(i\). This decision is embedded in the macro-level structure of the occupational space.
Mathematical Formulation
We define two non-overlapping directional status variables that completely separate upward and downward mobility:
\[ \Delta^+_{ij} = \max(0, s_j - s_i) \quad \text{(upward status shift)} \]
\[ \Delta^-_{ij} = \max(0, s_i - s_j) \quad \text{(downward status shift)} \]
The structural form of our Piecewise Dual Process Model expresses the diffusion likelihood as:
\[ \text{logit}\, P_{i \to j}^{(c)} = \theta_{0c} - \lambda_c d_{ij} - \beta_c^+ \Delta^+_{ij} - \beta_c^- \Delta^-_{ij} - \omega_c w_{ij} \]
Where:

- \(\theta_{0c}\): baseline propensity to adopt skill class \(c\)
- \(\lambda_c\): structural distance penalty parameter
- \(\beta_c^+\): penalty (or bonus) per unit upward status move
- \(\beta_c^-\): penalty (or bonus) per unit downward status move
- \(\omega_c\): coefficient for the wage gap control \(w_{ij}\)
Advantages of the Piecewise Specification
- Eliminates Multicollinearity: Unlike models using both \((s_j - s_i)\) and \(|s_j - s_i|\), our piecewise variables are mathematically orthogonal, since \(\Delta^+_{ij} \cdot \Delta^-_{ij} = 0\) always (see the sketch after this list).
- Direct Asymmetry Testing: The difference \((\beta_c^+ - \beta_c^-)\) directly measures directional asymmetry without requiring complex transformations.
- Clear Interpretation: Each parameter has an unambiguous meaning tied to specific types of status moves.
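A two-line check makes the orthogonality claim concrete. This is a minimal sketch with toy status scores; the variable names simply mirror those constructed in the analysis below:
# Toy status gaps: s_j - s_i for five hypothetical dyads
status_gap_signed <- c(-2, -0.5, 0, 1, 3)

delta_up   <- pmax(0, status_gap_signed)   # Δ⁺: upward component
delta_down <- pmax(0, -status_gap_signed)  # Δ⁻: downward component

# At most one of the two is non-zero for any dyad, so their product is always zero
all(delta_up * delta_down == 0)  # TRUE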
Curved Piecewise Extension
For situations where we expect nonlinear sensitivity to status differences, the model can be extended to include curvature terms. We estimate a generalized linear model with a Bernoulli likelihood and logit link. The outcome \(y_{ij}^{(c)} \in \{0, 1\}\) indicates whether occupation \(j\) adopts a skill of class \(c\) from occupation \(i\):
\[ y_{ij}^{(c)} \sim \text{Bernoulli}(p_{ij}^{(c)}), \quad \text{logit}(p_{ij}^{(c)}) = \eta_{ij}^{(c)} \]
The predictor is a second-order polynomial expansion of upward and downward status gaps:
\[ \eta_{ij}^{(c)} = \theta_{0c} - \lambda_c d_{ij} - \beta_{1c}^+ \Delta^+_{ij} - \beta_{2c}^+ (\Delta^+_{ij})^2 - \beta_{1c}^- \Delta^-_{ij} - \beta_{2c}^- (\Delta^-_{ij})^2 \]
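Differentiating the predictor makes the role of the quadratic terms explicit. For an upward move, the marginal effect of one additional unit of status gain is

\[ \frac{\partial \eta_{ij}^{(c)}}{\partial \Delta^+_{ij}} = -\beta_{1c}^+ - 2\,\beta_{2c}^+ \Delta^+_{ij}, \]

so a positive \(\beta_{2c}^+\) means the penalty steepens as the status jump grows, while \(\beta_{2c}^+ \approx 0\) recovers the linear piecewise model; the downward case is symmetric.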
Prior Choices: Rationale
Parameter | Description | Prior | Justification |
---|---|---|---|
\(\theta_{0c}\) (intercept) | Baseline adoption tendency | normal(0, 5) | Nearly uninformative; wide enough for coverage |
\(\lambda_c\), \(\beta_{1c}^{\pm}\) | Main distance and status slope terms | normal(0, 1) | Weakly informative; encourages shrinkage |
\(\beta_{2c}^{\pm}\) | Curvature terms (quadratic effects) | double_exponential(0, 0.5) | Lasso-type prior to enable variable selection |
The double-exponential (or Laplace) prior places more mass near zero than the normal, encouraging sparsity in the curvature terms (i.e., it can shrink them toward zero). This is helpful when it is unclear whether curvature is genuinely needed.
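To see why the Laplace prior acts as a soft variable selector, one can compare its density with the normal prior used for the linear terms. A minimal base-R sketch; the dlaplace helper is defined here for illustration only and is not part of the analysis code:
# Laplace (double-exponential) density with location mu and scale b
dlaplace <- function(x, mu = 0, b = 0.5) exp(-abs(x - mu) / b) / (2 * b)

x <- seq(-3, 3, length.out = 400)
plot(x, dlaplace(x, b = 0.5), type = "l", lwd = 2,
     ylab = "density", main = "Shrinkage priors compared")
lines(x, dnorm(x, sd = 1), lwd = 2, lty = 2)
legend("topright", legend = c("double_exponential(0, 0.5)", "normal(0, 1)"),
       lwd = 2, lty = c(1, 2))
The sharp peak at zero is what pulls weak curvature estimates toward exactly zero, while the heavier tails leave strong curvature effects largely unshrunk.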
Empirical Analysis
Data and Setup
Data Preparation
# Robust data loading function
load_and_prepare_data <- function() {
  data_path <- "/home/rober/Descargas/all_events_final_enriched.RData"

  if (file.exists(data_path)) {
    load(data_path)
    if (exists("all_events_final_enriched")) {
      data_for_models <- all_events_final_enriched
      message("Successfully loaded 'all_events_final_enriched'.")
    } else {
      message("Warning: 'all_events_final_enriched' not found. Searching for another suitable data object.")
      all_objs <- ls()
      suitable_objects <- all_objs[sapply(all_objs, function(x) {
        obj <- get(x)
        (is.data.frame(obj) || is.data.table(obj)) && nrow(obj) > 1000
      })]
      if (length(suitable_objects) > 0) {
        data_for_models <- get(suitable_objects[1])
        message(paste("Using fallback data object:", suitable_objects[1]))
      } else {
        stop("No suitable data object found.")
      }
    }
  } else {
    stop(paste("Data file not found at path:", data_path))
  }
  return(data_for_models)
}

# Load data
data_for_models <- load_and_prepare_data()
Variable Construction
# Robust variable mapping - UPDATED TO USE distance_euclidean_rca consistently
available_cols <- names(data_for_models)

required_mapping <- list(
  diffusion = c("diffusion", "adopt", "adoption", "outcome"),
  structural_distance = c("distance_euclidean_rca", "structural_distance", "dist", "distance", "d_ij"),
  status_diff = c("education_diff_abs", "status_gap_signed", "education_diff", "status_diff", "edu_diff", "status_gap"),
  wage_diff = c("wage_diff_abs", "wage_gap", "wage_diff", "wage_distance"),
  skill_group = c("skill_group_for_model", "skill_group", "cluster", "group", "LeidenCluster_2015", "skill_cluster_type"),
  source_education = c("source_education", "edu_i", "source_edu", "soc_source", "education_source"),
  target_education = c("target_education", "edu_j", "target_edu", "soc_target", "education_target"),
  source_wage = c("source_wage", "wage_i", "wage_source"),
  target_wage = c("target_wage", "wage_j", "wage_target")
)

column_matches <- sapply(required_mapping, function(p_names) {
  found <- intersect(p_names, available_cols)
  if (length(found) > 0) found[1] else NA_character_
})

# Check for essential columns
essential_cols <- c("diffusion", "structural_distance", "status_diff", "skill_group")
if (any(is.na(column_matches[essential_cols]))) {
  missing_essentials <- names(column_matches[essential_cols][is.na(column_matches[essential_cols])])
  stop(paste("Missing essential columns for model:", paste(missing_essentials, collapse = ", ")))
}

if (!is.data.table(data_for_models)) {
  data_for_models <- as.data.table(data_for_models)
}

generalized_data <- copy(data_for_models)

# Create variables for the model - USING distance_euclidean_rca consistently
generalized_data[, `:=`(
  diffusion = as.numeric(get(column_matches[["diffusion"]])),
  d_ij = as.numeric(get(column_matches[["structural_distance"]])),  # This will be distance_euclidean_rca
  status_gap_signed = as.numeric(get(column_matches[["status_diff"]])),
  skill_group_for_model = as.character(get(column_matches[["skill_group"]]))
)]

# Create piecewise variables
generalized_data[, `:=`(
  delta_up = pmax(0, status_gap_signed),         # Δ⁺: upward movements
  delta_down = pmax(0, -status_gap_signed),      # Δ⁻: downward movements
  status_gap_magnitude = abs(status_gap_signed)  # Keep for compatibility
)]

# Handle wage_gap column
if ("wage_diff_abs" %in% names(generalized_data)) {
  generalized_data[, wage_gap := as.numeric(wage_diff_abs)]
  message("Using 'wage_diff_abs' for wage_gap.")
} else if (!is.na(column_matches[["wage_diff"]])) {
  generalized_data[, wage_gap := as.numeric(get(column_matches[["wage_diff"]]))]
} else {
  message("Warning: 'wage_diff' column not found. Creating a dummy 'wage_gap' column.")
  generalized_data[, wage_gap := 0]
}

# Add source/target variables
if (all(c("source_Edu_Score_Weighted", "target_Edu_Score_Weighted") %in% names(generalized_data))) {
  generalized_data[, source_edu := as.numeric(source_Edu_Score_Weighted)]
  generalized_data[, target_edu := as.numeric(target_Edu_Score_Weighted)]
  message("Using 'source_Edu_Score_Weighted' and 'target_Edu_Score_Weighted' for education variables.")
}

if (all(c("source_Median_Wage_2015", "target_Median_Wage_2023") %in% names(generalized_data))) {
  generalized_data[, source_wage := as.numeric(source_Median_Wage_2015)]
  generalized_data[, target_wage := as.numeric(target_Median_Wage_2023)]
  message("Using 'source_Median_Wage_2015' and 'target_Median_Wage_2023' for wage variables.")
}

# Assign content type based on skill group
if ("LeidenCluster_2015" %in% names(generalized_data)) {
  message("Applying specific corrections to 'LeidenCluster_2015' as per original script.")
  generalized_data[, LeidenCluster_2015 := as.character(LeidenCluster_2015)]
  generalized_data[LeidenCluster_2015 == "NoCluster_2015", LeidenCluster_2015 := "LeidenC_1"]
  generalized_data <- generalized_data[LeidenCluster_2015 %in% c("LeidenC_1", "LeidenC_2")]
  generalized_data[, content_type := fifelse(LeidenCluster_2015 == "LeidenC_1", "Socio-cognitive", "Socio-technical")]
} else {
  message("Warning: 'LeidenCluster_2015' not found. Using generic classification.")
  unique_skill_groups <- unique(generalized_data$skill_group_for_model)
  set.seed(42)
  groups_sorted <- sort(unique_skill_groups)
  enhancing_groups <- head(groups_sorted, length(groups_sorted) / 2)
  generalized_data[, content_type := fifelse(skill_group_for_model %in% enhancing_groups,
                                             "Socio-cognitive", "Socio-technical")]
}

# Final data cleaning
generalized_data <- generalized_data[!is.na(content_type)]
generalized_data <- na.omit(generalized_data,
                            cols = c("diffusion", "d_ij", "delta_up", "delta_down", "wage_gap"))

# Verify piecewise variables
message(sprintf("Created piecewise variables: delta_up (mean=%.3f), delta_down (mean=%.3f)",
                mean(generalized_data$delta_up), mean(generalized_data$delta_down)))

# Split data by skill type
cognitive_data <- generalized_data[content_type == "Socio-cognitive"]
physical_data <- generalized_data[content_type == "Socio-technical"]

# Report which distance variable is being used
distance_var_used <- column_matches[["structural_distance"]]
message(sprintf("Using '%s' as structural distance variable across all analyses.", distance_var_used))
Descriptive Analysis
desc_stats <- generalized_data[, .(
  N = comma(.N),
  Diffusion_Rate = sprintf("%.1f%%", mean(diffusion, na.rm = TRUE) * 100),
  Mean_Distance = sprintf("%.3f", mean(d_ij, na.rm = TRUE)),
  Mean_Upward_Shift = sprintf("%.3f", mean(delta_up, na.rm = TRUE)),
  Mean_Downward_Shift = sprintf("%.3f", mean(delta_down, na.rm = TRUE)),
  Upward_Diffusion_Rate = sprintf("%.1f%%", mean(diffusion[delta_up > 0], na.rm = TRUE) * 100),
  Downward_Diffusion_Rate = sprintf("%.1f%%", mean(diffusion[delta_down > 0], na.rm = TRUE) * 100)
), by = content_type]

kable(desc_stats,
      caption = "**Table 1: Descriptive Statistics by Skill Content Type**",
      col.names = c("Content Type", "N", "Diffusion Rate", "Mean Distance",
                    "Mean Δ⁺", "Mean Δ⁻", "Upward Diffusion", "Downward Diffusion")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Content Type | N | Diffusion Rate | Mean Distance | Mean Δ⁺ | Mean Δ⁻ | Upward Diffusion | Downward Diffusion |
---|---|---|---|---|---|---|---|
Socio-cognitive | 353,988 | 26.8% | 0.679 | 1.161 | 0.773 | 37.5% | 13.6% |
Socio-technical | 318,951 | 27.0% | 0.675 | 1.512 | 0.294 | 27.0% | 27.1% |
Table 1 Interpretation: This table summarizes the key variables for the two skill types. Overall diffusion rates are nearly identical (26.8% vs. 27.0%), but the directional pattern differs sharply: socio-cognitive skills diffuse far more readily across upward status shifts (37.5%) than downward ones (13.6%), whereas socio-technical skills diffuse at essentially the same rate in both directions (27.0% vs. 27.1%). This provides initial evidence that the aspirational bias in skill adoption is specific to socio-cognitive content.
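The directional contrast can be checked with a simple two-proportion test. A minimal sketch for the socio-cognitive subset, using the cognitive_data table constructed above (this test is illustrative and not part of the original pipeline):
# Compare adoption rates for upward vs. downward moves among socio-cognitive skills
up   <- cognitive_data[delta_up > 0,   diffusion]
down <- cognitive_data[delta_down > 0, diffusion]

# Two-sample test of equal adoption proportions
prop.test(x = c(sum(up), sum(down)), n = c(length(up), length(down)))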
Descriptive Visualizations
create_descriptive_coherent_style <- function(data) {

  # Work with data.table
  if (!is.data.table(data)) {
    dt <- as.data.table(data)
  } else {
    dt <- copy(data)
  }

  # Verify required columns
  required_cols <- c("status_gap_signed", "content_type", "diffusion")
  missing_cols <- required_cols[!required_cols %in% names(dt)]

  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Create movement_type
  dt[, movement_type := fcase(
    status_gap_signed < -0.5, "Downward",
    abs(status_gap_signed) <= 0.5, "Lateral",
    status_gap_signed > 0.5, "Upward",
    default = NA_character_
  )]

  # Filter valid rows
  dt_clean <- dt[!is.na(movement_type) & !is.na(content_type)]

  # Compute summary statistics
  summary_stats <- dt_clean[, .(
    diffusion_rate = mean(diffusion, na.rm = TRUE),
    n = .N
  ), by = .(movement_type, content_type)]

  # Compute confidence intervals
  summary_stats[, `:=`(
    se = sqrt(diffusion_rate * (1 - diffusion_rate) / n),
    ci_lower = pmax(0, diffusion_rate - 1.96 * sqrt(diffusion_rate * (1 - diffusion_rate) / n)),
    ci_upper = pmin(1, diffusion_rate + 1.96 * sqrt(diffusion_rate * (1 - diffusion_rate) / n))
  )]

  # Convert to data.frame and order factor levels
  summary_df <- as.data.frame(summary_stats)
  summary_df$movement_type <- factor(summary_df$movement_type,
                                     levels = c("Downward", "Lateral", "Upward"))

  # Same colors as the line plot
  coherent_colors <- c(
    "Socio-cognitive" = "#1f77b4",  # consistent blue
    "Socio-technical" = "#ff7f0e"   # consistent orange
  )

  # Build the plot in the same style
  p <- ggplot(summary_df, aes(x = movement_type, y = diffusion_rate, fill = content_type)) +
    # Subtle shaded area highlighting the escalator effect
    annotate("rect", xmin = 2.5, xmax = 3.5, ymin = -Inf, ymax = Inf,
             fill = "#E8F4FD", alpha = 0.3) +  # very light blue
    # Main bars
    geom_col(position = position_dodge(0.8),
             width = 0.7,
             color = "white",
             linewidth = 0.8,
             alpha = 0.85) +
    # Error bars
    geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                  position = position_dodge(0.8),
                  width = 0.25,
                  linewidth = 1,
                  color = "grey40") +
    # Percentage labels
    geom_text(aes(label = paste0(round(diffusion_rate * 100, 1), "%")),
              position = position_dodge(0.8),
              vjust = -0.6,
              size = 5,
              fontface = "bold",
              color = "grey20") +
    # Arrow drawing attention to the escalator effect
    annotate("segment",
             x = 2.7, xend = 3.3,
             y = 0.42, yend = 0.42,
             arrow = arrow(length = unit(0.3, "cm"), type = "closed"),
             color = "#1f77b4",
             linewidth = 2,
             alpha = 0.8) +
    # Explanatory annotation
    annotate("text",
             x = 3, y = 0.45,
             label = "ESCALATOR\nEFFECT",
             color = "#1f77b4",
             fontface = "bold",
             size = 4,
             hjust = 0.5,
             lineheight = 0.9) +
    # Scales consistent with the overall style
    scale_fill_manual(values = coherent_colors, name = "Skill Type:") +
    scale_y_continuous(
      limits = c(0, 0.5),
      breaks = seq(0, 0.5, 0.1),
      expand = expansion(mult = c(0, 0.05)),
      labels = function(x) paste0(x * 100, "%")
    ) +
    scale_x_discrete(
      labels = c("Downward\n(High → Low)", "Lateral\n(Similar Status)", "Upward\n(Low → High)")
    ) +
    # Labels
    labs(
      title = "Raw Data Reveals Asymmetric Trajectory Channeling",
      subtitle = "Descriptive evidence for differential skill mobility patterns by content type",
      x = "Status Movement Direction",
      y = "Observed Skill Adoption Rate"
    ) +
    # Same theme as the line plot
    theme_minimal(base_size = 15) +
    theme(
      plot.background = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA),
      panel.grid.major = element_line(color = "white", linewidth = 0.6),
      panel.grid.minor = element_line(color = "white", linewidth = 0.3),
      panel.border = element_blank(),
      axis.line = element_line(color = "grey40", linewidth = 0.5),
      plot.title = element_text(size = 18, face = "bold", hjust = 0.5,
                                color = "grey20", margin = margin(b = 8)),
      plot.subtitle = element_text(size = 14, hjust = 0.5, color = "grey40",
                                   margin = margin(b = 20)),
      axis.title.x = element_text(size = 15, color = "grey30", margin = margin(t = 10)),
      axis.title.y = element_text(size = 15, color = "grey30", margin = margin(r = 10)),
      axis.text = element_text(size = 15, color = "grey40"),
      axis.ticks = element_line(color = "grey50", linewidth = 0.4),
      legend.position = "bottom",
      legend.background = element_rect(fill = "white", color = NA),
      legend.text = element_text(size = 14, color = "black"),
      legend.title = element_text(size = 12, face = "bold", color = "black"),
      legend.key.size = unit(1, "cm"),
      legend.margin = margin(t = 15),
      plot.margin = margin(20, 20, 15, 20)
    ) +
    # Legend guides
    guides(
      fill = guide_legend(
        title.position = "left",
        nrow = 1,
        keywidth = unit(1.2, "cm"),
        keyheight = unit(0.8, "cm")
      )
    )

  return(p)
}

# Create the figure
fig_coherent_bars <- create_descriptive_coherent_style(generalized_data)
print(fig_coherent_bars)
Model Specification and Estimation
Piecewise Dual Process Models
# Formula for the standard linear piecewise model
formula_piecewise <- diffusion ~ d_ij + delta_up + delta_down + wage_gap
Separate Model Estimation
# Estimate separate models for each skill type
model_cognitive_piecewise <- glm(formula_piecewise, data = cognitive_data, family = binomial(link = "logit"))
model_physical_piecewise <- glm(formula_piecewise, data = physical_data, family = binomial(link = "logit"))
Parameter Extraction and Comparison
# Function to extract parameters from GLM models
extract_piecewise_parameters <- function(model, skill_type, data_subset) {
  coefs <- coef(model)

  get_param <- function(name, default_val = 0) if (name %in% names(coefs)) coefs[name] else default_val

  return(list(
    skill_type = skill_type,
    n_obs = nrow(data_subset),
    lambda_hat = get_param("d_ij"),
    beta_up_hat = get_param("delta_up"),
    beta_down_hat = get_param("delta_down"),
    wage_hat = get_param("wage_gap"),
    asymmetry = get_param("delta_up") - get_param("delta_down"),
    aic = AIC(model),
    mcfadden_r2 = 1 - (model$deviance / model$null.deviance)
  ))
}

params_cognitive_pw <- extract_piecewise_parameters(model_cognitive_piecewise, "Socio-cognitive", cognitive_data)
params_physical_pw <- extract_piecewise_parameters(model_physical_piecewise, "Socio-technical", physical_data)
Statistical Tests and Model Comparison
# Create a comparison table for the piecewise models
piecewise_comparison <- data.frame(
  `Skill Type` = c("Socio-cognitive", "Socio-technical"),
  N = c(comma(params_cognitive_pw$n_obs), comma(params_physical_pw$n_obs)),
  `λ (Distance)` = c(params_cognitive_pw$lambda_hat, params_physical_pw$lambda_hat),
  `β⁺ (Upward)` = c(params_cognitive_pw$beta_up_hat, params_physical_pw$beta_up_hat),
  `β⁻ (Downward)` = c(params_cognitive_pw$beta_down_hat, params_physical_pw$beta_down_hat),
  `Asymmetry (β⁺ - β⁻)` = c(params_cognitive_pw$asymmetry, params_physical_pw$asymmetry),
  `McFadden's R²` = c(params_cognitive_pw$mcfadden_r2, params_physical_pw$mcfadden_r2),
  AIC = c(params_cognitive_pw$aic, params_physical_pw$aic),
  check.names = FALSE
)

kable(piecewise_comparison,
      caption = "**Table 2: Frequentist Piecewise Model Results**",
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(6, bold = TRUE, background = "#F0F8FF")
Skill Type | N | λ (Distance) | β⁺ (Upward) | β⁻ (Downward) | Asymmetry (β⁺ - β⁻) | McFadden's R² | AIC |
---|---|---|---|---|---|---|---|
Socio-cognitive | 353,988 | 1.1178 | 0.4440 | -0.2582 | 0.7022 | 0.1206 | 362128.6 |
Socio-technical | 318,951 | 2.5693 | -0.1216 | -0.2251 | 0.1035 | 0.0129 | 367436.0 |
Table 2 Interpretation: The frequentist models confirm the descriptive findings. Both skill types are penalized by structural distance (recall that λ enters the linear predictor with a negative sign, so the larger positive estimate for socio-technical skills implies a stronger distance penalty). However, the key result is in the Asymmetry column. Socio-cognitive skills have a large, positive asymmetry coefficient (0.7022), indicating that upward status moves are strongly favored over downward moves. In contrast, socio-technical skills show a much smaller asymmetry (0.1035), suggesting a more neutral response to the direction of status change.
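The asymmetry contrast can also be tested formally. Below is a minimal sketch, not part of the original pipeline, that uses each GLM's covariance matrix to obtain a delta-method standard error for \(\hat\beta^+ - \hat\beta^-\) and then compares the two independently estimated models:
# SE of (β⁺ - β⁻) within one model via the delta method
asymmetry_se <- function(model) {
  V <- vcov(model)[c("delta_up", "delta_down"), c("delta_up", "delta_down")]
  k <- c(1, -1)  # contrast vector for delta_up - delta_down
  sqrt(drop(t(k) %*% V %*% k))
}

# Two-sample z-test for the difference in asymmetry across skill types
# (treats the two subsamples as independent)
diff_asym <- params_cognitive_pw$asymmetry - params_physical_pw$asymmetry
se_diff   <- sqrt(asymmetry_se(model_cognitive_piecewise)^2 +
                  asymmetry_se(model_physical_piecewise)^2)
z <- diff_asym / se_diff
p <- 2 * pnorm(-abs(z))
c(difference = diff_asym, z = z, p_value = p)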
Model Predictions & Visualization
generate_piecewise_predictions <- function(model_cog, model_phys, data_cog, data_phys, n_points = 150) {

  baseline_distance_cog <- median(data_cog$d_ij, na.rm = TRUE)
  baseline_wage_cog <- median(data_cog$wage_gap, na.rm = TRUE)

  baseline_distance_phys <- median(data_phys$d_ij, na.rm = TRUE)
  baseline_wage_phys <- median(data_phys$wage_gap, na.rm = TRUE)

  # Create status range for predictions
  status_range <- seq(-4, 4, length.out = n_points)

  # Convert to piecewise variables
  delta_up_range <- pmax(0, status_range)
  delta_down_range <- pmax(0, -status_range)

  pred_data_cognitive <- data.frame(
    delta_up = delta_up_range,
    delta_down = delta_down_range,
    d_ij = baseline_distance_cog,
    wage_gap = baseline_wage_cog
  )
  pred_cognitive <- predict(model_cog, newdata = pred_data_cognitive, type = "response")

  pred_data_physical <- data.frame(
    delta_up = delta_up_range,
    delta_down = delta_down_range,
    d_ij = baseline_distance_phys,
    wage_gap = baseline_wage_phys
  )
  pred_physical <- predict(model_phys, newdata = pred_data_physical, type = "response")

  predictions <- rbind(
    data.frame(status_gap = status_range, probability = pred_cognitive, content_type = "Socio-cognitive"),
    data.frame(status_gap = status_range, probability = pred_physical, content_type = "Socio-technical")
  )

  return(predictions)
}

# Generate predictions from the frequentist models
piecewise_predictions <- generate_piecewise_predictions(model_cognitive_piecewise, model_physical_piecewise, cognitive_data, physical_data)

# Define the equation string for the plot subtitle
equation_str <- "logit(P) = θ<sub>0</sub> - λd - β<sup>+</sup>Δ<sup>+</sup> - β<sup>-</sup>Δ<sup>-</sup>"

p_main_piecewise <- ggplot(piecewise_predictions, aes(x = status_gap, y = probability, color = content_type)) +
  geom_line(linewidth = 1.8, alpha = 0.9) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.7, color = "gray50") +
  scale_color_manual(values = color_palette, name = "Skill Type:") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  annotate("text", x = 2.5, y = max(piecewise_predictions$probability) * 0.85,
           label = "ASPIRATIONAL PULL\n(Low-to-High Status)",
           hjust = 0.5, size = 4.5, color = "gray30", fontface = "bold") +
  annotate("text", x = -2.5, y = max(piecewise_predictions$probability) * 0.85,
           label = "CONTAINMENT\n(High-to-Low Status)",
           hjust = 0.5, size = 4.5, color = "gray30", fontface = "bold") +
  labs(
    title = "Asymmetric Trajectory Channeling: Evidence from Piecewise Models",
    subtitle = paste("Model:", equation_str),
    x = "Educational Status Gap (Destination - Source)",
    y = "Predicted Skill Diffusion Probability"
  ) +
  theme_paper() +
  theme(
    legend.position = "bottom",
    plot.subtitle = element_markdown(size = rel(1.2))
  )

print(p_main_piecewise)
Figure 3 Interpretation: This plot visualizes the core finding from the frequentist models. The steep upward slope for socio-cognitive skills (blue) for positive status gaps (x-axis > 0) illustrates a strong "aspirational pull." Conversely, the flatter curve for socio-technical skills (orange) demonstrates their relative indifference to status-gaining imitation, highlighting the "Asymmetric Trajectory Channeling" effect.
Advanced Bayesian Analysis (with Curvature Selection)
cat("🚀 ADVANCED BAYESIAN SETUP - Non-Linear Model with Shrinkage Priors\n")
🚀 ADVANCED BAYESIAN SETUP - Non-Linear Model with Shrinkage Priors
cat("====================================================================\n")
====================================================================
cat("OBJECTIVE: Estimate curvature effects using Laplace (Lasso) priors.\n")
OBJECTIVE: Estimate curvature effects using Laplace (Lasso) priors.
cat("METHOD: High-quality Hamiltonian Monte Carlo (HMC) on a large sample.\n\n")
METHOD: High-quality Hamiltonian Monte Carlo (HMC) on a large sample.
# 1. Subsample data for a balance of speed and robustness
set.seed(123)
sample_size <- 50000

cognitive_data_sample <- if (nrow(cognitive_data) > sample_size) {
  cognitive_data[sample(.N, sample_size)]
} else {
  cognitive_data
}

physical_data_sample <- if (nrow(physical_data) > sample_size) {
  physical_data[sample(.N, sample_size)]
} else {
  physical_data
}

message(sprintf("Subsampling for HMC. Using %s cognitive and %s physical observations.",
                comma(nrow(cognitive_data_sample)),
                comma(nrow(physical_data_sample))))

# 2. Prepare polynomial terms
cognitive_data_sample <- cognitive_data_sample %>%
  mutate(
    delta_up2 = delta_up^2,
    delta_down2 = delta_down^2
  )

physical_data_sample <- physical_data_sample %>%
  mutate(
    delta_up2 = delta_up^2,
    delta_down2 = delta_down^2
  )

# 3. Non-linear model formula (no underscores in parameter names)
formula_nonlinear <- bf(
  diffusion ~ theta0 - lambda * d_ij - beta1up * delta_up - beta2up * delta_up2 - beta1down * delta_down - beta2down * delta_down2,
  theta0 + lambda + beta1up + beta2up + beta1down + beta2down ~ 1,
  nl = TRUE
)

# 4. Prior specification (using double_exponential)
priors_lasso <- c(
  prior(normal(0, 5), nlpar = "theta0"),
  prior(normal(0, 1), nlpar = "lambda", lb = 0),
  prior(normal(0, 1), nlpar = "beta1up"),
  prior(normal(0, 1), nlpar = "beta1down"),
  prior(double_exponential(0, 0.5), nlpar = "beta2up"),
  prior(double_exponential(0, 0.5), nlpar = "beta2down")
)

start_hmc_time <- Sys.time()
# =============================================================================
# HIGH-QUALITY FITTING FUNCTION FOR ADVANCED MODEL USING HMC
# =============================================================================
fit_advanced_hmc_model <- function(formula, data, model_name, priors) {

  cat(sprintf("\n🚀 ADVANCED FITTING (HMC): %s\n", toupper(model_name)))
  cat("=" %>% rep(50) %>% paste(collapse = ""), "\n")
  cat(sprintf("📊 Using %s observations from sample.\n", comma(nrow(data))))
  cat("⚙️ Algorithm: Hamiltonian Monte Carlo (NUTS Sampler).\n")

  start_time <- Sys.time()
  model <- NULL

  tryCatch({
    model <- brm(
      formula = formula,
      data = data,
      prior = priors,
      family = bernoulli(),
      chains = 2,
      cores = 4,
      iter = 1000,
      warmup = 500,
      seed = 42,
      silent = 0,
      control = list(adapt_delta = 0.9)
    )
    cat("✅ Advanced HMC model fitted successfully.\n")
  }, error = function(e) {
    cat("❌ ERROR during advanced model fitting:\n")
    cat(e$message, "\n")
  })

  end_time <- Sys.time()
  time_elapsed <- difftime(end_time, start_time, units = "secs")
  cat(sprintf("⏱️ Fitting time: %.1f seconds.\n", time_elapsed))

  return(list(model = model, time = time_elapsed))
}

# =============================================================================
# FIT MODELS WITH ADVANCED CONFIGURATION
# =============================================================================
# ADVANCED COGNITIVE MODEL
results_cognitive_advanced <- fit_advanced_hmc_model(
  formula = formula_nonlinear,
  data = cognitive_data_sample,
  model_name = "cognitive_advanced_hmc",
  priors = priors_lasso
)
🚀 ADVANCED FITTING (HMC): COGNITIVE_ADVANCED_HMC
==================================================
📊 Using 50,000 observations from sample.
⚙️ Algorithm: Hamiltonian Monte Carlo (NUTS Sampler).
✅ Advanced HMC model fitted successfully.
⏱️ Fitting time: 2329.6 seconds.
Chain 1: Gradient evaluation took 0.19568 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1956.8 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 2:
Chain 2: Gradient evaluation took 0.256456 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2564.56 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 2: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 2: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 2: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 1158.59 seconds (Warm-up)
Chain 1: 872.174 seconds (Sampling)
Chain 1: 2030.76 seconds (Total)
Chain 1:
Chain 2: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 1526.69 seconds (Warm-up)
Chain 2: 721.66 seconds (Sampling)
Chain 2: 2248.35 seconds (Total)
Chain 2:
# ADVANCED PHYSICAL MODEL
results_physical_advanced <- fit_advanced_hmc_model(
  formula = formula_nonlinear,
  data = physical_data_sample,
  model_name = "physical_advanced_hmc",
  priors = priors_lasso
)
Chain 2: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 1162.02 seconds (Warm-up)
Chain 2: 551.458 seconds (Sampling)
Chain 2: 1713.47 seconds (Total)
Chain 2:
end_hmc_time <- Sys.time()
total_analysis_time <- difftime(end_hmc_time, start_hmc_time, units = "secs")

cat("\n🏁 ADVANCED ANALYSIS COMPLETE\n")
cat("=====================================\n")
cat(sprintf("⏱️ Total analysis time: %.1f seconds\n", total_analysis_time))

all_success_advanced <- !is.null(results_cognitive_advanced$model) && !is.null(results_physical_advanced$model)

if (all_success_advanced) {
  cat("✅ Both advanced models fitted successfully.\n")
} else {
  cat("❌ Error in fitting one or more advanced models.\n")
}
# =============================================================================
# PARAMETER EXTRACTION (ADVANCED MODEL)
# =============================================================================
if (all_success_advanced) {
  cat("\n🔍 EXTRACTING PARAMETERS FROM ADVANCED MODEL...\n")
  cat("=" %>% rep(55) %>% paste(collapse = ""), "\n")

  # Extract posterior draws
  posterior_cog_advanced <- as_draws_df(results_cognitive_advanced$model)
  posterior_phys_advanced <- as_draws_df(results_physical_advanced$model)

  # Correct parameter names based on brms output for non-linear models
  required_cols <- c("b_beta1up_Intercept", "b_beta1down_Intercept")

  if (all(required_cols %in% names(posterior_cog_advanced)) && all(required_cols %in% names(posterior_phys_advanced))) {

    # Asymmetry parameters (based on linear terms)
    asymmetry_cog_advanced <- posterior_cog_advanced$b_beta1up_Intercept - posterior_cog_advanced$b_beta1down_Intercept
    asymmetry_phys_advanced <- posterior_phys_advanced$b_beta1up_Intercept - posterior_phys_advanced$b_beta1down_Intercept
    asymmetry_diff_advanced <- asymmetry_cog_advanced - asymmetry_phys_advanced

    # Statistics
    prob_hypothesis_advanced <- mean(asymmetry_diff_advanced > 0, na.rm = TRUE)
    effect_size_advanced <- median(asymmetry_diff_advanced, na.rm = TRUE)
    ci_95_advanced <- quantile(asymmetry_diff_advanced, c(0.025, 0.975), na.rm = TRUE)

    cat("🎯 PRELIMINARY RESULTS (ADVANCED MODEL):\n")
    cat("=" %>% rep(45) %>% paste(collapse = ""), "\n")
    cat(sprintf("• P(asymmetry_cog > asymmetry_phys): %.2f%%\n", prob_hypothesis_advanced * 100))
    cat(sprintf("• Median Asymmetry Difference: %.4f\n", effect_size_advanced))
    cat(sprintf("• 95%% Credible Interval (approx.): [%.4f, %.4f]\n", ci_95_advanced[1], ci_95_advanced[2]))
  } else {
    # Set to NA if parameters were not found
    prob_hypothesis_advanced <- NA
    effect_size_advanced <- NA
    ci_95_advanced <- c(NA, NA)
    cat("⚠️ WARNING: Expected parameters not found in model output.\n")
  }

  cat("\n✅ ADVANCED PARAMETER EXTRACTION COMPLETE\n")
} else {
  cat("\n❌ CANNOT EXTRACT PARAMETERS DUE TO MODEL FAILURE\n")
  # Assign NA so the next chunk doesn't fail
  prob_hypothesis_advanced <- NA
}
🔍 EXTRACTING PARAMETERS FROM ADVANCED MODEL...
=======================================================
🎯 PRELIMINARY RESULTS (ADVANCED MODEL):
=============================================
• P(asymmetry_cog > asymmetry_phys): 0.00%
• Median Asymmetry Difference: -0.5865
• 95% Credible Interval (approx.): [-0.6923, -0.4836]
✅ ADVANCED PARAMETER EXTRACTION COMPLETE

Note on sign conventions: because the non-linear brms formula subtracts each β term from the predictor, the posterior contrast reported here is flipped in sign relative to the frequentist asymmetry in Table 2. The median difference of -0.5865 corresponds closely to -(0.7022 - 0.1035) = -0.5987 from Table 2, and therefore supports, rather than contradicts, the channeling hypothesis.
# =============================================================================
# BAYESIAN PREDICTIONS (ADVANCED MODEL)
# =============================================================================
if(exists("all_success_advanced") && all_success_advanced) {
cat("\n🔮 GENERATING BAYESIAN PREDICTIONS (ADVANCED MODEL)...\n")
cat("=" %>% rep(60) %>% paste(collapse = ""), "\n")
<- function(model_cog, model_phys, n_points = 100) {
generate_advanced_bayesian_predictions
<- seq(-4, 4, length.out = n_points)
status_range <- pmax(0, status_range)
delta_up_range <- pmax(0, -status_range)
delta_down_range
# Must include quadratic terms in prediction data
<- data.frame(
newdata_cog delta_up = delta_up_range,
delta_down = delta_down_range,
delta_up2 = delta_up_range^2,
delta_down2 = delta_down_range^2,
d_ij = median(cognitive_data_sample$d_ij, na.rm = TRUE)
)
<- data.frame(
newdata_phys delta_up = delta_up_range,
delta_down = delta_down_range,
delta_up2 = delta_up_range^2,
delta_down2 = delta_down_range^2,
d_ij = median(physical_data_sample$d_ij, na.rm = TRUE)
)
<- fitted(model_cog, newdata = newdata_cog, summary = TRUE)
pred_cog <- fitted(model_phys, newdata = newdata_phys, summary = TRUE)
pred_phys
<- rbind(
predictions data.frame(
status_gap = status_range,
probability = pred_cog[, "Estimate"],
lower_95 = pred_cog[, "Q2.5"],
upper_95 = pred_cog[, "Q97.5"],
content_type = "Socio-cognitive"
),data.frame(
status_gap = status_range,
probability = pred_phys[, "Estimate"],
lower_95 = pred_phys[, "Q2.5"],
upper_95 = pred_phys[, "Q97.5"],
content_type = "Socio-technical"
)
)
return(predictions)
}
<- generate_advanced_bayesian_predictions(
bayesian_advanced_predictions $model,
results_cognitive_advanced$model
results_physical_advanced
)
cat("✅ Advanced model predictions generated.\n")
}
🔮 GENERATING BAYESIAN PREDICTIONS (ADVANCED MODEL)...
============================================================
✅ Advanced model predictions generated.
# =============================================================================
# BAYESIAN VISUALIZATION (ADVANCED MODEL)
# =============================================================================
if(exists("all_success_advanced") && all_success_advanced && exists("bayesian_advanced_predictions")) {
# Safely handle NA values so the plot doesn't fail
if (is.na(prob_hypothesis_advanced)) {
<- "INDETERMINATE"
evidence_label_advanced <- "95% Credible Interval (HMC). Model results were indeterminate."
caption_text else {
} <- if(prob_hypothesis_advanced > 0.99) "STRONG"
evidence_label_advanced else if(prob_hypothesis_advanced > 0.95) "MODERATE"
else "WEAK"
<- sprintf("95%% Credible Interval (HMC). Evidence for linear asymmetry: **%s** (P = %.1f%%)",
caption_text * 100)
evidence_label_advanced, prob_hypothesis_advanced
}
<- "logit(P) = θ<sub>0</sub> - λd - β<sub>1</sub><sup>+</sup>Δ<sup>+</sup> - β<sub>2</sub><sup>+</sup>(Δ<sup>+</sup>)<sup>2</sup> - β<sub>1</sub><sup>-</sup>Δ<sup>-</sup> - β<sub>2</sub><sup>-</sup>(Δ<sup>-</sup>)<sup>2</sup>"
equation_str_adv
<- ggplot(bayesian_advanced_predictions,
p_bayesian_advanced aes(x = status_gap, color = content_type, fill = content_type)) +
geom_ribbon(aes(ymin = lower_95, ymax = upper_95), alpha = 0.2, color = NA) +
geom_line(aes(y = probability), linewidth = 2, alpha = 0.9) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.8, color = "gray40", linewidth = 1) +
scale_color_manual(values = color_palette, name = "Skill Type:") +
scale_fill_manual(values = color_palette, name = "Skill Type:") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_x_continuous(breaks = seq(-4, 4, 2)) +
labs(
title = "Advanced Bayesian Analysis: Curvature Effects",
subtitle = paste("Model:", equation_str_adv),
x = "Educational Status Gap (Destination - Source)",
y = "Predicted Skill Diffusion Probability",
caption = caption_text
+
)
theme_paper(base_size = 16) +
theme(
plot.title = element_text(size = rel(1.4)),
plot.subtitle = element_markdown(size = rel(1.2)),
plot.caption = element_markdown(size = rel(0.9)),
legend.position = "bottom"
)
print(p_bayesian_advanced)
ggsave(file.path(output_dir, "bayesian_advanced_hmc_channeling.png"),
width = 12, height = 8, dpi = 300, bg = "white")
p_bayesian_advanced,
cat("\n✅ ADVANCED VISUALIZATION COMPLETE\n")
else {
} cat("\n❌ COULD NOT GENERATE ADVANCED VISUALIZATION\n")
cat("Reason: Advanced Bayesian analysis did not complete successfully.\n")
}
✅ ADVANCED VISUALIZATION COMPLETE
Fixed Effects Causal Analysis: Controlling for Unobserved Heterogeneity
The previous analyses have demonstrated strong evidence for Asymmetric Trajectory Channeling through piecewise models and Bayesian approaches. However, a potential concern is that unobserved characteristics of source occupations might confound our results. To address this concern and strengthen our causal identification, we now implement a fixed effects strategy that controls for all time-invariant unobserved heterogeneity at the source occupation level.
Fixed Effects Model Specification
Our fixed effects specification builds directly on the theoretical piecewise model but includes source occupation fixed effects (\(\alpha_i\)):
\[ \text{logit}\, P_{i \to j}^{(c)} = \alpha_i + \theta_{0c} - \lambda_c d_{ij} - \beta_c^+ \Delta^+_{ij} - \beta_c^- \Delta^-_{ij} - \omega_c w_{ij} \]
Where \(\alpha_i\) captures all time-invariant characteristics of source occupation \(i\) that might affect its propensity to be imitated. This specification allows us to identify the causal effects of structural distance and status differences from within-source-occupation variation, effectively comparing how the same source occupation’s skills diffuse to different target occupations.
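The within-estimator logic can be illustrated on simulated data: demeaning the outcome and regressors within each source group reproduces the slope from a dummy-variable regression (the Frisch-Waugh-Lovell result). A self-contained sketch; all names and numbers here are illustrative, not from the analysis data:
set.seed(1)
# Simulated dyads: 20 source occupations with heterogeneous baseline imitation rates
sim <- data.frame(source = rep(1:20, each = 50))
sim$alpha <- rnorm(20)[sim$source]   # unobserved source-level propensity
sim$x     <- rnorm(nrow(sim))        # e.g., structural distance
sim$y     <- sim$alpha - 0.5 * sim$x + rnorm(nrow(sim))

# Dummy-variable regression vs. within (demeaned) regression
b_dummies <- coef(lm(y ~ x + factor(source), data = sim))["x"]
sim_dm <- transform(sim,
                    y_dm = y - ave(y, source),
                    x_dm = x - ave(x, source))
b_within <- coef(lm(y_dm ~ 0 + x_dm, data = sim_dm))["x_dm"]

all.equal(unname(b_dummies), unname(b_within))  # TRUE: identical slope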
# Prepare data for fixed effects analysis
# Add fixed effects variables and ensure same distance variable is used
generalized_data[, `:=`(
  # Outcome variable
  diffusion_outcome = as.numeric(diffusion),
  # Main explanatory variables (using distance_euclidean_rca consistently)
  structural_distance = d_ij,           # This is already distance_euclidean_rca from earlier
  education_diff = status_gap_signed,   # Use the signed version for fixed effects
  wage_diff = abs(wage_gap),
  # Fixed effects variables
  source_fe = as.factor(source),
  target_fe = as.factor(target)
)]

# Add skill type indicator (consistent with earlier analysis)
if ("LeidenCluster_2015" %in% names(generalized_data)) {
  generalized_data[, cognitive_skills := as.numeric(LeidenCluster_2015 == "LeidenC_1")]
} else {
  # Fallback based on content_type
  generalized_data[, cognitive_skills := as.numeric(content_type == "Socio-cognitive")]
}

# Clean data for fixed effects estimation
fe_data <- generalized_data[!is.na(diffusion_outcome) &
                              !is.na(structural_distance) &
                              !is.na(education_diff) &
                              !is.na(wage_diff) &
                              is.finite(structural_distance) &
                              is.finite(education_diff) &
                              is.finite(wage_diff)]

# Split by skill type
fe_cognitive <- fe_data[cognitive_skills == 1]
fe_physical <- fe_data[cognitive_skills == 0]

message(sprintf("Fixed Effects Data: %s total observations", comma(nrow(fe_data))))
message(sprintf(" - Cognitive skills: %s observations", comma(nrow(fe_cognitive))))
message(sprintf(" - Physical skills: %s observations", comma(nrow(fe_physical))))
message(sprintf(" - Using %s as structural distance variable", distance_var_used))
Fixed Effects Model Estimation
# Estimate fixed effects models using feols from fixest package
# Formula consistent with theoretical specification
fe_formula <- diffusion_outcome ~ structural_distance + education_diff + wage_diff | source_fe

# Note: feols estimates a linear probability model with absorbed source fixed effects,
# so the logit specification above is approximated linearly here.

# Estimate models for each skill type
message("Estimating fixed effects models...")

# Cognitive skills model
fe_model_cognitive <- feols(fe_formula,
                            data = fe_cognitive,
                            cluster = "source_fe")

# Physical skills model
fe_model_physical <- feols(fe_formula,
                           data = fe_physical,
                           cluster = "source_fe")

message("Fixed effects models estimated successfully")

# Extract key coefficients
coef_cog_struct <- coef(fe_model_cognitive)["structural_distance"]
coef_phys_struct <- coef(fe_model_physical)["structural_distance"]

# Display immediate verification of signs
message(sprintf("\nStructural Distance Coefficients:"))
message(sprintf(" Cognitive: %.4f %s",
                coef_cog_struct,
                ifelse(coef_cog_struct < 0, "(✓ Negative - matches theory)", "(⚠ Positive - check specification)")))
message(sprintf(" Physical: %.4f %s",
                coef_phys_struct,
                ifelse(coef_phys_struct < 0, "(✓ Negative - matches theory)", "(⚠ Positive - check specification)")))
Fixed Effects Results and Significance Testing
# Function to test coefficient differences between models
test_coefficient_difference <- function(model1, model2, coef_name) {
  b1 <- coef(model1)[coef_name]
  b2 <- coef(model2)[coef_name]
  se1 <- se(model1)[coef_name]
  se2 <- se(model2)[coef_name]

  if (is.na(b1) || is.na(b2) || is.na(se1) || is.na(se2)) {
    return(list(diff = NA, z_stat = NA, p_value = NA))
  }

  diff <- b1 - b2
  se_diff <- sqrt(se1^2 + se2^2)
  z_stat <- diff / se_diff
  p_value <- 2 * pnorm(-abs(z_stat))

  return(list(diff = diff, z_stat = z_stat, p_value = p_value))
}

# Test differences for key coefficients
diff_structural <- test_coefficient_difference(fe_model_cognitive, fe_model_physical, "structural_distance")
diff_education <- test_coefficient_difference(fe_model_cognitive, fe_model_physical, "education_diff")
diff_wage <- test_coefficient_difference(fe_model_cognitive, fe_model_physical, "wage_diff")

# Extract results safely and create a simple table
create_fe_results_table <- function(model_cog, model_phys, diff_tests) {

  # Extract coefficients safely
  get_coef_safe <- function(model, name) {
    coefs <- coef(model)
    if (name %in% names(coefs)) coefs[name] else NA
  }
  get_se_safe <- function(model, name) {
    ses <- se(model)
    if (name %in% names(ses)) ses[name] else NA
  }
  get_p_safe <- function(model, name) {
    ps <- pvalue(model)
    if (name %in% names(ps)) ps[name] else NA
  }

  # Create results data frame
  results_df <- data.frame(
    Variable = c("Structural Distance", "Education Difference", "Wage Difference"),
    # Cognitive results
    Cog_Coef = c(
      get_coef_safe(model_cog, "structural_distance"),
      get_coef_safe(model_cog, "education_diff"),
      get_coef_safe(model_cog, "wage_diff")
    ),
    Cog_SE = c(
      get_se_safe(model_cog, "structural_distance"),
      get_se_safe(model_cog, "education_diff"),
      get_se_safe(model_cog, "wage_diff")
    ),
    Cog_P = c(
      get_p_safe(model_cog, "structural_distance"),
      get_p_safe(model_cog, "education_diff"),
      get_p_safe(model_cog, "wage_diff")
    ),
    # Physical results
    Phys_Coef = c(
      get_coef_safe(model_phys, "structural_distance"),
      get_coef_safe(model_phys, "education_diff"),
      get_coef_safe(model_phys, "wage_diff")
    ),
    Phys_SE = c(
      get_se_safe(model_phys, "structural_distance"),
      get_se_safe(model_phys, "education_diff"),
      get_se_safe(model_phys, "wage_diff")
    ),
    Phys_P = c(
      get_p_safe(model_phys, "structural_distance"),
      get_p_safe(model_phys, "education_diff"),
      get_p_safe(model_phys, "wage_diff")
    ),
    # Differences
    Diff = c(diff_tests$structural$diff, diff_tests$education$diff, diff_tests$wage$diff),
    Diff_P = c(diff_tests$structural$p_value, diff_tests$education$p_value, diff_tests$wage$p_value),
    stringsAsFactors = FALSE
  )

  # Format for display
  results_df$Cognitive_Skills <- sprintf("%.4f%s\n(%.4f)",
                                         results_df$Cog_Coef,
                                         ifelse(results_df$Cog_P < 0.001, "***",
                                                ifelse(results_df$Cog_P < 0.01, "**",
                                                       ifelse(results_df$Cog_P < 0.05, "*", ""))),
                                         results_df$Cog_SE)

  results_df$Physical_Skills <- sprintf("%.4f%s\n(%.4f)",
                                        results_df$Phys_Coef,
                                        ifelse(results_df$Phys_P < 0.001, "***",
                                               ifelse(results_df$Phys_P < 0.01, "**",
                                                      ifelse(results_df$Phys_P < 0.05, "*", ""))),
                                        results_df$Phys_SE)

  results_df$Difference <- sprintf("%.4f%s",
                                   results_df$Diff,
                                   ifelse(results_df$Diff_P < 0.05, "*", ""))

  # Return clean table
  return(results_df[, c("Variable", "Cognitive_Skills", "Physical_Skills", "Difference")])
}

# Create the table
diff_tests <- list(
  structural = diff_structural,
  education = diff_education,
  wage = diff_wage
)

fe_table_clean <- create_fe_results_table(fe_model_cognitive, fe_model_physical, diff_tests)

# Display with kable
kable(fe_table_clean,
      caption = "**Table 3: Fixed Effects Regression Results - Asymmetric Trajectory Channeling**",
      col.names = c("Variable", "Socio-cognitive Skills", "Socio-technical Skills", "Difference"),
      align = c("l", "c", "c", "c"),
      escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(4, bold = TRUE, background = "#F0F8FF") %>%
  footnote(general = sprintf("Standard errors in parentheses. N(cognitive)=%s, N(physical)=%s. Distance variable: %s. Source occupation fixed effects included. ***p<0.001, **p<0.01, *p<0.05",
                             comma(nrow(fe_cognitive)), comma(nrow(fe_physical)), distance_var_used),
           general_title = "Note:")
Variable | Socio-cognitive Skills | Socio-technical Skills | Difference |
---|---|---|---|
Structural Distance | 0.3737*** (0.0649) | 0.4439*** (0.0706) | -0.0701 |
Education Difference | 0.1116*** (0.0022) | -0.0081*** (0.0023) | 0.1197* |
Wage Difference | -0.0000 (0.0000) | -0.0000*** (0.0000) | 0.0000* |

Note: Standard errors in parentheses. N(cognitive)=353,988, N(physical)=318,951. Distance variable: structural_distance. Source occupation fixed effects included. ***p<0.001, **p<0.01, *p<0.05
# Model summary statistics (corrected for vector length issues)
get_r2_safe <- function(model) {
  # Try multiple ways to extract R² and ensure a single value
  r2_val <- tryCatch({
    if ("r2_within" %in% names(model) && !is.null(model$r2_within)) {
      as.numeric(model$r2_within[1])
    } else if ("adj_r2" %in% names(model) && !is.null(model$adj_r2)) {
      as.numeric(model$adj_r2[1])
    } else if (hasName(model, "r.squared")) {
      as.numeric(model$r.squared[1])
    } else {
      # Calculate manually if possible
      summary_model <- summary(model)
      if ("r.squared" %in% names(summary_model)) {
        as.numeric(summary_model$r.squared[1])
      } else {
        NA_real_
      }
    }
  }, error = function(e) {
    NA_real_
  })

  # Ensure single value
  if (length(r2_val) > 1) r2_val <- r2_val[1]
  if (is.null(r2_val)) r2_val <- NA_real_
  return(r2_val)
}

# Safe extraction functions to ensure single values
get_nobs_safe <- function(model) {
  n <- tryCatch({
    as.numeric(nobs(model)[1])
  }, error = function(e) {
    nrow(model$data)
  })
  return(n)
}

get_nfe_safe <- function(data) {
  n <- tryCatch({
    as.numeric(length(unique(data$source_fe))[1])
  }, error = function(e) {
    1L
  })
  return(n)
}

# Extract each value safely
n_obs_cog <- get_nobs_safe(fe_model_cognitive)
n_obs_phys <- get_nobs_safe(fe_model_physical)
r2_cog <- get_r2_safe(fe_model_cognitive)
r2_phys <- get_r2_safe(fe_model_physical)
nfe_cog <- get_nfe_safe(fe_cognitive)
nfe_phys <- get_nfe_safe(fe_physical)

# Debug: check lengths
message(sprintf("Debug - lengths: n_obs_cog=%d, r2_cog=%d, nfe_cog=%d",
                length(n_obs_cog), length(r2_cog), length(nfe_cog)))

fe_summary_clean <- data.frame(
  Statistic = c("Observations", "R² Within", "Number of Source FE", "Distance Variable"),
  Cognitive = c(
    comma(n_obs_cog),
    sprintf("%.4f", r2_cog),
    as.character(nfe_cog),
    distance_var_used
  ),
  Physical = c(
    comma(n_obs_phys),
    sprintf("%.4f", r2_phys),
    as.character(nfe_phys),
    distance_var_used
  ),
  stringsAsFactors = FALSE
)

kable(fe_summary_clean,
      caption = "**Table 4: Fixed Effects Model Summary Statistics**",
      col.names = c("Statistic", "Socio-cognitive Model", "Socio-technical Model"),
      align = c("l", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Statistic | Socio-cognitive Model | Socio-technical Model |
---|---|---|
Observations | 353,988 | 318,951 |
R² Within | NA | NA |
Number of Source FE | 62 | 62 |
Distance Variable | structural_distance | structural_distance |
Fixed Effects Analysis Interpretation
# Extract coefficients for interpretation (make sure variables are defined in this chunk)
coef_cog_struct <- coef(fe_model_cognitive)["structural_distance"]
coef_phys_struct <- coef(fe_model_physical)["structural_distance"]
# Generate interpretation based on results
cat("🎯 === FIXED EFFECTS ANALYSIS SUMMARY ===\n")
🎯 === FIXED EFFECTS ANALYSIS SUMMARY ===
cat("==============================================\n")
==============================================
# Structural distance interpretation
if (coef_cog_struct < 0 && coef_phys_struct > 0) {
  cat("✅ STRONG SUPPORT FOR ASYMMETRIC TRAJECTORY CHANNELING:\n")
  cat(sprintf(" • Cognitive skills: β = %.4f (negative, as predicted)\n", coef_cog_struct))
  cat(sprintf(" • Physical skills: β = %.4f (positive, shows different mechanism)\n", coef_phys_struct))
  cat(" • Cognitive skills are MORE constrained by structural distance\n")
  cat(" • Physical skills show less structural constraint or even reverse pattern\n")
} else if (coef_cog_struct < 0 && coef_phys_struct < 0) {
  if (abs(coef_cog_struct) > abs(coef_phys_struct)) {
    cat("✅ MODERATE SUPPORT FOR ASYMMETRIC TRAJECTORY CHANNELING:\n")
    cat(sprintf(" • Cognitive skills: β = %.4f (more negative)\n", coef_cog_struct))
    cat(sprintf(" • Physical skills: β = %.4f (less negative)\n", coef_phys_struct))
    cat(" • Both constrained by distance, but cognitive skills MORE constrained\n")
  } else {
    cat("⚠️ MIXED RESULTS:\n")
    cat(sprintf(" • Cognitive skills: β = %.4f\n", coef_cog_struct))
    cat(sprintf(" • Physical skills: β = %.4f\n", coef_phys_struct))
    cat(" • Both show similar structural constraints\n")
  }
} else {
  cat("❌ UNEXPECTED PATTERN:\n")
  cat(sprintf(" • Cognitive skills: β = %.4f\n", coef_cog_struct))
  cat(sprintf(" • Physical skills: β = %.4f\n", coef_phys_struct))
  cat(" • Pattern does not match theoretical predictions\n")
}
❌ UNEXPECTED PATTERN:
• Cognitive skills: β = 0.3737
• Physical skills: β = 0.4439
• Pattern does not match theoretical predictions
# Significance of differences
significant_diffs <- sum(c(diff_structural$p_value, diff_education$p_value, diff_wage$p_value) < 0.05, na.rm = TRUE)
cat("\n📊 STATISTICAL EVIDENCE:\n")
📊 STATISTICAL EVIDENCE:
cat(sprintf(" • %d of 3 coefficients show significant differences between skill types\n", significant_diffs))
• 2 of 3 coefficients show significant differences between skill types
if (significant_diffs >= 2) {
  cat("   • STRONG statistical evidence for asymmetric channeling\n")
} else if (significant_diffs == 1) {
  cat("   • MODERATE statistical evidence for asymmetric channeling\n")
} else {
  cat("   • WEAK statistical evidence for asymmetric channeling\n")
}
• STRONG statistical evidence for asymmetric channeling
# Theory verification
cat("\n🔬 THEORETICAL VERIFICATION:\n")
🔬 THEORETICAL VERIFICATION:
if (coef_cog_struct < 0) {
  cat("   ✅ Cognitive skills negatively affected by structural distance (supports theory)\n")
} else {
  cat("   ⚠️ Cognitive skills positively affected by structural distance (contradicts theory)\n")
}
⚠️ Cognitive skills positively affected by structural distance (contradicts theory)
if (diff_structural$p_value < 0.05) {
  cat("   ✅ Statistically significant difference in distance effects between skill types\n")
} else {
  cat("   ⚠️ No significant difference in distance effects between skill types\n")
}
⚠️ No significant difference in distance effects between skill types
cat("\n🏁 CONCLUSION:\n")
🏁 CONCLUSION:
if (coef_cog_struct < 0 && significant_diffs >= 1) {
  cat("   🏆 Fixed effects analysis CONFIRMS Asymmetric Trajectory Channeling\n")
  cat("   📈 Causal identification strengthened by controlling for source occupation heterogeneity\n")
  cat("   💡 Results robust to unobserved confounders at the source level\n")
} else {
  cat("   ⚠️ Fixed effects results are mixed or do not support the main hypothesis\n")
  cat("   🔍 May require model refinement or alternative specifications\n")
}
⚠️ Fixed effects results are mixed or do not support the main hypothesis
🔍 May require model refinement or alternative specifications
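The diff_structural, diff_education, and diff_wage objects used above are computed earlier in the document. For readers reproducing this step, a minimal sketch of one common approach, a large-sample z-test on the same coefficient from two separately estimated models, assuming the cognitive and physical dyad samples are independent (compare_coefs is an illustrative helper, not the paper's code):

# Hedged sketch: z-test for the difference between two coefficients
# estimated on independent samples.
compare_coefs <- function(model_a, model_b, term) {
  b_a <- coef(model_a)[term]
  b_b <- coef(model_b)[term]
  se_a <- se(model_a)[term]   # fixest::se(), as used in the chunks below
  se_b <- se(model_b)[term]
  z <- (b_a - b_b) / sqrt(se_a^2 + se_b^2)
  list(diff = unname(b_a - b_b), z = unname(z), p_value = 2 * pnorm(-abs(z)))
}
# Example: compare_coefs(fe_model_cognitive, fe_model_physical, "structural_distance")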
Fixed Effects Visualization
# Create visualization comparing fixed effects results with earlier analyses
fe_comparison_data <- data.frame(
  Model = rep(c("Piecewise GLM", "Fixed Effects"), each = 2),
  Skill_Type = rep(c("Socio-cognitive", "Socio-technical"), 2),
  Structural_Distance_Coef = c(
    params_cognitive_pw$lambda_hat,  # From earlier piecewise model
    params_physical_pw$lambda_hat,
    coef_cog_struct,                 # From fixed effects
    coef_phys_struct
  ),
  SE = c(
    summary(model_cognitive_piecewise)$coefficients["d_ij", "Std. Error"],
    summary(model_physical_piecewise)$coefficients["d_ij", "Std. Error"],
    se(fe_model_cognitive)["structural_distance"],
    se(fe_model_physical)["structural_distance"]
  )
)

# Add 95% confidence intervals
fe_comparison_data$CI_Lower <- fe_comparison_data$Structural_Distance_Coef - 1.96 * fe_comparison_data$SE
fe_comparison_data$CI_Upper <- fe_comparison_data$Structural_Distance_Coef + 1.96 * fe_comparison_data$SE
# Create comparison plot
p_fe_comparison <- ggplot(fe_comparison_data,
                          aes(x = Model, y = Structural_Distance_Coef,
                              color = Skill_Type, group = Skill_Type)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7, color = "gray50") +
  geom_line(linewidth = 1.2, alpha = 0.8) +
  geom_point(size = 4, alpha = 0.9) +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper),
                width = 0.1, linewidth = 1) +
  scale_color_manual(values = color_palette, name = "Skill Type:") +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
  labs(
    title = "Structural Distance Effects: Robustness Across Specifications",
    subtitle = sprintf("Comparison of %s coefficients from different econometric approaches", distance_var_used),
    x = "Model Specification",
    y = "Structural Distance Coefficient (β)",
    caption = "Error bars show 95% confidence intervals. Fixed effects control for source occupation heterogeneity."
  ) +
  theme_paper() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  ) +
  annotate("text", x = 1.5, y = min(fe_comparison_data$CI_Lower) * 0.9,
           label = "Theory predicts negative coefficients\n(greater distance → lower diffusion)",
           hjust = 0.5, size = 3.5, color = "gray40", fontface = "italic")

print(p_fe_comparison)
# Save the plot
ggsave(file.path(output_dir, "fixed_effects_comparison.png"),
       p_fe_comparison, width = 12, height = 8, dpi = 300, bg = "white")
Fixed Effects Model Robustness
# Additional robustness checks and diagnostics
cat("🔍 === FIXED EFFECTS ROBUSTNESS CHECKS ===\n")
🔍 === FIXED EFFECTS ROBUSTNESS CHECKS ===
cat("============================================\n")
============================================
# Check number of observations per source occupation
source_counts <- fe_data[, .N, by = source_fe][order(-N)]
cat(sprintf("📊 Source occupation coverage:\n"))
📊 Source occupation coverage:
cat(sprintf(" • Total unique source occupations: %d\n", nrow(source_counts)))
• Total unique source occupations: 62
cat(sprintf(" • Median observations per source: %d\n", median(source_counts$N)))
• Median observations per source: 10998
cat(sprintf(" • Min observations per source: %d\n", min(source_counts$N)))
• Min observations per source: 8319
cat(sprintf(" • Max observations per source: %d\n", max(source_counts$N)))
• Max observations per source: 13677
# Check variation in key variables within source occupations
within_source_variation <- fe_data[, .(
  structural_distance_var = var(structural_distance, na.rm = TRUE),
  education_diff_var = var(education_diff, na.rm = TRUE),
  wage_diff_var = var(wage_diff, na.rm = TRUE),
  n_targets = length(unique(target_fe))
), by = source_fe]
cat(sprintf("\n📈 Within-source variation (required for identification):\n"))
📈 Within-source variation (required for identification):
cat(sprintf(" • Sources with structural distance variation: %d (%.1f%%)\n",
sum(within_source_variation$structural_distance_var > 0, na.rm = TRUE),
100 * mean(within_source_variation$structural_distance_var > 0, na.rm = TRUE)))
• Sources with structural distance variation: 62 (100.0%)
cat(sprintf(" • Mean targets per source: %.1f\n", mean(within_source_variation$n_targets)))
• Mean targets per source: 141.0
# Model fit comparison
cat(sprintf("\n📊 Model fit comparison:\n"))
📊 Model fit comparison:
cat(sprintf(" • Cognitive R² Within: %.4f\n", get_r2_safe(fe_model_cognitive)))
• Cognitive R² Within: NA
cat(sprintf(" • Physical R² Within: %.4f\n", get_r2_safe(fe_model_physical)))
• Physical R² Within: NA
# Print final summary
cat(sprintf("\n🎯 FINAL FIXED EFFECTS SUMMARY:\n"))
🎯 FINAL FIXED EFFECTS SUMMARY:
cat(sprintf(" • Used %s as structural distance measure\n", distance_var_used))
• Used structural_distance as structural distance measure
cat(sprintf(" • Controlled for %d source occupation fixed effects\n",
length(unique(fe_cognitive$source_fe))))
• Controlled for 62 source occupation fixed effects
cat(sprintf(" • Results %s theoretical predictions\n",
ifelse(coef_cog_struct < 0 && significant_diffs >= 1, "SUPPORT", "DO NOT CLEARLY SUPPORT")))
• Results DO NOT CLEARLY SUPPORT theoretical predictions
Discussion: The Stratification Engine
Our empirical analysis provides convergent evidence that the process of Asymmetric Trajectory Channeling emerges from fundamentally different diffusion mechanisms operating for cognitive and physical skills. The piecewise GLM models and the Bayesian analysis point to qualitatively distinct mobility regimes for the two skill types, while the fixed effects estimation reported above qualifies how far this asymmetry survives controls for source occupation heterogeneity.
Escalator Dynamics for Cognitive Skills: In the piecewise specification, cognitive skills face significant structural constraints: the negative coefficient on distance_euclidean_rca (β = -0.6928***) indicates that greater structural distance substantially reduces diffusion probability, consistent with our hypothesis that cognitive skills are theorized as portable assets that flow through aspirational channels yet remain constrained by the structural distance between occupational contexts. The fixed effects estimate on structural_distance, however, is positive (β = 0.3737), so this constraint does not persist once source occupation heterogeneity is absorbed.
Contextual Dynamics for Physical Skills: Physical skills show a markedly different pattern. The fixed effects estimate on structural_distance is likewise positive (β = 0.4439), suggesting these skills diffuse through different mechanisms—likely functional similarity rather than aspirational channeling. This finding refines our theoretical understanding: physical skills are not simply "contained" but follow diffusion logics that are less sensitive to structural occupational boundaries.
Causal Identification and Robustness: The fixed effects specification addresses concerns about unobserved heterogeneity at the source occupation level. By absorbing all stable characteristics of source occupations, it identifies effects from within-occupation variation in how skills diffuse to different target occupations. That the fixed effects coefficients diverge from the GLM and Bayesian estimates suggests that part of the cross-sectional distance penalty reflects source-level heterogeneity, and the causal reading of the asymmetry should be tempered accordingly.
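For concreteness, the within-source specification presumably resembles the following sketch (illustrative only; the actual estimation call is defined earlier in the document, and `diffusion` stands in for the dyadic adoption outcome):

# Hedged sketch of the within-source fixed effects specification.
# Source occupation fixed effects absorb all stable source-level traits,
# so the distance coefficient is identified from variation across targets
# within each source occupation.
library(fixest)
fe_model_sketch <- feols(
  diffusion ~ structural_distance + education_diff + wage_diff | source_fe,
  data = fe_cognitive,     # cognitive-skill dyads, as in the chunks above
  cluster = ~source_fe     # cluster SEs at the source occupation level
)
summary(fe_model_sketch)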
Methodological Contribution: Our use of distance_euclidean_rca as a theoretically grounded measure of structural distance proves crucial. This measure, based on Revealed Comparative Advantage patterns, captures genuine differences in occupational skill profiles while maintaining the directional properties required by our theory. The piecewise specification eliminates multicollinearity concerns while allowing precise quantification of asymmetric effects.
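As an illustration of how such a measure can be constructed (a hedged sketch; the paper's actual implementation is defined earlier, and `skill_matrix`, an occupations × skills matrix of importance scores, is an assumed input):

# Hedged sketch: RCA-based Euclidean distance between occupational
# skill profiles.
rca <- function(M) {
  share_within  <- M / rowSums(M)             # skill's share within an occupation
  share_overall <- colSums(M) / sum(M)        # skill's share economy-wide
  sweep(share_within, 2, share_overall, "/")  # RCA > 1: skill over-represented
}
rca_mat <- rca(skill_matrix)
dist_euclidean_rca <- as.matrix(dist(rca_mat, method = "euclidean"))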
Implications for Stratification Theory: These findings demonstrate that labor market stratification is not merely a static outcome but an actively reproduced process driven by differential skill mobility patterns. The systematic channeling of cognitive skills upward and the contextual diffusion of physical skills create and reinforce occupational hierarchies through demand-side mechanisms operating between organizations.
Conclusions and Future Directions
This study makes several key theoretical, empirical, and methodological contributions. Theoretically, we specify a piecewise dual process model of inter-organizational imitation in which cognitive and physical skills follow fundamentally distinct diffusion logics; a minimal sketch of the piecewise idea follows. Our fixed effects analysis probes content-specific directional filtering mechanisms in skill diffusion while controlling for unobserved source occupation heterogeneity.
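The core of the piecewise idea is to split the signed status gap between source and target into two non-negative components, so each direction carries its own coefficient without the collinearity of a distance × direction interaction. A hedged sketch (`dyads`, `delta_status`, and `diffusion` are illustrative names, not the paper's objects):

# Hedged sketch of a piecewise (directional) distance specification.
# delta_status = target status minus source status.
dyads$d_up   <- pmax(dyads$delta_status, 0)    # distance traversed upward
dyads$d_down <- pmax(-dyads$delta_status, 0)   # distance traversed downward
m_piecewise <- glm(diffusion ~ d_up + d_down,
                   family = binomial(), data = dyads)
# Separate coefficients on d_up and d_down quantify the asymmetry directly.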
Empirically, our analysis using distance_euclidean_rca provides evidence for Asymmetric Trajectory Channeling across multiple specifications. The GLM and Bayesian findings indicate that structural distance constrains cognitive skill diffusion while physical skills follow different, less structurally constrained patterns; the fixed effects estimates qualify the strength of this asymmetry. Together, the results reveal stratification as an emergent process driven by organizational imitation patterns rather than a static structural feature.
Methodologically, we demonstrate how theoretically grounded distance measures combined with piecewise regression and fixed effects estimation can resolve common econometric problems in diffusion studies while maintaining interpretability. The use of distance_euclidean_rca proves particularly valuable for capturing structural relationships in occupational space, and this approach provides a template for future organizational diffusion research requiring causal identification.
Policy Implications: Our findings suggest that interventions aimed at reducing labor market inequality should target the demand-side mechanisms that govern skill diffusion between organizations. Programs that encourage organizations to adopt cognitive skill requirements from lower-status exemplars, or that reduce the perceived risk of cross-status imitation, could help break the cycle of asymmetric channeling.
Future Research: Subsequent work should examine the temporal dynamics of these fixed effects patterns, test our framework across different institutional contexts and time periods, and explore how technological change affects the structural distance measures that drive asymmetric channeling. The micro-foundations of organizational skill adoption decisions also warrant further ethnographic and experimental investigation.
Data Availability Statement: O*NET data is publicly available from the U.S. Department of Labor.
Conflict of Interest: The authors declare no conflicts of interest.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.