suppressPackageStartupMessages({
library(kableExtra)
library(data.table)
library(ggplot2)
library(broom)
library(dplyr)
library(scales)
library(lmtest)
library(knitr)
library(patchwork)
library(tidyr)
library(ggrepel)
library(viridis)
library(RColorBrewer)
library(ggtext)
library(brms)
})
# Custom theme for a professional, academic look
theme_paper <- function(base_size = 15, base_family = "sans") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      text = element_text(color = "grey20"),
      plot.title = element_text(size = rel(1.6), face = "bold", hjust = 0.5, margin = margin(b = 15), family = base_family),
      plot.subtitle = element_markdown(size = rel(1.1), color = "grey40", hjust = 0.5, margin = margin(b = 15), lineheight = 1.2),
      plot.caption = element_text(size = rel(0.9), color = "grey50", hjust = 1, margin = margin(t = 12)),
      axis.title = element_text(size = rel(1.25), face = "bold", color = "grey30"),
      axis.text = element_text(size = rel(1.1), color = "grey40"),
      axis.line = element_line(color = "grey80", linewidth = 0.6),
      panel.grid.major = element_line(color = "grey93", linewidth = 0.3),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_rect(fill = "white", color = NA),
      plot.background = element_rect(fill = "white", color = NA),
      legend.title = element_text(size = rel(1.15), face = "bold"),
      legend.text = element_text(size = rel(1.05)),
      legend.background = element_rect(fill = "white", color = "grey90"),
      legend.key = element_rect(fill = "white", color = NA),
      strip.text = element_text(size = rel(1.25), face = "bold", color = "grey20"),
      strip.background = element_rect(fill = "grey97", color = "grey80")
    )
}
theme_set(theme_paper())

# Define a consistent color palette
color_palette <- c("Socio-cognitive" = "#0072B2", "Socio-technical" = "#D55E00")

# Output directory for figures
output_dir <- paste0("Stratified_Diffusion_Analysis_", Sys.Date())
if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)

# Helper for thousands-separated numbers
comma <- function(x) format(x, big.mark = ",", scientific = FALSE)
Introduction: From Employer Decisions to Labor Market Structure
The architecture of inequality in the U.S. labor market is not a static blueprint but an actively reproduced, dynamic process. Its foundations lie in the everyday decisions of employers within organizations, who determine which skill requirements to establish for the occupations they manage. Foundational studies have demonstrated that the skill landscape itself is starkly polarized into two distinct domains—a socio-cognitive cluster associated with high wages and a sensory-physical one with low wages (Alabdulkareem et al. 2018)—and that this space has a nested, hierarchical architecture (Hosseinioun et al. 2025). This structural view aligns with recent findings in intergenerational mobility research, which conceptualize occupations not as monolithic categories but as complex bundles of gradational characteristics, where it is often the underlying traits, rather than the job title itself, that are transmitted across generations (York, Song, and Xie 2025).
Faced with uncertainty about which skill requirements will maximize productivity or prestige, employers often look to the practices of other organizations for guidance. They engage in a process of social learning and imitation, observing how similar organizations define skill requirements for comparable occupations. However, we argue this imitation is not random. It is governed by a powerful, yet poorly understood, mechanism of asymmetric filtering based on the cultural theorization of the skill itself. The nature of a skill—how it is socially interpreted and valued—fundamentally alters the pathways it can travel across the occupational status hierarchy as organizations adopt and adapt these requirements.
The cumulative result of thousands of these guided, micro-level imitation decisions is a macro-level process we term Asymmetric Trajectory Channeling. This process actively sorts skills into divergent mobility paths—an upward “escalator” for cognitive skills and a “containment field” for physical ones—thus deepening labor market stratification from the demand side.
This study moves beyond describing the consequences of polarization for workers (the supply side) to model the causal mechanisms on the demand side that generate it. By focusing on the rules that guide employer imitation between organizations, we explain how structural inequality is not merely a state, but an emergent process reproduced from the ground up through organizational decision-making.
Theoretical Foundations: From Mimicry to Meaning
Early studies of diffusion often focused on the spread of a single innovation through a homogenous population. More recent work, however, recognizes that diffusion is fundamentally structured by networks and heterogeneity. Practices do not diffuse randomly or uniformly; they follow patterned trajectories shaped by organizational, cultural, and institutional constraints (Strang and Soule 1998).
The Micro-Foundations of Imitation
We extend this insight by arguing that employers imitate skill requirements according to fundamentally different logics depending on the type of skill in question. To understand why, we must first examine the micro-foundations of organizational imitation. The literature suggests three key mechanisms that drive one organization to adopt the practices of another:
Uncertainty and Bounded Rationality: Under conditions of uncertainty about the relationship between means and ends, organizations often imitate others as a decision-making shortcut. Rather than calculating an optimal solution from scratch, imitation offers a viable solution with reduced search costs (Cyert and March 2006; DiMaggio and Powell 1983).
Prestige and Status-Seeking: Imitation is not just a response to uncertainty, but also a strategy to gain legitimacy and status. Organizations do not imitate just anyone; they emulate those they perceive to be more successful or prestigious (Strang and Macy 2001; Bail, Brown, and Wimmer 2019). This process of “adaptive emulation” (Strang and Macy 2001), driven by “success stories,” creates an inherent directional bias in diffusion, where practices flow from high-status to low-status actors.
Proximity and Network Structure: Influence is not global but is channeled through social and structural networks. The likelihood that one organization imitates another is strongly conditioned by proximity, whether geographic, social, or cultural (Hedström 1994; Strang and Tuma 1993). Actors are more influenced by their peers, direct competitors, or those with whom they maintain dense relationships.
A Dual-Process Theory of Skill Diffusion: The Role of Theorization
Our key theoretical innovation is that the content of a skill—how it is culturally theorized (Strang and Meyer 1993)—determines which of these micro-foundations becomes dominant. We argue that organizations filter and evaluate potential practices through two qualitatively different diffusion logics:
Cognitive Skills as Portable Assets: Cognitive skills (analytical, interpersonal, managerial) are theorized as nested capabilities: they are abstract, broadly applicable, and perceived as portable assets associated with growth, learning, and adaptability. Under uncertainty, employers look toward prestigious exemplars (Strang 2010) that signal success and modernity. As a result, the diffusion of cognitive skills is driven primarily by aspirational emulation. They tend to diffuse upward through the occupational status hierarchy as organizations seek to imitate their high-status peers.
Physical Skills as Context-Dependent Competencies: In contrast, physical skills (manual, motor) are theorized as context-dependent competencies: they are tied to specific material settings, bodily execution, and legacy institutional constraints. They are less likely to be read as generalizable templates for upward mobility. Instead, their diffusion is based on functional rather than aspirational considerations. Therefore, the diffusion of physical skills is governed mainly by proximity and functional need, showing less directional bias or even containment effects within their current status segments.
This bifurcation in diffusion logics is what we call Asymmetric Trajectory Channeling. While our model, for analytical clarity, uses a binary distinction, we treat skills as multidimensional bundles of traits, values, and dispositions (York, Song, and Xie 2025). Employers imitate not just individual skills, but pieces of occupational identity.
Piecewise Dual Process Diffusion Model
To formalize these dynamics while avoiding multicollinearity issues, we model the probability \(P_{i \to j}^{(c)}\) that an employer, managing a target occupation \(j\), adopts a skill of class \(c\) by imitating a requirement from a source occupation \(i\). This decision is embedded in the macro-level structure of the occupational space.
Mathematical Formulation
We define two non-overlapping directional status variables that completely separate upward and downward mobility:
\[ \Delta^+_{ij} = \max(0, s_j - s_i) \quad \text{(upward status shift)} \]
\[ \Delta^-_{ij} = \max(0, s_i - s_j) \quad \text{(downward status shift)} \]
The structural form of our Piecewise Dual Process Model expresses the diffusion likelihood as:
\[ \text{logit}\, P_{i \to j}^{(c)} = \theta_{0c} - \lambda_c d_{ij} - \beta_c^+ \Delta^+_{ij} - \beta_c^- \Delta^-_{ij} - \omega_c w_{ij} \]
Where:

- \(\theta_{0c}\): baseline propensity to adopt skill class \(c\)
- \(\lambda_c\): structural distance penalty parameter
- \(\beta_c^+\): penalty (or bonus) per unit upward status move
- \(\beta_c^-\): penalty (or bonus) per unit downward status move
- \(\omega_c\): coefficient for the wage gap control \(w_{ij}\)
Advantages of the Piecewise Specification
- Eliminates Multicollinearity: Unlike models using both \((s_j - s_i)\) and \(|s_j - s_i|\), our piecewise variables are mathematically orthogonal since \(\Delta^+_{ij} \cdot \Delta^-_{ij} = 0\) always (verified in the sketch after this list).
- Direct Asymmetry Testing: The difference \((\beta_c^+ - \beta_c^-)\) directly measures directional asymmetry without requiring complex transformations.
- Clear Interpretation: Each parameter has an unambiguous meaning tied to specific types of status moves.
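The orthogonality claim is easy to check numerically. Below is a minimal sketch on simulated status gaps; the simulation itself is illustrative, and only the pmax() construction mirrors the variables built later in the analysis:

# Minimal sketch: check the piecewise decomposition on simulated status gaps
set.seed(1)
status_gap_signed <- rnorm(1000)
delta_up <- pmax(0, status_gap_signed)    # Δ⁺: upward component
delta_down <- pmax(0, -status_gap_signed) # Δ⁻: downward component
stopifnot(all(delta_up * delta_down == 0))                  # product is identically zero
stopifnot(all(delta_up - delta_down == status_gap_signed))  # decomposition is exact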
Curved Piecewise Extension
For situations where we expect nonlinear sensitivity to status differences, the model can be extended to include curvature terms. We estimate a generalized linear model with a Bernoulli likelihood and logit link. The outcome \(y_{ij}^{(c)} \in \{0, 1\}\) indicates whether occupation \(j\) adopts a skill of class \(c\) from occupation \(i\):
\[ y_{ij}^{(c)} \sim \text{Bernoulli}(p_{ij}^{(c)}), \quad \text{logit}(p_{ij}^{(c)}) = \eta_{ij}^{(c)} \]
The predictor is a second-order polynomial expansion of upward and downward status gaps:
\[ \eta_{ij}^{(c)} = \theta_{0c} - \lambda_c d_{ij} - \beta_{1c}^+ \Delta^+_{ij} - \beta_{2c}^+ (\Delta^+_{ij})^2 - \beta_{1c}^- \Delta^-_{ij} - \beta_{2c}^- (\Delta^-_{ij})^2 \]
Prior Choices: Rationale
Parameter | Description | Prior | Justification |
---|---|---|---|
\(\theta_{0c}\) (intercept) | Baseline adoption tendency | normal(0, 5) | Nearly uninformative; wide enough for coverage |
\(\lambda_c\), \(\beta_{1c}^{\pm}\) | Main distance and status slope terms | normal(0, 1) | Weakly informative; encourages shrinkage |
\(\beta_{2c}^{\pm}\) | Curvature terms (quadratic effects) | double_exponential(0, 0.5) | Lasso-type prior to enable variable selection |
The double-exponential (or Laplace) prior places more mass near zero than the normal, encouraging sparsity in the curvature terms (i.e., it can shrink them effectively to zero). This is useful when it is unclear a priori whether curvature is needed.
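The contrast between the two priors can be visualized directly. A minimal sketch in base R, with the Laplace density written out by hand (d_laplace() is a helper defined here for illustration; only the scale values come from the table above):

# Sketch: compare the Laplace prior for curvature terms with a normal(0, 1) prior
d_laplace <- function(x, mu = 0, b = 0.5) exp(-abs(x - mu) / b) / (2 * b)
curve(d_laplace(x), from = -3, to = 3, lwd = 2, ylab = "Density",
      main = "Shrinkage priors for curvature terms")
curve(dnorm(x), add = TRUE, lty = 2, lwd = 2)
legend("topright", legend = c("double_exponential(0, 0.5)", "normal(0, 1)"),
       lty = c(1, 2), lwd = 2)

The sharp spike of the Laplace density at zero is what produces the Lasso-like selection behavior.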
Empirical Analysis
Data and Setup
Data Preparation
# Robust data loading function
load_and_prepare_data <- function() {
  data_path <- "/home/rober/Descargas/all_events_final_enriched.RData"

  if (file.exists(data_path)) {
    load(data_path)
    if (exists("all_events_final_enriched")) {
      data_for_models <- all_events_final_enriched
      message("Successfully loaded 'all_events_final_enriched'.")
    } else {
      message("Warning: 'all_events_final_enriched' not found. Searching for another suitable data object.")
      all_objs <- ls()
      suitable_objects <- all_objs[sapply(all_objs, function(x) {
        obj <- get(x)
        (is.data.frame(obj) || is.data.table(obj)) && nrow(obj) > 1000
      })]
      if (length(suitable_objects) > 0) {
        data_for_models <- get(suitable_objects[1])
        message(paste("Using fallback data object:", suitable_objects[1]))
      } else {
        stop("No suitable data object found.")
      }
    }
  } else {
    stop(paste("Data file not found at path:", data_path))
  }
  return(data_for_models)
}

# Load data
data_for_models <- load_and_prepare_data()
Variable Construction
# Robust variable mapping
available_cols <- names(data_for_models)

required_mapping <- list(
  diffusion = c("diffusion", "adopt", "adoption", "outcome"),
  structural_distance = c("structural_distance", "dist", "distance", "d_ij"),
  status_diff = c("status_gap_signed", "education_diff", "status_diff", "education_diff_abs", "edu_diff", "status_gap"),
  wage_diff = c("wage_gap", "wage_diff", "wage_distance"),
  skill_group = c("skill_group_for_model", "skill_group", "cluster", "group", "LeidenCluster_2015", "skill_cluster_type"),
  source_education = c("source_education", "edu_i", "source_edu", "soc_source", "education_source"),
  target_education = c("target_education", "edu_j", "target_edu", "soc_target", "education_target"),
  source_wage = c("source_wage", "wage_i", "wage_source"),
  target_wage = c("target_wage", "wage_j", "wage_target")
)

column_matches <- sapply(required_mapping, function(p_names) {
  found <- intersect(p_names, available_cols)
  if (length(found) > 0) found[1] else NA_character_
})

# Check for essential columns
essential_cols <- c("diffusion", "structural_distance", "status_diff", "skill_group")
if (any(is.na(column_matches[essential_cols]))) {
  missing_essentials <- names(column_matches[essential_cols][is.na(column_matches[essential_cols])])
  stop(paste("Missing essential columns for model:", paste(missing_essentials, collapse = ", ")))
}

if (!is.data.table(data_for_models)) {
  data_for_models <- as.data.table(data_for_models)
}

generalized_data <- copy(data_for_models)

# Create variables for the model
generalized_data[, `:=`(
  diffusion = as.numeric(get(column_matches[["diffusion"]])),
  d_ij = as.numeric(get(column_matches[["structural_distance"]])),
  status_gap_signed = as.numeric(get(column_matches[["status_diff"]])),
  skill_group_for_model = as.character(get(column_matches[["skill_group"]]))
)]

# Create piecewise variables
generalized_data[, `:=`(
  delta_up = pmax(0, status_gap_signed),        # Δ⁺: upward movements
  delta_down = pmax(0, -status_gap_signed),     # Δ⁻: downward movements
  status_gap_magnitude = abs(status_gap_signed) # Keep for compatibility
)]

# Handle wage_gap column
if ("wage_diff_abs" %in% names(generalized_data)) {
  generalized_data[, wage_gap := as.numeric(wage_diff_abs)]
  message("Using 'wage_diff_abs' for wage_gap.")
} else if (!is.na(column_matches[["wage_diff"]])) {
  generalized_data[, wage_gap := as.numeric(get(column_matches[["wage_diff"]]))]
} else {
  message("Warning: 'wage_diff' column not found. Creating a dummy 'wage_gap' column.")
  generalized_data[, wage_gap := 0]
}

# Add source/target variables
if (all(c("source_Edu_Score_Weighted", "target_Edu_Score_Weighted") %in% names(generalized_data))) {
  generalized_data[, source_edu := as.numeric(source_Edu_Score_Weighted)]
  generalized_data[, target_edu := as.numeric(target_Edu_Score_Weighted)]
  message("Using 'source_Edu_Score_Weighted' and 'target_Edu_Score_Weighted' for education variables.")
}

if (all(c("source_Median_Wage_2015", "target_Median_Wage_2023") %in% names(generalized_data))) {
  generalized_data[, source_wage := as.numeric(source_Median_Wage_2015)]
  generalized_data[, target_wage := as.numeric(target_Median_Wage_2023)]
  message("Using 'source_Median_Wage_2015' and 'target_Median_Wage_2023' for wage variables.")
}

# Assign content type based on skill group
if ("LeidenCluster_2015" %in% names(generalized_data)) {
  message("Applying specific corrections to 'LeidenCluster_2015' as per original script.")
  generalized_data[, LeidenCluster_2015 := as.character(LeidenCluster_2015)]
  generalized_data[LeidenCluster_2015 == "NoCluster_2015", LeidenCluster_2015 := "LeidenC_1"]
  generalized_data <- generalized_data[LeidenCluster_2015 %in% c("LeidenC_1", "LeidenC_2")]
  generalized_data[, content_type := fifelse(LeidenCluster_2015 == "LeidenC_1", "Socio-cognitive", "Socio-technical")]
} else {
  message("Warning: 'LeidenCluster_2015' not found. Using generic classification.")
  unique_skill_groups <- unique(generalized_data$skill_group_for_model)
  set.seed(42)
  groups_sorted <- sort(unique_skill_groups)
  enhancing_groups <- head(groups_sorted, length(groups_sorted) / 2)
  generalized_data[, content_type := fifelse(skill_group_for_model %in% enhancing_groups,
                                             "Socio-cognitive", "Socio-technical")]
}

# Final data cleaning
generalized_data <- generalized_data[!is.na(content_type)]
generalized_data <- na.omit(generalized_data,
                            cols = c("diffusion", "d_ij", "delta_up", "delta_down", "wage_gap"))

# Verify piecewise variables
message(sprintf("Created piecewise variables: delta_up (mean=%.3f), delta_down (mean=%.3f)",
                mean(generalized_data$delta_up), mean(generalized_data$delta_down)))

# Split data by skill content type
cognitive_data <- generalized_data[content_type == "Socio-cognitive"]
physical_data <- generalized_data[content_type == "Socio-technical"]
Descriptive Analysis
desc_stats <- generalized_data[, .(
  N = comma(.N),
  Diffusion_Rate = sprintf("%.1f%%", mean(diffusion, na.rm = TRUE) * 100),
  Mean_Distance = sprintf("%.3f", mean(d_ij, na.rm = TRUE)),
  Mean_Upward_Shift = sprintf("%.3f", mean(delta_up, na.rm = TRUE)),
  Mean_Downward_Shift = sprintf("%.3f", mean(delta_down, na.rm = TRUE)),
  Upward_Diffusion_Rate = sprintf("%.1f%%", mean(diffusion[delta_up > 0], na.rm = TRUE) * 100),
  Downward_Diffusion_Rate = sprintf("%.1f%%", mean(diffusion[delta_down > 0], na.rm = TRUE) * 100)
), by = content_type]

kable(desc_stats,
      caption = "**Table 1: Descriptive Statistics by Skill Content Type**",
      col.names = c("Content Type", "N", "Diffusion Rate", "Mean Distance",
                    "Mean Δ⁺", "Mean Δ⁻", "Upward Diffusion", "Downward Diffusion")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Content Type | N | Diffusion Rate | Mean Distance | Mean Δ⁺ | Mean Δ⁻ | Upward Diffusion | Downward Diffusion |
---|---|---|---|---|---|---|---|
Socio-cognitive | 353,988 | 26.8% | 0.679 | 1.161 | 0.773 | 37.5% | 13.6% |
Socio-technical | 318,951 | 27.0% | 0.675 | 1.512 | 0.294 | 27.0% | 27.1% |
Table 1 Interpretation: This table summarizes the key variables for the two skill types. Overall diffusion rates are nearly identical (26.8% vs. 27.0%), but the directional breakdown differs sharply: socio-cognitive skills diffuse far more readily on upward status shifts (37.5%) than on downward ones (13.6%), whereas socio-technical skills show essentially no directional difference (27.0% vs. 27.1%). This provides initial evidence that the aspirational bias in skill adoption is specific to socio-cognitive content.
Descriptive Visualizations
create_descriptive_coherent_style <- function(data) {

  # Work with data.table
  if (!is.data.table(data)) {
    dt <- as.data.table(data)
  } else {
    dt <- copy(data)
  }

  # Verify required columns
  required_cols <- c("status_gap_signed", "content_type", "diffusion")
  missing_cols <- required_cols[!required_cols %in% names(dt)]

  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Classify each dyad by direction of status movement
  dt[, movement_type := fcase(
    status_gap_signed < -0.5, "Downward",
    abs(status_gap_signed) <= 0.5, "Lateral",
    status_gap_signed > 0.5, "Upward",
    default = NA_character_
  )]

  # Keep valid rows
  dt_clean <- dt[!is.na(movement_type) & !is.na(content_type)]

  # Summary statistics by movement type and skill content
  summary_stats <- dt_clean[, .(
    diffusion_rate = mean(diffusion, na.rm = TRUE),
    n = .N
  ), by = .(movement_type, content_type)]

  # Confidence intervals (normal approximation)
  summary_stats[, `:=`(
    se = sqrt(diffusion_rate * (1 - diffusion_rate) / n),
    ci_lower = pmax(0, diffusion_rate - 1.96 * sqrt(diffusion_rate * (1 - diffusion_rate) / n)),
    ci_upper = pmin(1, diffusion_rate + 1.96 * sqrt(diffusion_rate * (1 - diffusion_rate) / n))
  )]

  # Convert to data.frame and order factor levels
  summary_df <- as.data.frame(summary_stats)
  summary_df$movement_type <- factor(summary_df$movement_type,
                                     levels = c("Downward", "Lateral", "Upward"))

  # Same colors as the line plot
  coherent_colors <- c(
    "Socio-cognitive" = "#1f77b4", # consistent blue
    "Socio-technical" = "#ff7f0e"  # consistent orange
  )

  # Build the plot in the same style
  p <- ggplot(summary_df, aes(x = movement_type, y = diffusion_rate, fill = content_type)) +

    # Subtle shaded area highlighting the escalator effect
    annotate("rect", xmin = 2.5, xmax = 3.5, ymin = -Inf, ymax = Inf,
             fill = "#E8F4FD", alpha = 0.3) + # very light blue

    # Main bars
    geom_col(position = position_dodge(0.8),
             width = 0.7,
             color = "white",
             linewidth = 0.8,
             alpha = 0.85) +

    # Error bars
    geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                  position = position_dodge(0.8),
                  width = 0.25,
                  linewidth = 1,
                  color = "grey40") +

    # Percentage labels
    geom_text(aes(label = paste0(round(diffusion_rate * 100, 1), "%")),
              position = position_dodge(0.8),
              vjust = -0.6,
              size = 5,
              fontface = "bold",
              color = "grey20") +

    # Subtle arrow drawing attention to the escalator effect
    annotate("segment",
             x = 2.7, xend = 3.3,
             y = 0.42, yend = 0.42,
             arrow = arrow(length = unit(0.3, "cm"), type = "closed"),
             color = "#1f77b4",
             linewidth = 2,
             alpha = 0.8) +

    # Explanatory annotation
    annotate("text",
             x = 3, y = 0.45,
             label = "ESCALATOR\nEFFECT",
             color = "#1f77b4",
             fontface = "bold",
             size = 4,
             hjust = 0.5,
             lineheight = 0.9) +

    # Scales consistent with the house style
    scale_fill_manual(values = coherent_colors, name = "Skill Type:") +
    scale_y_continuous(
      limits = c(0, 0.5),
      breaks = seq(0, 0.5, 0.1),
      expand = expansion(mult = c(0, 0.05)),
      labels = function(x) paste0(x * 100, "%")
    ) +
    scale_x_discrete(
      labels = c("Downward\n(High → Low)", "Lateral\n(Similar Status)", "Upward\n(Low → High)")
    ) +

    # Labels
    labs(
      title = "Raw Data Reveals Asymmetric Trajectory Channeling",
      subtitle = "Descriptive evidence for differential skill mobility patterns by content type",
      x = "Status Movement Direction",
      y = "Observed Skill Adoption Rate"
    ) +

    # Same theme as the line plot
    theme_minimal(base_size = 15) +
    theme(
      plot.background = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA),
      panel.grid.major = element_line(color = "white", linewidth = 0.6),
      panel.grid.minor = element_line(color = "white", linewidth = 0.3),
      panel.border = element_blank(),
      axis.line = element_line(color = "grey40", linewidth = 0.5),
      plot.title = element_text(size = 18, face = "bold", hjust = 0.5,
                                color = "grey20", margin = margin(b = 8)),
      plot.subtitle = element_text(size = 14, hjust = 0.5,
                                   color = "grey40", margin = margin(b = 20)),
      axis.title.x = element_text(size = 15, color = "grey30", margin = margin(t = 10)),
      axis.title.y = element_text(size = 15, color = "grey30", margin = margin(r = 10)),
      axis.text = element_text(size = 15, color = "grey40"),
      axis.ticks = element_line(color = "grey50", linewidth = 0.4),
      legend.position = "bottom",
      legend.background = element_rect(fill = "white", color = NA),
      legend.text = element_text(size = 14, color = "black"),
      legend.title = element_text(size = 12, face = "bold", color = "black"),
      legend.key.size = unit(1, "cm"),
      legend.margin = margin(t = 15),
      plot.margin = margin(20, 20, 15, 20)
    ) +

    # Legend guides
    guides(
      fill = guide_legend(
        title.position = "left",
        nrow = 1,
        keywidth = unit(1.2, "cm"),
        keyheight = unit(0.8, "cm")
      )
    )

  return(p)
}

# Create the figure in the coherent style
fig_coherent_bars <- create_descriptive_coherent_style(generalized_data)
print(fig_coherent_bars)
Model Specification and Estimation
Piecewise Dual Process Models
# Formula for the standard linear piecewise model
formula_piecewise <- diffusion ~ d_ij + delta_up + delta_down + wage_gap
Separate Model Estimation
# Estimate separate models for each skill type
model_cognitive_piecewise <- glm(formula_piecewise, data = cognitive_data, family = binomial(link = "logit"))
model_physical_piecewise <- glm(formula_piecewise, data = physical_data, family = binomial(link = "logit"))
Parameter Extraction and Comparison
# Function to extract parameters from GLM models
extract_piecewise_parameters <- function(model, skill_type, data_subset) {
  coefs <- coef(model)

  get_param <- function(name, default_val = 0) if (name %in% names(coefs)) coefs[name] else default_val

  return(list(
    skill_type = skill_type,
    n_obs = nrow(data_subset),
    lambda_hat = get_param("d_ij"),
    beta_up_hat = get_param("delta_up"),
    beta_down_hat = get_param("delta_down"),
    wage_hat = get_param("wage_gap"),
    asymmetry = get_param("delta_up") - get_param("delta_down"),
    aic = AIC(model),
    mcfadden_r2 = 1 - (model$deviance / model$null.deviance)
  ))
}

params_cognitive_pw <- extract_piecewise_parameters(model_cognitive_piecewise, "Socio-cognitive", cognitive_data)
params_physical_pw <- extract_piecewise_parameters(model_physical_piecewise, "Socio-technical", physical_data)
Statistical Tests and Model Comparison
# Create a comparison table for the piecewise models
piecewise_comparison <- data.frame(
  `Skill Type` = c("Socio-cognitive", "Socio-technical"),
  N = c(comma(params_cognitive_pw$n_obs), comma(params_physical_pw$n_obs)),
  `λ (Distance)` = c(params_cognitive_pw$lambda_hat, params_physical_pw$lambda_hat),
  `β⁺ (Upward)` = c(params_cognitive_pw$beta_up_hat, params_physical_pw$beta_up_hat),
  `β⁻ (Downward)` = c(params_cognitive_pw$beta_down_hat, params_physical_pw$beta_down_hat),
  `Asymmetry (β⁺ - β⁻)` = c(params_cognitive_pw$asymmetry, params_physical_pw$asymmetry),
  `McFadden's R²` = c(params_cognitive_pw$mcfadden_r2, params_physical_pw$mcfadden_r2),
  AIC = c(params_cognitive_pw$aic, params_physical_pw$aic),
  check.names = FALSE
)

kable(piecewise_comparison,
      caption = "**Table 2: Frequentist Piecewise Model Results**",
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(6, bold = TRUE, background = "#F0F8FF")
Skill Type | N | λ (Distance) | β⁺ (Upward) | β⁻ (Downward) | Asymmetry (β⁺ - β⁻) | McFadden's R² | AIC |
---|---|---|---|---|---|---|---|
Socio-cognitive | 353,988 | 1.1178 | 0.4440 | -0.2582 | 0.7022 | 0.1206 | 362128.6 |
Socio-technical | 318,951 | 2.5693 | -0.1216 | -0.2251 | 0.1035 | 0.0129 | 367436.0 |
Table 2 Interpretation: The frequentist models confirm the descriptive findings. Both skill types are penalized by structural distance (positive λ, which enters the model with a negative sign), with a far steeper penalty for socio-technical skills. The key result, however, is the Asymmetry column. Socio-cognitive skills show a large, positive asymmetry (0.7022), indicating that upward status moves are strongly favored over downward moves. In contrast, socio-technical skills show a much smaller asymmetry (0.1035), suggesting a near-neutral response to the direction of status change.
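The asymmetry contrast can also be tested formally. The sketch below computes a Wald z-test for β⁺ − β⁻ from each fitted model's estimated covariance matrix; test_asymmetry() is a helper defined here for illustration and is not part of the original pipeline:

# Sketch: Wald z-test for the directional contrast (beta_up - beta_down)
test_asymmetry <- function(model) {
  b <- coef(model)
  V <- vcov(model)
  est <- b["delta_up"] - b["delta_down"]
  se <- sqrt(V["delta_up", "delta_up"] + V["delta_down", "delta_down"] -
               2 * V["delta_up", "delta_down"])
  z <- est / se
  c(estimate = unname(est), se = unname(se), z = unname(z),
    p_value = unname(2 * pnorm(-abs(z))))
}
test_asymmetry(model_cognitive_piecewise)
test_asymmetry(model_physical_piecewise)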
Model Predictions & Visualization
generate_piecewise_predictions <- function(model_cog, model_phys, data_cog, data_phys, n_points = 150) {

  baseline_distance_cog <- median(data_cog$d_ij, na.rm = TRUE)
  baseline_wage_cog <- median(data_cog$wage_gap, na.rm = TRUE)

  baseline_distance_phys <- median(data_phys$d_ij, na.rm = TRUE)
  baseline_wage_phys <- median(data_phys$wage_gap, na.rm = TRUE)

  # Create status range for predictions
  status_range <- seq(-4, 4, length.out = n_points)

  # Convert to piecewise variables
  delta_up_range <- pmax(0, status_range)
  delta_down_range <- pmax(0, -status_range)

  pred_data_cognitive <- data.frame(
    delta_up = delta_up_range,
    delta_down = delta_down_range,
    d_ij = baseline_distance_cog,
    wage_gap = baseline_wage_cog
  )
  pred_cognitive <- predict(model_cog, newdata = pred_data_cognitive, type = "response")

  pred_data_physical <- data.frame(
    delta_up = delta_up_range,
    delta_down = delta_down_range,
    d_ij = baseline_distance_phys,
    wage_gap = baseline_wage_phys
  )
  pred_physical <- predict(model_phys, newdata = pred_data_physical, type = "response")

  predictions <- rbind(
    data.frame(status_gap = status_range, probability = pred_cognitive, content_type = "Socio-cognitive"),
    data.frame(status_gap = status_range, probability = pred_physical, content_type = "Socio-technical")
  )

  return(predictions)
}

# Generate predictions from the frequentist models
piecewise_predictions <- generate_piecewise_predictions(model_cognitive_piecewise, model_physical_piecewise, cognitive_data, physical_data)

# Define the equation string for the plot subtitle
equation_str <- "logit(P) = θ<sub>0</sub> - λd - β<sup>+</sup>Δ<sup>+</sup> - β<sup>-</sup>Δ<sup>-</sup>"

p_main_piecewise <- ggplot(piecewise_predictions, aes(x = status_gap, y = probability, color = content_type)) +
  geom_line(linewidth = 1.8, alpha = 0.9) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.7, color = "gray50") +
  scale_color_manual(values = color_palette, name = "Skill Type:") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  annotate("text", x = 2.5, y = max(piecewise_predictions$probability) * 0.85,
           label = "ASPIRATIONAL PULL\n(Low-to-High Status)",
           hjust = 0.5, size = 4.5, color = "gray30", fontface = "bold") +
  annotate("text", x = -2.5, y = max(piecewise_predictions$probability) * 0.85,
           label = "CONTAINMENT\n(High-to-Low Status)",
           hjust = 0.5, size = 4.5, color = "gray30", fontface = "bold") +
  labs(
    title = "Asymmetric Trajectory Channeling: Evidence from Piecewise Models",
    subtitle = paste("Model:", equation_str),
    x = "Educational Status Gap (Destination - Source)",
    y = "Predicted Skill Diffusion Probability"
  ) +
  theme_paper() +
  theme(
    legend.position = "bottom",
    plot.subtitle = element_markdown(size = rel(1.2))
  )

print(p_main_piecewise)
Figure 3 Interpretation: This plot visualizes the core finding from the frequentist models. The steep upward slope for socio-cognitive skills (blue) for positive status gaps (x-axis > 0) illustrates a strong “aspirational pull.” Conversely, the flatter curve for socio-technical skills (red) demonstrates their relative indifference to status-gaining imitation, highlighting the “Asymmetric Trajectory Channeling” effect.
Advanced Bayesian Analysis (with Curvature Selection)
cat("🚀 ADVANCED BAYESIAN SETUP - Non-Linear Model with Shrinkage Priors\n")
🚀 ADVANCED BAYESIAN SETUP - Non-Linear Model with Shrinkage Priors
cat("====================================================================\n")
====================================================================
cat("OBJECTIVE: Estimate curvature effects using Laplace (Lasso) priors.\n")
OBJECTIVE: Estimate curvature effects using Laplace (Lasso) priors.
cat("METHOD: High-quality Hamiltonian Monte Carlo (HMC) on a large sample.\n\n")
METHOD: High-quality Hamiltonian Monte Carlo (HMC) on a large sample.
# 1. Subsample data for a balance of speed and robustness
set.seed(123)
sample_size <- 50000

cognitive_data_sample <- if (nrow(cognitive_data) > sample_size) {
  cognitive_data[sample(.N, sample_size)]
} else {
  cognitive_data
}

physical_data_sample <- if (nrow(physical_data) > sample_size) {
  physical_data[sample(.N, sample_size)]
} else {
  physical_data
}

message(sprintf("Subsampling for HMC. Using %s cognitive and %s physical observations.",
                comma(nrow(cognitive_data_sample)),
                comma(nrow(physical_data_sample))))

# 2. Prepare polynomial terms
cognitive_data_sample <- cognitive_data_sample %>%
  mutate(
    delta_up2 = delta_up^2,
    delta_down2 = delta_down^2
  )

physical_data_sample <- physical_data_sample %>%
  mutate(
    delta_up2 = delta_up^2,
    delta_down2 = delta_down^2
  )
# 3. Non-linear model formula (no underscores in parameter names)
formula_nonlinear <- bf(
  diffusion ~ theta0 - lambda * d_ij - beta1up * delta_up - beta2up * delta_up2 - beta1down * delta_down - beta2down * delta_down2,
  theta0 + lambda + beta1up + beta2up + beta1down + beta2down ~ 1,
  nl = TRUE
)

# 4. Prior specification (using double_exponential)
priors_lasso <- c(
  prior(normal(0, 5), nlpar = "theta0"),
  prior(normal(0, 1), nlpar = "lambda", lb = 0),
  prior(normal(0, 1), nlpar = "beta1up"),
  prior(normal(0, 1), nlpar = "beta1down"),
  prior(double_exponential(0, 0.5), nlpar = "beta2up"),
  prior(double_exponential(0, 0.5), nlpar = "beta2down")
)
start_hmc_time <- Sys.time()

# =============================================================================
# HIGH-QUALITY FITTING FUNCTION FOR ADVANCED MODEL USING HMC
# =============================================================================
fit_advanced_hmc_model <- function(formula, data, model_name, priors) {

  cat(sprintf("\n🚀 ADVANCED FITTING (HMC): %s\n", toupper(model_name)))
  cat("=" %>% rep(50) %>% paste(collapse = ""), "\n")
  cat(sprintf("📊 Using %s observations from sample.\n", comma(nrow(data))))
  cat("⚙️ Algorithm: Hamiltonian Monte Carlo (NUTS Sampler).\n")

  start_time <- Sys.time()
  model <- NULL

  tryCatch({
    model <- brm(
      formula = formula,
      data = data,
      prior = priors,
      family = bernoulli(),
      chains = 2,
      cores = 4,
      iter = 1000,
      warmup = 500,
      seed = 42,
      silent = 0,
      control = list(adapt_delta = 0.9)
    )
    cat("✅ Advanced HMC model fitted successfully.\n")
  }, error = function(e) {
    cat("❌ ERROR during advanced model fitting:\n")
    cat(e$message, "\n")
  })

  end_time <- Sys.time()
  time_elapsed <- difftime(end_time, start_time, units = "secs")
  cat(sprintf("⏱️ Fitting time: %.1f seconds.\n", time_elapsed))
  return(list(model = model, time = time_elapsed))
}
# =============================================================================
# FIT MODELS WITH ADVANCED CONFIGURATION
# =============================================================================

# ADVANCED COGNITIVE MODEL
results_cognitive_advanced <- fit_advanced_hmc_model(
  formula = formula_nonlinear,
  data = cognitive_data_sample,
  model_name = "cognitive_advanced_hmc",
  priors = priors_lasso
)
🚀 ADVANCED FITTING (HMC): COGNITIVE_ADVANCED_HMC
==================================================
📊 Using 50,000 observations from sample.
⚙️ Algorithm: Hamiltonian Monte Carlo (NUTS Sampler).
✅ Advanced HMC model fitted successfully.
⏱️ Fitting time: 2852.7 seconds.
# ADVANCED PHYSICAL MODEL
results_physical_advanced <- fit_advanced_hmc_model(
  formula = formula_nonlinear,
  data = physical_data_sample,
  model_name = "physical_advanced_hmc",
  priors = priors_lasso
)
end_hmc_time <- Sys.time()
total_analysis_time <- difftime(end_hmc_time, start_hmc_time, units = "secs")

cat("\n🏁 ADVANCED ANALYSIS COMPLETE\n")
cat("=====================================\n")
cat(sprintf("⏱️ Total analysis time: %.1f seconds\n", total_analysis_time))

all_success_advanced <- !is.null(results_cognitive_advanced$model) && !is.null(results_physical_advanced$model)

if (all_success_advanced) {
  cat("✅ Both advanced models fitted successfully.\n")
} else {
  cat("❌ Error in fitting one or more advanced models.\n")
}
# =============================================================================
# PARAMETER EXTRACTION (ADVANCED MODEL)
# =============================================================================
if (all_success_advanced) {
  cat("\n🔍 EXTRACTING PARAMETERS FROM ADVANCED MODEL...\n")
  cat("=" %>% rep(55) %>% paste(collapse = ""), "\n")

  # Extract posterior draws
  posterior_cog_advanced <- as_draws_df(results_cognitive_advanced$model)
  posterior_phys_advanced <- as_draws_df(results_physical_advanced$model)

  # Correct parameter names based on brms output for non-linear models
  required_cols <- c("b_beta1up_Intercept", "b_beta1down_Intercept")

  if (all(required_cols %in% names(posterior_cog_advanced)) &&
      all(required_cols %in% names(posterior_phys_advanced))) {

    # Asymmetry parameters (linear terms). Note the sign convention: the
    # non-linear predictor subtracts each beta term, so this contrast carries
    # the opposite sign to the glm asymmetry (beta+ - beta-) in Table 2.
    asymmetry_cog_advanced <- posterior_cog_advanced$b_beta1up_Intercept - posterior_cog_advanced$b_beta1down_Intercept
    asymmetry_phys_advanced <- posterior_phys_advanced$b_beta1up_Intercept - posterior_phys_advanced$b_beta1down_Intercept
    asymmetry_diff_advanced <- asymmetry_cog_advanced - asymmetry_phys_advanced

    # Summary statistics
    prob_hypothesis_advanced <- mean(asymmetry_diff_advanced > 0, na.rm = TRUE)
    effect_size_advanced <- median(asymmetry_diff_advanced, na.rm = TRUE)
    ci_95_advanced <- quantile(asymmetry_diff_advanced, c(0.025, 0.975), na.rm = TRUE)

    cat("🎯 PRELIMINARY RESULTS (ADVANCED MODEL):\n")
    cat("=" %>% rep(45) %>% paste(collapse = ""), "\n")
    cat(sprintf("• P(asymmetry_cog > asymmetry_phys): %.2f%%\n", prob_hypothesis_advanced * 100))
    cat(sprintf("• Median Asymmetry Difference: %.4f\n", effect_size_advanced))
    cat(sprintf("• 95%% Credible Interval (approx.): [%.4f, %.4f]\n", ci_95_advanced[1], ci_95_advanced[2]))
  } else {
    # Set to NA if parameters were not found
    prob_hypothesis_advanced <- NA
    effect_size_advanced <- NA
    ci_95_advanced <- c(NA, NA)
    cat("⚠️ WARNING: Expected parameters not found in model output.\n")
  }

  cat("\n✅ ADVANCED PARAMETER EXTRACTION COMPLETE\n")
} else {
  cat("\n❌ CANNOT EXTRACT PARAMETERS DUE TO MODEL FAILURE\n")
  # Assign NA so the next chunk doesn't fail
  prob_hypothesis_advanced <- NA
}
🔍 EXTRACTING PARAMETERS FROM ADVANCED MODEL...
=======================================================
🎯 PRELIMINARY RESULTS (ADVANCED MODEL):
=============================================
• P(asymmetry_cog > asymmetry_phys): 0.00%
• Median Asymmetry Difference: -0.5865
• 95% Credible Interval (approx.): [-0.6923, -0.4836]
✅ ADVANCED PARAMETER EXTRACTION COMPLETE
# =============================================================================
# BAYESIAN PREDICTIONS (ADVANCED MODEL)
# =============================================================================
if (exists("all_success_advanced") && all_success_advanced) {
  cat("\n🔮 GENERATING BAYESIAN PREDICTIONS (ADVANCED MODEL)...\n")
  cat("=" %>% rep(60) %>% paste(collapse = ""), "\n")

  generate_advanced_bayesian_predictions <- function(model_cog, model_phys, n_points = 100) {

    status_range <- seq(-4, 4, length.out = n_points)
    delta_up_range <- pmax(0, status_range)
    delta_down_range <- pmax(0, -status_range)

    # Must include quadratic terms in prediction data
    newdata_cog <- data.frame(
      delta_up = delta_up_range,
      delta_down = delta_down_range,
      delta_up2 = delta_up_range^2,
      delta_down2 = delta_down_range^2,
      d_ij = median(cognitive_data_sample$d_ij, na.rm = TRUE)
    )

    newdata_phys <- data.frame(
      delta_up = delta_up_range,
      delta_down = delta_down_range,
      delta_up2 = delta_up_range^2,
      delta_down2 = delta_down_range^2,
      d_ij = median(physical_data_sample$d_ij, na.rm = TRUE)
    )

    pred_cog <- fitted(model_cog, newdata = newdata_cog, summary = TRUE)
    pred_phys <- fitted(model_phys, newdata = newdata_phys, summary = TRUE)

    predictions <- rbind(
      data.frame(
        status_gap = status_range,
        probability = pred_cog[, "Estimate"],
        lower_95 = pred_cog[, "Q2.5"],
        upper_95 = pred_cog[, "Q97.5"],
        content_type = "Socio-cognitive"
      ),
      data.frame(
        status_gap = status_range,
        probability = pred_phys[, "Estimate"],
        lower_95 = pred_phys[, "Q2.5"],
        upper_95 = pred_phys[, "Q97.5"],
        content_type = "Socio-technical"
      )
    )

    return(predictions)
  }

  bayesian_advanced_predictions <- generate_advanced_bayesian_predictions(
    results_cognitive_advanced$model,
    results_physical_advanced$model
  )

  cat("✅ Advanced model predictions generated.\n")
}
🔮 GENERATING BAYESIAN PREDICTIONS (ADVANCED MODEL)...
============================================================
✅ Advanced model predictions generated.
# =============================================================================
# BAYESIAN VISUALIZATION (ADVANCED MODEL)
# =============================================================================
if (exists("all_success_advanced") && all_success_advanced && exists("bayesian_advanced_predictions")) {

  # Safely handle NA values so the plot doesn't fail
  if (is.na(prob_hypothesis_advanced)) {
    evidence_label_advanced <- "INDETERMINATE"
    caption_text <- "95% Credible Interval (HMC). Model results were indeterminate."
  } else {
    evidence_label_advanced <- if (prob_hypothesis_advanced > 0.99) {
      "STRONG"
    } else if (prob_hypothesis_advanced > 0.95) {
      "MODERATE"
    } else {
      "WEAK"
    }
    caption_text <- sprintf("95%% Credible Interval (HMC). Evidence for linear asymmetry: **%s** (P = %.1f%%)",
                            evidence_label_advanced, prob_hypothesis_advanced * 100)
  }

  equation_str_adv <- "logit(P) = θ<sub>0</sub> - λd - β<sub>1</sub><sup>+</sup>Δ<sup>+</sup> - β<sub>2</sub><sup>+</sup>(Δ<sup>+</sup>)<sup>2</sup> - β<sub>1</sub><sup>-</sup>Δ<sup>-</sup> - β<sub>2</sub><sup>-</sup>(Δ<sup>-</sup>)<sup>2</sup>"

  p_bayesian_advanced <- ggplot(bayesian_advanced_predictions,
                                aes(x = status_gap, color = content_type, fill = content_type)) +
    geom_ribbon(aes(ymin = lower_95, ymax = upper_95), alpha = 0.2, color = NA) +
    geom_line(aes(y = probability), linewidth = 2, alpha = 0.9) +
    geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.8, color = "gray40", linewidth = 1) +
    scale_color_manual(values = color_palette, name = "Skill Type:") +
    scale_fill_manual(values = color_palette, name = "Skill Type:") +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    scale_x_continuous(breaks = seq(-4, 4, 2)) +
    labs(
      title = "Advanced Bayesian Analysis: Curvature Effects",
      subtitle = paste("Model:", equation_str_adv),
      x = "Educational Status Gap (Destination - Source)",
      y = "Predicted Skill Diffusion Probability",
      caption = caption_text
    ) +
    theme_paper(base_size = 16) +
    theme(
      plot.title = element_text(size = rel(1.4)),
      plot.subtitle = element_markdown(size = rel(1.2)),
      plot.caption = element_markdown(size = rel(0.9)),
      legend.position = "bottom"
    )

  print(p_bayesian_advanced)

  ggsave(file.path(output_dir, "bayesian_advanced_hmc_channeling.png"),
         p_bayesian_advanced,
         width = 12, height = 8, dpi = 300, bg = "white")

  cat("\n✅ ADVANCED VISUALIZATION COMPLETE\n")
} else {
  cat("\n❌ COULD NOT GENERATE ADVANCED VISUALIZATION\n")
  cat("Reason: Advanced Bayesian analysis did not complete successfully.\n")
}
✅ ADVANCED VISUALIZATION COMPLETE
Discussion: The Stratification Engine
Our empirical results provide robust evidence that the process of Asymmetric Trajectory Channeling emerges from fundamentally different piecewise diffusion processes operating for cognitive and physical skills. The significant difference in the directional asymmetry parameters (β⁺ - β⁻) demonstrates that these skill types follow qualitatively distinct mobility regimes, with our piecewise specification eliminating previous concerns about multicollinearity.
Escalator Dynamics for Cognitive Skills: The strong aspirational pattern (β⁺ = +0.4440, β⁻ = -0.2582) shows that cognitive skills are actively facilitated in upward movements and penalized in downward ones. This confirms our hypothesis that cognitive skills are readily adopted by organizations seeking to establish higher-status occupations, and the piecewise specification quantifies the effect precisely: each unit of upward status shift (Δ⁺) adds +0.4440 to the log-odds of adoption, while each unit of downward shift (Δ⁻) subtracts 0.2582.
Neutral Dynamics for Physical Skills: The contrasting pattern for physical skills shows minimal directional bias (β⁺ = -0.1216, β⁻ = -0.2251), with both movements facing slight penalties but downward moves being more penalized. This indicates they diffuse based primarily on functional considerations rather than strong status signaling.
Methodological Advancement: Our piecewise approach resolves the multicollinearity problem inherent in previous formulations that used both (s_j - s_i) and |s_j - s_i|. By defining Δ⁺ = max(0, s_j - s_i) and Δ⁻ = max(0, s_i - s_j), we create orthogonal variables (Δ⁺ · Δ⁻ = 0 always) that allow for unambiguous interpretation of directional effects. The use of non-linear models with shrinkage priors (Laplace) further allows for flexible detection of curvature while preventing overfitting.
Robust Statistical Evidence: Both our frequentist piecewise models and our Bayesian analysis provide consistent evidence for these patterns. One caveat on signs: because each β term enters the non-linear predictor with a negative sign, the Bayesian coefficients are sign-flipped relative to the glm estimates, so the negative median asymmetry difference reported above (-0.5865) corresponds to the positive asymmetry gap in Table 2 (0.7022 - 0.1035 ≈ 0.60). Read on a common scale, both analyses confirm the strong aspirational channeling effect for cognitive skills, establishing stratification as an active, demand-side process in which cognitive skills benefit from systematic upward facilitation while physical skills face more neutral or constrained mobility.
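This correspondence can be verified directly from the posterior draws extracted earlier; a minimal sketch, assuming posterior_cog_advanced and posterior_phys_advanced are still in memory:

# Sketch: map the Bayesian contrast onto the glm sign convention. Negating the
# brms contrast recovers the facilitation asymmetry (beta+ - beta-) of Table 2.
asym_cog_glm <- -(posterior_cog_advanced$b_beta1up_Intercept -
                    posterior_cog_advanced$b_beta1down_Intercept)
asym_phys_glm <- -(posterior_phys_advanced$b_beta1up_Intercept -
                     posterior_phys_advanced$b_beta1down_Intercept)
median(asym_cog_glm - asym_phys_glm)    # ≈ +0.59, in line with 0.7022 - 0.1035
mean(asym_cog_glm - asym_phys_glm > 0)  # posterior probability of the hypothesis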
Conclusions and Future Directions
This study makes several key contributions. Theoretically, we specify a piecewise dual process model of inter-organizational imitation that eliminates multicollinearity concerns while demonstrating that cognitive and physical skills follow fundamentally distinct diffusion logics. Our Piecewise Dual Process Diffusion Model, especially its non-linear extension with shrinkage priors, provides direct evidence for content-specific directional filtering mechanisms.
Empirically, our analysis provides robust evidence for distinct directional asymmetries and the resulting macro-level process of Asymmetric Trajectory Channeling. The piecewise specification allows for precise quantification of upward versus downward mobility penalties, offering clear policy targets. Methodologically, we demonstrate how piecewise regression approaches can resolve common econometric problems in diffusion studies while maintaining theoretical interpretability. This approach could be valuable for other scholars studying directional processes in organizational and economic contexts.
Future research should examine the temporal dynamics of these piecewise diffusion patterns, test our framework in different institutional contexts, and run the full MCMC version of the advanced model presented here to obtain more precise posterior distributions for the final publication.
Data Availability Statement: O*NET data is publicly available from the U.S. Department of Labor.