# Xu Ben Project: Methodology Summary v0.2

Updated: 2026-06-17

This is a concise summary of where the project stands after the pilot phase. It is not the full literature review (see `xuben-methodology-literature-review-v0.1.md`) but a 1-2 page overview for orientation.

## Original Hypothesis

Chinese-language public intellectuals in the 2020s appear to have shifted from specific institutional, historical, and social analysis toward a diagnostic discourse centred on "subjectivity," "the complete human," "civilisational crisis," and "meaning." Xu Ben is the entry-point case.

The initial expectation was that aggregate keyword frequency would show a measurable decline in institutional vocabulary and a rise in diagnostic vocabulary across Xu Ben's Aisixiang archive over time.

## How the Data Changed the Hypothesis

Five corrections emerged from the pilot:

1. **Genre mix effect.** The aggregate diagnostic-share trend (25% → 31% → 35%) is partly a composition effect. Within 论文 (academic papers) alone, the trend flattens out at 29% in 2021-2026. High-diagnostic 读书 (book review) articles pull the aggregate up.

2. **Platform selection effect.** The decline in institutional vocabulary seen on Aisixiang (99 → 52/10k) does not appear on the Caixin blog over the same period (71 → 69/10k, flat). A paired comparison of the same articles on both platforms (14 pairs) shows nearly identical density, which indicates that the divergence comes from content selection (which articles each platform carries), not from platform editing.

3. **Publication-pipeline effect.** Essay collections show 46-89% chapter-article title overlap (the share of a book's chapters whose titles match a platform article). The Caixin blog functioned as a draft workspace for 《颓废与沉默》 (2015); Aisixiang functioned as an archive/repost layer for 《人以什么理由来记忆》 (2008). Different platforms serve different books at different lifecycle stages.

4. **The same word does different work.** Multi-round, AI-assisted close reading of 6 articles exposed and corrected several misreadings and showed that 制度 (institution/system) performs at least four distinct text functions, which a binary institutional/diagnostic split cannot capture.

5. **Batch-import dates distort chronology.** Aisixiang's "更新时间" (update time) field includes batch-import events (e.g., 12 articles within one hour on 2008-07-16). These are archive-loading dates, not publication dates.
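
Correction 5 suggests a screening step before any chronological analysis. The sketch below is a minimal illustration, assuming the 更新时间 values have already been parsed into Python `datetime` objects; the function name `flag_batch_imports`, the one-hour window, and the threshold of five updates are hypothetical choices, not parameters the project has fixed. It flags every timestamp that sits inside a dense burst, so those records can be dated from other sources.

```python
from datetime import datetime, timedelta


def flag_batch_imports(timestamps, window=timedelta(hours=1), min_count=5):
    """Return indices of timestamps that fall inside a burst of at least
    `min_count` updates spanning less than `window`.

    Hypothetical heuristic: such bursts are treated as archive-loading
    events, so their 更新时间 values are not used as publication dates.
    """
    # Visit records in chronological order without reordering the input.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    flagged = set()
    start = 0
    # Two-pointer sliding window over the sorted timestamps.
    for end in range(len(order)):
        while timestamps[order[end]] - timestamps[order[start]] >= window:
            start += 1
        if end - start + 1 >= min_count:
            flagged.update(order[start:end + 1])
    return flagged


# Synthetic example, not project data: 12 updates five minutes apart
# on one day, plus one isolated update.
updates = [datetime(2008, 7, 16, 10, minute) for minute in range(0, 60, 5)]
updates.append(datetime(2010, 3, 2, 9, 0))
print(sorted(flag_batch_imports(updates)))
```

Because the window slides over sorted timestamps, a long, evenly spaced run of updates is flagged as one burst; the threshold would need tuning against known import events such as the 2008-07-16 case.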

## Current Model

The project now uses a **text-function + lifecycle-position** framework instead of the original aggregate keyword trend.

### Text-function types

| Type | What 制度 does | Example |
|---|---|---|
| institution_building | Constructive framework: civic education, democratic governance | 好的公共生活 (2012) |
| institution_failure_analysis | Domination mechanism: propaganda, totalitarian education | 解剖宣传 (2010) |
| public_intellectual_failure_diagnosis | Background: the focus is on intellectual silence, cynicism, failure | 犬儒主义是弱者的抵抗 (2016) |
| humanistic_meaning_diagnosis | Exits the frame: the focus moves to AI, human subjectivity, meaning, civilisation | AI与人类自毁 (2025) |

**Caveat**: this is a working model, not a final taxonomy. A single text can mix types (e.g., Slot 2 combines institution_failure_analysis with public_intellectual_failure_diagnosis). The model should remain open to further differentiation or recombination as more texts are read.

### Lifecycle-position model

Each platform record should be located in the article-book lifecycle:

- **Caixin blog**: draft/workspace for some essay collections (especially the 2012-2015 犬儒/cynicism line)
- **Aisixiang**: archive/repost for academic material and book chapters; also hosts newer 论文/时评 (academic papers and current-affairs commentary)
- **Books**: consolidate, reorganize, and recontextualize platform articles
- **Media interviews**: compress theoretical concepts into public diagnosis (genre effect, not intellectual shift)

### Article-book relation types

| Type | Count | Meaning |
|---|---|---|
| article_before_book_recent | 39 | Same-project simultaneous output |
| shared_prior_source | 9 | Both platform and book derive from earlier academic writing |
| article_before_book | 3 | Platform article collected into later book |
| book_excerpt_or_adaptation | 3 | Article appeared after book |
| uncertain | 3 | Cannot determine |

## What We Can Say

> Xu Ben's public writing is a circulation system. The same author produces different text types across platforms, genres, and publication cycles. The visible vocabulary shift is real in the material sampled so far, but it reflects text-function composition and pipeline position, not only intellectual trajectory. The strongest finding is methodological: aggregate keyword trends mislead unless they control for text function and lifecycle position.
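
The methodological claim can be made concrete with a toy calculation in which the within-genre diagnostic share stays flat while the aggregate share rises, purely because the genre mix changes. This is a sketch under stated assumptions: "diagnostic share" is taken here to mean diagnostic keyword hits divided by diagnostic plus institutional hits (the project's exact definition may differ), the periods `early` and `late` are placeholders, and all counts are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Counts(NamedTuple):
    period: str
    genre: str
    diagnostic: int     # diagnostic-keyword hits
    institutional: int  # institutional-keyword hits


# Invented toy counts, NOT project data.
RECORDS = [
    Counts("early", "论文", 30, 70),
    Counts("early", "读书", 10, 10),
    Counts("late", "论文", 30, 70),
    Counts("late", "读书", 50, 50),
]


def diagnostic_share(rows):
    """Diagnostic hits as a fraction of all tracked hits (assumed definition)."""
    diag = sum(r.diagnostic for r in rows)
    inst = sum(r.institutional for r in rows)
    return diag / (diag + inst)


def shares_by(rows, key):
    """Group rows by `key` and compute the pooled diagnostic share per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    return {k: round(diagnostic_share(v), 3) for k, v in sorted(groups.items())}


# Aggregate trend: genres pooled within each period.
aggregate = shares_by(RECORDS, key=lambda r: r.period)
# Stratified trend: one share per (period, genre) cell.
stratified = shares_by(RECORDS, key=lambda r: (r.period, r.genre))
print(aggregate)
print(stratified)
```

Pooled over genres, the share rises from about 0.33 to 0.40, yet neither genre changes (论文 stays at 0.3, 读书 at 0.5): the rise comes only from 读书 articles supplying a larger part of the later keyword hits.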

## What We Cannot Say Yet

- "Xu Ben abandoned institutional analysis." (2016 samples prove otherwise)
- "Aisixiang proves a general author-level transformation." (Caixin does not show the same trend)
- "Censorship caused the diagnostic turn." (No evidence of same-article political editing)
- "Chinese public intellectuals as a group shifted in the same way." (Only one author studied)
- "Media agenda explains the shift." (Media-agenda layer has no empirical data yet)

## Current Data

| Layer | Records | Index | Status |
|---|---|---:|---|
| Aisixiang | 89 body-level | 457 | First round complete |
| Caixin blog | 42 body-level | 289 | First round complete |
| Book-article overlap | 57 matches / 125 chapters | 4 books | Partially verified |
| Publication map | 32 books | — | Partially verified |
| Close reading | 6 articles / 4 reviewers | — | Complete |
| CDT | — | ~520 estimated | Not started |
| Media agenda | — | — | Design scaffold only |
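
The book-article overlap layer counts book chapters whose titles match a platform article title. One way to compute a per-book match count is sketched below, assuming exact matching after light normalisation (Unicode NFKC, then stripping whitespace, punctuation, and title marks such as 《》); the function names are hypothetical, and the project's own matching may have been fuzzier or manually verified. The titles in the example are invented placeholders.

```python
import re
import unicodedata

# Runs of whitespace, punctuation (including 《》 and full-width marks), and underscores.
_NOISE = re.compile(r"[\W_]+")


def normalise_title(title: str) -> str:
    """Fold full-width forms with NFKC, then drop non-word characters."""
    return _NOISE.sub("", unicodedata.normalize("NFKC", title))


def title_overlap(chapters: list[str], articles: list[str]) -> tuple[int, int, float]:
    """Return (matched chapters, total chapters, matched share) for one book."""
    article_keys = {normalise_title(t) for t in articles}
    matched = sum(1 for c in chapters if normalise_title(c) in article_keys)
    share = matched / len(chapters) if chapters else 0.0
    return matched, len(chapters), share


# Invented placeholder titles, not real chapters or articles.
chapters = ["标题甲", "标题乙", "标题丙", "序言"]
articles = ["《标题甲》", "标题乙 ", "标题丁"]
print(title_overlap(chapters, articles))
```

Summing matched and total counts across books gives a pooled figure of the matches/chapters kind reported in the table, which is not the same as averaging the per-book shares.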

## Next Phase

Stabilize the current model before expanding the data:

1. Add a `text_function_manual` field to article-level data, determined by close reading, not automated classification. Analysis scripts should aggregate this field, not assign it.
2. Write up the article-book lifecycle model as a standalone methodological contribution.
3. Only then consider CDT, additional book TOCs, or media-agenda sampling.
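
Step 1 separates label assignment (by close reading) from aggregation (by script). A minimal sketch of what "aggregate, not assign" could look like follows, assuming each article record carries a hypothetical `text_function_manual` list, so that mixed texts can hold more than one type, plus a `year`; the `Article` layout and function name are assumptions, not the project's schema. The labels on the two real titles follow the examples in the text-function table; the other two records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Working taxonomy from the text-function table; extend as the model changes.
TEXT_FUNCTIONS = {
    "institution_building",
    "institution_failure_analysis",
    "public_intellectual_failure_diagnosis",
    "humanistic_meaning_diagnosis",
}


@dataclass
class Article:
    title: str
    year: int
    # Filled in by a close reader; an empty list means "not yet read".
    # A list allows mixed texts to carry more than one type.
    text_function_manual: list[str] = field(default_factory=list)


def aggregate_text_functions(articles):
    """Count manually assigned labels per year; never assign labels.

    Returns (counts_by_year, unread_titles). A mixed text adds one count
    to each of its labels. Unknown labels raise instead of being coerced.
    """
    counts_by_year = {}
    unread = []
    for a in articles:
        if not a.text_function_manual:
            unread.append(a.title)
            continue
        unknown = set(a.text_function_manual) - TEXT_FUNCTIONS
        if unknown:
            raise ValueError(f"{a.title}: unknown label(s) {sorted(unknown)}")
        counts_by_year.setdefault(a.year, Counter()).update(a.text_function_manual)
    return counts_by_year, unread


sample = [
    Article("好的公共生活", 2012, ["institution_building"]),
    Article("犬儒主义是弱者的抵抗", 2016, ["public_intellectual_failure_diagnosis"]),
    Article("hypothetical mixed text", 2016,
            ["institution_failure_analysis", "public_intellectual_failure_diagnosis"]),
    Article("hypothetical unread text", 2025),
]
counts, unread = aggregate_text_functions(sample)
print(counts)
print(unread)
```

Unread articles are reported rather than silently labelled, which keeps the boundary between close reading and automated processing visible in the output.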
