# Xu Ben Project: Current Findings (Public Version)

> This is a public summary of working findings. For the full internal research log, see the project's local workspace.

Updated: 2026-06-17

Status: working findings. These are not publication-ready claims. They summarize what the current pilot evidence supports, what it weakens, and what must be checked next.

## Current Evidence Base

| Layer | Current State | Use |
|---|---|---|
| Aisixiang | 457-item index snapshot; 89 body-level records across 论文, 时评, 读书, 访谈, 译作 | Long-run mainland-circulable archive |
| Caixin blog | 289-item index, 2011-2020; 42 body-level records after paired-overlap expansion | Mainland blog/archive comparison layer |
| Publication map | Xu Ben book/publication clusters mapped and partly verified | Publication-cycle context |
| Book-article overlap | 4 books, 125 chapters checked, 57 title matches; readable table generated | Test whether platform articles become book chapters |
| Media agenda map | Design scaffold only | Future control/background layer |

Full article bodies are not stored by default. The project stores metadata and derived keyword metrics.

## Finding 1: Aisixiang Shows A Diagnostic-Share Rise, But Genre Mix Matters

In the Aisixiang first-round sample, aggregate diagnostic share rises across periods. However, the project's genre-stratified analysis shows that this is partly a genre-mix effect:

- corpus: 89 Aisixiang records;
- genre distribution: 论文 (academic essays) 40, 时评 (commentary) 30, 读书 (book reviews) 9, 访谈 (interviews) 9, 译作 (translations) 1;
- aggregate diagnostic share trend: 25.0% -> 31.0% -> 34.9%;
- within 论文 alone, the trend flattens at around 29.2% in 2021-2026.

Interpretation:

- The diagnostic-turn hypothesis survives, but it cannot be stated as a simple author-wide vocabulary shift.
- The genre of public writing matters: book reviews and interviews may be structurally more diagnostic than academic essays.
- Future reporting must stay genre-stratified.
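
The genre-mix point can be illustrated with a minimal sketch on toy numbers (not project data). It assumes diagnostic share is diagnostic hits divided by institutional plus diagnostic hits, pooled over records; the field names `genre`, `institutional`, and `diagnostic` are hypothetical. Each genre keeps a constant share in both periods, yet the aggregate rises because the mix shifts toward the more diagnostic genre.

```python
from collections import defaultdict


def diagnostic_share(records):
    """Pooled share: diagnostic hits / (institutional + diagnostic hits).

    Assumed definition; the project may compute shares differently.
    """
    diag = sum(r["diagnostic"] for r in records)
    inst = sum(r["institutional"] for r in records)
    return diag / (inst + diag) if inst + diag else 0.0


def share_by_genre(records):
    """Diagnostic share computed separately within each genre."""
    groups = defaultdict(list)
    for r in records:
        groups[r["genre"]].append(r)
    return {genre: diagnostic_share(rs) for genre, rs in groups.items()}


# Toy periods: 论文 stays at 20% and 访谈 at 50%; only the genre mix changes.
period_a = [
    {"genre": "论文", "institutional": 80, "diagnostic": 20},
    {"genre": "论文", "institutional": 80, "diagnostic": 20},
    {"genre": "访谈", "institutional": 50, "diagnostic": 50},
]
period_b = [
    {"genre": "论文", "institutional": 80, "diagnostic": 20},
    {"genre": "访谈", "institutional": 50, "diagnostic": 50},
    {"genre": "访谈", "institutional": 50, "diagnostic": 50},
]

for label, period in (("A", period_a), ("B", period_b)):
    print(label, f"{diagnostic_share(period):.1%}", share_by_genre(period))
```

The aggregate moves from 30% to 40% while every within-genre share is unchanged, which is why a pooled trend alone cannot carry an author-level claim.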

## Finding 2: The Simple "Institutional Words Decline" Claim Is Too Crude

The v0.2 keyword groups split institutional vocabulary into:

- `institutional_design`;
- `civic_public`;
- `authoritarianism_control`.

This matters because 2021-2026 texts contain many authoritarianism/control terms. These terms lift the aggregate institutional total without indicating a return to reform-oriented institutional analysis.

Current interpretation:

- the likely shift is not from "institutional" to "non-institutional";
- it is from institution-building / civic-public vocabulary toward institution-failure diagnosis, humanistic preservation, and civilisation-diagnostic vocabulary.
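
A minimal sketch of split-group counting, assuming naive substring matching on unsegmented Chinese text. The term lists are hypothetical placeholders, not the project's v0.2 lists; they are chosen not to overlap, since a term nested inside another would be double-counted by `str.count`. The point is that the undifferentiated total can be high even when most hits come from `authoritarianism_control`.

```python
# Hypothetical placeholder term lists; not the project's actual v0.2 keyword groups.
KEYWORD_GROUPS = {
    "institutional_design": ["制度设计", "宪政", "法治"],
    "civic_public": ["公民", "公共生活", "公共说理"],
    "authoritarianism_control": ["极权", "专制", "宣传"],
}


def count_groups(text, groups=KEYWORD_GROUPS):
    """Sum non-overlapping substring hits for each term, per group."""
    return {group: sum(text.count(term) for term in terms) for group, terms in groups.items()}


def institutional_total(counts):
    """The single 'institutional' total that hides subgroup composition."""
    return sum(counts.values())


sample = "极权统治依靠宣传，专制教育压制公民。"  # toy sentence, not from the corpus
counts = count_groups(sample)
print(counts, institutional_total(counts))
```

Here three of the four "institutional" hits are control vocabulary, so a rising total on its own would not indicate institution-building analysis.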

## Finding 3: Publication Cycle Is A Real Structuring Variable

The article-publication map suggests that Xu Ben's articles cluster around book projects:

- public-life/civic-political essays around public-life books;
- cynicism essays around cynicism/culture-critique books;
- humanities/enlightenment essays around classics and public-humanities books;
- tyranny/totalitarianism essays around the Hong Kong/Taiwan `暴政` / `极权` publication line;
- AI-humanism essays around 2025-2026 AI-era humanities books.

Interpretation:

- short essays should not be treated as free-floating opinion pieces;
- the publication cycle may be the correct unit of analysis for some periods;
- vocabulary shifts can reflect active book projects as much as underlying intellectual transformation.

## Finding 4: Caixin Weakens An Aisixiang-Only Author-Level Claim

The first Caixin comparison shows that the institutional-vocabulary decline found in Aisixiang does not appear in the same way on Caixin blog:

| Platform | Period 1 | Period 2 | Pattern |
|---|---:|---:|---|
| Aisixiang institutional density | 98.99/10k (2011-2014) | 51.82/10k (2015-2020) | sharp decline |
| Caixin institutional density | 70.69/10k (2011-2014) | 69.40/10k (2015-2020) | essentially flat |

Interpretation:

- the Aisixiang signal cannot be treated as direct proof of author-level change;
- platform selection, genre, and sampling must be treated as active variables;
- this validates the project's "platform first" rule.
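
For reference, the densities in the table above are keyword hits normalized per 10,000 units of text. The sketch below assumes the unit is characters (the source does not say) and recomputes the relative change between periods from the reported values: roughly a 48% drop on Aisixiang against under 2% on Caixin. Function names are illustrative.

```python
def density_per_10k(hits: int, length: int) -> float:
    """Keyword hits per 10,000 units of text (assumed unit: characters)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return hits / length * 10_000


def relative_change(before: float, after: float) -> float:
    """Fractional change from one period's density to the next."""
    return (after - before) / before


# Period densities (per 10k) as reported: 2011-2014, then 2015-2020.
reported = {"Aisixiang": (98.99, 51.82), "Caixin": (70.69, 69.40)}
for platform, (p1, p2) in reported.items():
    print(f"{platform}: {relative_change(p1, p2):+.1%}")
```

Densities carry no sample size, so a comparison like this is best read alongside the number of records behind each period.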

## Finding 5: Overlap Analysis Points To Content Selection, Not Platform Editing

The project's paired comparison matched titles across the full Caixin and Aisixiang indexes:

- 83 overlapping titles between the 289-item Caixin index and the 457-item Aisixiang index;
- overlap rate: 28.7% of Caixin, 18.2% of Aisixiang;
- overlap is concentrated in 2011-2012, around 45-47%, then declines to single digits by 2016;
- 14 same-article pairs were compared after fetching additional Caixin records.
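
Title overlap of this kind can be computed with exact matching after light normalization. This is a minimal sketch, not the project's procedure: the normalization (Unicode NFKC, lowercasing, then dropping every non-word character such as 《》 and spaces) is an assumption, and titles are assumed unique within each index. Exact matching misses retitled pieces, so it gives a lower bound on overlap. The last line reproduces the reported rates from 83, 289, and 457.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Assumed normalization: NFKC-fold, lowercase, drop non-word characters."""
    folded = unicodedata.normalize("NFKC", title).lower()
    # CJK ideographs count as word characters in Python 3, so they are kept.
    return re.sub(r"\W+", "", folded)


def overlap_stats(index_a, index_b):
    """Return (shared title count, share of index_a, share of index_b)."""
    shared = {normalize_title(t) for t in index_a} & {normalize_title(t) for t in index_b}
    return len(shared), len(shared) / len(index_a), len(shared) / len(index_b)


# Toy indexes (hypothetical titles): one pair differs only in punctuation and spacing.
caixin = ["《公共生活》与公民", "犬儒时代"]
aisixiang = ["公共生活 与公民", "其他文章", "另一篇"]
print(overlap_stats(caixin, aisixiang))

print(f"{83 / 289:.1%} of Caixin, {83 / 457:.1%} of Aisixiang")
```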

For the 14 paired same-article comparisons, keyword density is nearly identical:

| Metric | Caixin | Aisixiang |
|---|---:|---:|
| institutional density | 108.12/10k | 105.05/10k |
| diagnostic density | 23.94/10k | 25.46/10k |
| diagnostic share | 18.1% | 19.5% |

Aisixiang versions are about 25% longer on average, but the density profile remains close.
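
The diagnostic-share row is consistent with share = diagnostic density / (institutional density + diagnostic density); recomputing from the two density rows gives 18.1% and 19.5%. That definition is inferred from the numbers, not stated in the source. Because both densities are divided by the same length, the length term cancels and the share reflects only the mix of keyword hits.

```python
def share_from_densities(institutional: float, diagnostic: float) -> float:
    """Diagnostic share from per-10k densities (definition inferred from the table).

    The per-10k scaling cancels, so this equals the ratio of raw keyword counts.
    """
    return diagnostic / (institutional + diagnostic)


# Paired same-article densities (per 10k) as reported above.
paired = {"Caixin": (108.12, 23.94), "Aisixiang": (105.05, 25.46)}
for platform, (inst, diag) in paired.items():
    print(f"{platform}: {share_from_densities(inst, diag):.1%}")
```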

Interpretation:

- the cross-platform divergence is probably not caused by the same article being edited into a different ideological profile;
- the stronger mechanism is content selection: which articles appear on which platform;
- the hypothesis that Xu Ben uses different publishing strategies across platforms is now methodologically important.

## Finding 6: Book-Article Overlap Supports A Publication-Pipeline Model

The project's book-article overlap check compared the tables of contents of four books against the Caixin and Aisixiang title indexes:

| Book | Type | Chapters checked | Matches | Match rate |
|---|---|---:|---:|---:|
| 《颓废与沉默：透视犬儒文化》 | essay collection | 80 | 37 | 46.3% |
| 《人以什么理由来记忆》 | essay collection | 17 | 11 | 64.7% |
| 《政治是每个人的副业》 | essay collection | 9 | 8 | 88.9% |
| 《明亮的对话：公共说理十八讲》 | systematic monograph | 19 | 1 | 5.3% |

Across the four checked books:

- 125 chapters were checked;
- 57 matched platform article titles;
- 39 matches are `article_before_book_recent`;
- 3 matches are `article_before_book`;
- 3 matches are `book_excerpt_or_adaptation`;
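- 12 are `uncertain` (9 of these were later reclassified as `shared_prior_source` after batch-import verification; see Finding 7);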
- platform distribution among matched chapters: Caixin-only 32, Aisixiang-only 12, both 13.
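
The per-book rates and totals above can be checked with a few lines of arithmetic on the reported counts. The sketch also shows why per-book rates matter: the pooled rate weights every chapter equally, so the 80-chapter 《颓废与沉默》 dominates it while the monograph's low rate nearly disappears.

```python
# (chapters checked, title matches) per book, as reported in the table above.
books = {
    "《颓废与沉默：透视犬儒文化》": (80, 37),
    "《人以什么理由来记忆》": (17, 11),
    "《政治是每个人的副业》": (9, 8),
    "《明亮的对话：公共说理十八讲》": (19, 1),
}
# Platform distribution among matched chapters, as reported.
platforms = {"caixin_only": 32, "aisixiang_only": 12, "both": 13}

total_checked = sum(checked for checked, _ in books.values())
total_matched = sum(matched for _, matched in books.values())
assert sum(platforms.values()) == total_matched  # every match has exactly one platform label

for title, (checked, matched) in books.items():
    print(f"{title}: {matched}/{checked} = {matched / checked:.2%}")
# Pooled rate: each chapter counts once, so large collections dominate.
print(f"pooled: {total_matched}/{total_checked} = {total_matched / total_checked:.2%}")
```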

Interpretation:

- For the checked essay collections, many chapters appear to have circulated first or near-simultaneously as platform articles.
- The relation is not simply "book -> excerpt to platform"; for these collections, the stronger pattern is "platform article / same-project article -> accumulated into book."
- Different platforms seem to feed different book projects: Caixin is especially important for 《颓废与沉默》, while Aisixiang is more important for earlier memory/public-political material.
- 《明亮的对话》 is an important negative case: a systematic monograph has very low title overlap, so not all Xu Ben books should be treated as article collections.

Methodological implication:

- The unit of analysis should often be the publication pipeline, not the isolated article.
- Article-level keyword counts may double-count book-draft fragments as independent observations.
- `book_relation_type` is still heuristic and must be checked through close reading before making strong claims about direction.
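
One way to avoid that double-counting is to collapse articles into pipeline-level units before computing densities. A minimal sketch, assuming each record carries hypothetical fields `id`, `book` (the matched book project, or `None`), `hits`, and `length`: articles mapped to the same book are pooled into one unit, and unmatched articles remain their own units. Because the book mapping is heuristic, the grouping inherits its uncertainty.

```python
from collections import defaultdict


def pipeline_units(records):
    """Collapse article records into one unit per publication pipeline.

    Hypothetical record fields: 'id', 'book' (matched book project or None),
    'hits' (keyword hits), 'length' (text length, assumed positive).
    """
    units = defaultdict(lambda: {"hits": 0, "length": 0, "n_articles": 0})
    for r in records:
        # Articles sharing a book project pool into one unit; others stand alone.
        key = ("book", r["book"]) if r["book"] else ("article", r["id"])
        unit = units[key]
        unit["hits"] += r["hits"]
        unit["length"] += r["length"]
        unit["n_articles"] += 1
    return {
        key: {**unit, "density_10k": unit["hits"] / unit["length"] * 10_000}
        for key, unit in units.items()
    }


# Toy records (not project data): two draft fragments of one book, one standalone piece.
records = [
    {"id": "a1", "book": "颓废与沉默", "hits": 30, "length": 3000},
    {"id": "a2", "book": "颓废与沉默", "hits": 10, "length": 2000},
    {"id": "a3", "book": None, "hits": 5, "length": 1000},
]
units = pipeline_units(records)
for key, unit in units.items():
    print(key, unit["n_articles"], round(unit["density_10k"], 1))
```

Three article-level observations become two pipeline-level units, so the two fragments no longer count as independent evidence.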

## Finding 7: Different Books Have Different Circulation Directions

The readable overlap table (`xuben-book-article-overlap.md`) shows that "article-book relation" is not one mechanism.

### 《颓废与沉默》: Caixin As Draft Workspace

The timeline is unusually clear:

- 2013: 3 matched articles, `year_diff=2`, `article_before_book`;
- 2014: 30+ matched articles, `year_diff=1`, mostly `article_before_book_recent`;
- 2015: still publishing matched pieces in the book year;
- platform pattern: overwhelmingly Caixin blog, with some Caixin+Aisixiang overlap.

Working interpretation:

> 《颓废与沉默》 appears to have developed through a Caixin-centered writing pipeline: early trial pieces in 2013, intensive platform writing in 2014, then 2015 book consolidation.

This makes Caixin blog look less like a neutral archive and more like a working surface for the book project.
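
The timeline above uses `year_diff` (book year minus article year) to assign `book_relation_type`. The source does not spell out the rule, so the function below is a hypothetical reconstruction consistent with the cases reported here: a gap of 2 or more years gives `article_before_book`, a gap of 0 or 1 gives `article_before_book_recent`, an article after the book gives `book_excerpt_or_adaptation`, and a missing date gives `uncertain`. The thresholds, and the treatment of same-year pieces, are assumptions.

```python
from typing import Optional


def book_relation_type(article_year: Optional[int], book_year: int) -> str:
    """Hypothetical reconstruction of the year_diff rule; thresholds are assumptions."""
    if article_year is None:
        return "uncertain"  # e.g. a record with no usable date
    year_diff = book_year - article_year
    if year_diff >= 2:
        return "article_before_book"
    if year_diff >= 0:
        # Assumption: pieces from the book year itself are grouped with the recent ones.
        return "article_before_book_recent"
    return "book_excerpt_or_adaptation"


# 《颓废与沉默》 was consolidated in 2015, per the timeline above.
for year in (2013, 2014, 2015, None):
    print(year, book_relation_type(year, 2015))
```

A date-only rule like this cannot produce `shared_prior_source` or `same_project_rewrite`: both depend on evidence about where a text came from, which is why the classification stays heuristic until close reading.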

### 《人以什么理由来记忆》: Shared Prior Source (verified)

The pattern is different, and has been verified:

- 9 matched chapters were initially `uncertain` because the Aisixiang records lack dates;
- all 9 are Aisixiang-only;
- cc verified that all 9 show the same "更新时间" (update time), `2008-07-16 14:2x`: same day, same hour, clearly a **batch import**;
- their article IDs (6621–12195) are far lower than those of normal 2008 articles, indicating that the content predates the import;
- 2008-07-16 accounts for 12 records in the Aisixiang pilot, a confirmed batch-import signal rather than a natural publication date;
- 2 additional matched pieces are `book_excerpt_or_adaptation`, appearing in 2011 and 2016 after the 2008 book.
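
The timestamp check can be sketched as grouping records by the clock hour of their update time and flagging hours that hold many records. The timestamp format (`%Y-%m-%d %H:%M`), the threshold `min_records=5`, and the sample timestamps are assumptions for illustration; the article-ID comparison above would be a second, independent check.

```python
from collections import Counter
from datetime import datetime


def batch_import_hours(update_times, min_records=5):
    """Return {hour: count} for clock hours shared by at least min_records records.

    A dense cluster within one hour suggests a batch import, not natural publication dates.
    """
    hours = Counter(
        datetime.strptime(t, "%Y-%m-%d %H:%M").replace(minute=0) for t in update_times
    )
    return {hour: n for hour, n in hours.items() if n >= min_records}


# Illustrative timestamps in the assumed format (not the actual records).
sample_times = [
    "2008-07-16 14:21", "2008-07-16 14:22", "2008-07-16 14:25",
    "2008-07-16 14:27", "2008-07-16 14:29", "2010-03-02 09:15",
]
print(batch_import_hours(sample_times))
```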

Working interpretation (updated):

> These articles are most likely older academic essays (2000s) that were batch-uploaded to Aisixiang around the time of the 2008 book's publication. The correct `book_relation_type` is `shared_prior_source`: both the Aisixiang record and the book chapter derive from earlier academic writing. Aisixiang is not the writing workspace for this book; it is an archive that received pre-existing material.

This means the same "book-article overlap" statistic implies different publication histories depending on platform and date evidence:

- 《颓废与沉默》: platform articles → book (Caixin as workspace)
- 《人以什么理由来记忆》: earlier academic essays → both book and Aisixiang (shared prior source)

### Relation Types Now Observed

The current classification includes:

- `article_before_book`: 3 cases;
- `article_before_book_recent`: 39 cases;
- `book_excerpt_or_adaptation`: 3 cases;
- `shared_prior_source`: 9 cases (upgraded from `uncertain` after batch-import verification);
- `uncertain`: 3 remaining cases.
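
The move from the Finding 6 counts to the list above is a reclassification, not a change in matches: 9 of the 12 `uncertain` cases become `shared_prior_source`, and the total stays at 57. A small sketch that makes that invariant explicit; the helper name `reclassify` is illustrative.

```python
from collections import Counter

# Relation-type counts as reported in Finding 6, before batch-import verification.
before = Counter({
    "article_before_book_recent": 39,
    "article_before_book": 3,
    "book_excerpt_or_adaptation": 3,
    "uncertain": 12,
})


def reclassify(counts: Counter, src: str, dst: str, n: int) -> Counter:
    """Move n cases from src to dst without changing the total."""
    if counts[src] < n:
        raise ValueError(f"only {counts[src]} cases labelled {src}")
    updated = counts.copy()
    updated[src] -= n
    updated[dst] += n
    return updated


after = reclassify(before, "uncertain", "shared_prior_source", 9)
assert sum(before.values()) == sum(after.values()) == 57
print(dict(after))
```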

The type `same_project_rewrite` remains methodologically important but has not yet been detected. Many `article_before_book_recent` cases may belong to this type or to `shared_prior_source`; they require close reading.

## Finding 8: Close Reading Produces A Four-Type Text-Function Model

Multi-round AI-assisted close reading (6 articles, 4 reviewers: Gemini, GPT, coco, Grok) exposed and corrected several incorrect interpretations and converged on a four-type model of how "制度" functions across Xu Ben's writing. This replaces the earlier institutional/diagnostic binary.

### The four types

| Type | What 制度 does | Representative slot |
|---|---|---|
| `institution_building` | 制度 is a constructive framework: how to build good public life, civic education, democratic governance | Slot 1 (2012), Slot 4 (2016) |
| `institution_failure_analysis` | 制度 is a domination/control mechanism: how propaganda, totalitarianism, and authoritarian education work | Slot 5 (2010), part of Slot 2 (2016) |
| `public_intellectual_failure_diagnosis` | 制度 is background; focus shifts to why intellectuals fail, go silent, or become cynical | Slot 2 (2016) |
| `humanistic_meaning_diagnosis` | 制度 largely exits the frame; AI, human subjectivity, meaning, civilisation become the vocabulary | Slot 3 (2025), Slot 6 (2025) |

### Key close-reading results

1. **Slot 2 density difference is dilution, not deletion.** The Aisixiang version is 32% longer and contains 7.5% more institutional words in absolute terms. The lower density comes from additional non-institutional content, not political censorship. Gemini was corrected twice on this point; Grok self-corrected after re-reading.

2. **2016 texts show that institutional analysis persists.** Slot 4 (专制教育, "authoritarian education"; inst=242/10k) shows that Xu Ben can still produce high-density institutional-structural analysis in 2016. The "diagnostic turn" is not a sudden abandonment of institutional vocabulary.

3. **2025 texts show a genuine frame shift.** In both Slots 3 and 6, 制度 is nearly absent. The shift is real, but it occurs in a specific media/magazine context (《书城》, public-humanities essays) and should not be generalized to all of Xu Ben's writing.

4. **Genre matters as much as period.** At least 3 of 6 slots are interview-format texts. Interviews compress theoretical concepts into public diagnosis; this is a genre effect, not necessarily an intellectual trajectory.

5. **Most articles are components of book-publishing pipelines.** Slot 1 directly mentions an upcoming book; Slot 4 extends 《统治与教育》. Only Slot 3 reads as fully independent.
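
The "dilution, not deletion" reading in result 1 follows from density = count / length. A minimal sketch: if the longer version has (1 + c) times the institutional count and (1 + l) times the length, its density is (1 + c) / (1 + l) of the shorter version's. With the reported 7.5% and 32%, that is about 0.814, a drop of roughly 19% with no institutional words removed. The function name is illustrative.

```python
def density_ratio(length_growth: float, count_growth: float) -> float:
    """Density of the longer version relative to the shorter one.

    density = count / length, so growing count by (1 + count_growth) and length by
    (1 + length_growth) scales density by their ratio.
    """
    return (1 + count_growth) / (1 + length_growth)


# Slot 2 figures from result 1: text 32% longer, 7.5% more institutional words.
ratio = density_ratio(length_growth=0.32, count_growth=0.075)
print(f"density ratio {ratio:.3f} ({ratio - 1:+.1%})")
```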

### What this means for the analysis framework

The old framework asked: "Is institutional vocabulary declining?"

The new framework asks: "What function does institutional vocabulary perform in this text, and where does this text sit in the article-book lifecycle?"

This is a stronger model because it can explain why:

- The same author can produce high-institutional and high-diagnostic texts in the same year (2016: Slot 2 + Slot 4);
- The "decline" visible in Aisixiang aggregates is partly a composition effect (more 2021-2026 articles are of the humanistic-meaning type);
- The Caixin/Aisixiang divergence reflects different pipeline roles, not different editorial policies.
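
The four-type model treats text function as a label attached to each text rather than a position on one institutional/diagnostic scale. A minimal sketch of that data structure, with slot annotations copied from the type table above (Slot 2 carries two labels because part of it is failure analysis); the names `TextFunction`, `SLOTS`, and `functions_by_year` are hypothetical. Grouping by year shows 2016 holding more than one function, matching the first point in the list above.

```python
from collections import defaultdict
from enum import Enum


class TextFunction(Enum):
    INSTITUTION_BUILDING = "institution_building"
    INSTITUTION_FAILURE_ANALYSIS = "institution_failure_analysis"
    PUBLIC_INTELLECTUAL_FAILURE_DIAGNOSIS = "public_intellectual_failure_diagnosis"
    HUMANISTIC_MEANING_DIAGNOSIS = "humanistic_meaning_diagnosis"


F = TextFunction
# slot -> (year, functions), following the representative slots in the type table.
SLOTS = {
    1: (2012, {F.INSTITUTION_BUILDING}),
    2: (2016, {F.PUBLIC_INTELLECTUAL_FAILURE_DIAGNOSIS, F.INSTITUTION_FAILURE_ANALYSIS}),
    3: (2025, {F.HUMANISTIC_MEANING_DIAGNOSIS}),
    4: (2016, {F.INSTITUTION_BUILDING}),
    5: (2010, {F.INSTITUTION_FAILURE_ANALYSIS}),
    6: (2025, {F.HUMANISTIC_MEANING_DIAGNOSIS}),
}


def functions_by_year(slots):
    """Union of text functions observed in each year."""
    by_year = defaultdict(set)
    for year, functions in slots.values():
        by_year[year] |= functions
    return dict(by_year)


mixed = {year: fs for year, fs in functions_by_year(SLOTS).items() if len(fs) > 1}
print(sorted(mixed))
```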

## What We Should Not Claim Yet

Do not claim:

- "Xu Ben abandoned institutional analysis."
- "Aisixiang proves a general author-level transformation."
- "Censorship caused the diagnostic turn."
- "Chinese public intellectuals as a group shifted in the same way."
- "Media agenda explains the shift" before the agenda layer has actual records.

Current defensible formulation (updated after close reading):

> Xu Ben's public writing is best modeled as a circulation system in which the same author produces different text types (institution-building, institution-failure analysis, public-intellectual-failure diagnosis, and humanistic-meaning diagnosis) across different platforms, genres, and publication cycles. The visible vocabulary shift is real but cannot be reduced to a single author-level trajectory. Platform selection, genre (interview vs essay vs book review), publication-pipeline position (draft workspace vs archive layer vs book consolidation), and the specific book project active at any given time all shape what vocabulary appears where. The strongest finding is methodological: the same word (制度) performs different functions in different text types, and aggregate keyword trends can mislead if they do not control for text function and lifecycle position.
