Findings Archive - Lewis & Clark Research Database

Cross-Narrator Parallels at Fort Clatsop

Tue, 12 May 2026 14:02:03 +0000

A follow-up audit to finding #3 (the exact-match content cleanup) used a more permissive near-duplicate detector to look for paraphrased copies. It found a documented historical pattern that is not a bug at all: at Fort Clatsop, Captain Clark transcribed Captain Lewis’s daily journal entries almost word-for-word. Surfacing this pattern in the database is a small example of how computational text analysis can confirm and make visible something scholars have known for a century.

1. What we ran

Finding #3 used MD5 hashing to detect entries with byte-identical content. That caught the 207-entry curated-import cleanup, but it could not detect entries that were paraphrased — near-identical in meaning but different in punctuation, capitalization, or spelling.

A follow-up audit ran a more permissive detector against the remaining 3,130 journal entries:

Shingle each entry’s normalized text into overlapping 8-character windows.
Compute a 64-element MinHash signature per entry.
Band the signatures into 16 bands of 4 hashes each (LSH).
For each pair of entries sharing any band, compute the similarity ratio with Python’s difflib.SequenceMatcher.ratio().
Pairs with ratio ≥ 0.80 form a near-duplicate cluster via union-find.
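
The steps above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the audit's actual code: the normalization rule (lowercase, strip punctuation, collapse whitespace), the BLAKE2b-based hash family, and every function name here are hypothetical choices made for the example. Only the parameters (8-character shingles, 64 hashes, 16 bands of 4, ratio ≥ 0.80) come from the description above.

```python
import hashlib
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

NUM_HASHES, BANDS, ROWS = 64, 16, 4   # 16 bands x 4 hashes = 64-element signature
SHINGLE = 8                            # characters per overlapping window
THRESHOLD = 0.80                       # SequenceMatcher ratio cut-off


def normalize(text):
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def shingles(text):
    # Overlapping character windows; a short text yields a single shingle.
    return {text[i:i + SHINGLE] for i in range(max(1, len(text) - SHINGLE + 1))}


def _hash(seed, shingle):
    # One member of the hash family per seed (an illustrative choice).
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash(text):
    sh = shingles(normalize(text))
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES))


def candidate_pairs(signatures):
    # LSH: two entries become candidates if any band of their signatures matches.
    buckets = defaultdict(set)
    for entry_id, sig in signatures.items():
        for b in range(BANDS):
            buckets[(b, sig[b * ROWS:(b + 1) * ROWS])].add(entry_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def near_duplicate_clusters(entries):
    """entries: dict of entry ID -> text. Returns sorted clusters of size > 1."""
    sigs = {eid: minhash(text) for eid, text in entries.items()}
    parent = {eid: eid for eid in entries}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(sigs):
        ratio = SequenceMatcher(None, normalize(entries[a]),
                                normalize(entries[b])).ratio()
        if ratio >= THRESHOLD:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for eid in entries:
        clusters[find(eid)].append(eid)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)
```

The banding step exists to avoid running SequenceMatcher on every pair in the corpus: only entries that collide in at least one band are compared directly, and the ratio check then discards false candidates.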

The audit ran in about three minutes against the post-cleanup corpus.

2. What it found

50 confirmed clusters, 100 entries total — and every cluster is a Clark / Lewis pair on the same date during the Fort Clatsop winter (December 1805 – March 1806).

The pairs we surfaced include:

Clark Jan 4, 1806 ↔ Lewis Jan 4, 1806
Clark Jan 11, 1806 ↔ Lewis Jan 11, 1806
Clark Jan 17, 1806 ↔ Lewis Jan 17, 1806
Clark Jan 21, 1806 ↔ Lewis Jan 21, 1806
Clark Jan 31, 1806 ↔ Lewis Jan 31, 1806
Clark Feb 1–4, 1806 ↔ Lewis Feb 1–4, 1806
…and 41 more such date-pairs through early March 1806.

Zero clusters span more than two entries. Zero clusters fall outside the Fort Clatsop winter window. The pattern is precise.

3. Why this is not a bug

The Fort Clatsop daily-journal pattern is documented in every serious edition of the journals. During the Pacific winter, Clark frequently transcribed Lewis’s entries into his own daybook with minor capitalization and punctuation differences but otherwise verbatim. Gary Moulton notes the practice extensively in his Nebraska edition footnotes. Elliott Coues notes it in his 1893 commentary. The pattern is well known: in the constant rain and confinement of Fort Clatsop, the two captains operated as a single editorial unit for much of the winter.

What the computational audit adds is quantification: of the ~120 winter dates with entries from both captains, exactly 50 (~42%) reach the near-duplicate threshold (SequenceMatcher ratio ≥ 0.80). The other ~58% have substantive differences. The boundary between the winter days on which the captains wrote independently and the days on which they let one author cover for both is a measurable signal that could productively be cross-referenced with weather, hunting outcomes, diplomatic activity, or Lewis’s documented depressive episodes during that winter.

This pattern does not appear earlier in the journey (Phase 2, May 1804 – October 1805) or later (Phase 4, March – September 1806). The Fort Clatsop winter is uniquely characterized by this co-authorship pattern.

4. How we surfaced it on the site

The 100 entries were flagged with parallels_entry meta pointing at the corresponding paired entry’s post ID. The single-journal-entry template now renders a sidebar card on each flagged page:

Parallel Entry
This entry’s text closely parallels the entry below for the same date. This is a documented historical phenomenon — at Fort Clatsop, the captains often kept near-identical journals.
Lewis: January 4, 1806 · Near-duplicate primary-source text

Each flagged entry links to its counterpart. Neither entry’s content was changed; the relationship is annotated only.

This is the correct action because:

The duplication is in the primary sources themselves, not in our import.
Removing or merging the entries would falsify the historical record — readers and researchers benefit from knowing both captains’ entries exist in (largely identical) form.
The cross-narrator analyses at /analyses/ already synthesize same-date entries from multiple narrators; the Parallel Entry card is the bridge from a single entry to the broader cross-narrator picture.
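
The reciprocal flagging described in this section can be sketched as a small mapping step. This is an illustration only: the site itself stores parallels_entry as post meta, while the function name, the dictionary layout, and the post IDs used in any example call are hypothetical.

```python
def parallels_meta(clusters):
    """Turn two-entry clusters into reciprocal parallels_entry annotations.

    clusters: iterable of (post_id_a, post_id_b) pairs.
    Returns {post_id: {"parallels_entry": paired_post_id}}.
    Entry content is never touched; only the annotation is produced.
    """
    meta = {}
    for cluster in clusters:
        if len(cluster) != 2:
            # The audit reported no cluster larger than two entries,
            # so anything else is treated as an error in this sketch.
            raise ValueError(f"expected a pair, got {cluster!r}")
        a, b = cluster
        meta[a] = {"parallels_entry": b}
        meta[b] = {"parallels_entry": a}
    return meta
```

Writing the link in both directions is what lets each flagged page render a card pointing at its counterpart without a second lookup.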

5. What this tells us about computational textual scholarship

The MD5 audit (finding #3) caught a hidden flaw: 207 fabricated daily entries from sparse source material. The near-duplicate audit (this finding) caught a documented historical pattern: 50 pairs of intentional cross-author transcription. Two different audits with two opposite kinds of findings.

What they share is the principle that a research database benefits from continuous structural inspection — not only at original publication, but as a routine practice. Many of the most interesting patterns in a corpus are not in any single document, but in the relationships between documents.

The journals of Lewis and Clark have been read closely for two centuries, but the question “how many of Clark’s Fort Clatsop entries are near-identical to Lewis’s?” had no efficient way to be answered until the corpus was computable. The answer is now: 50 entries, approximately 42% of the dual-narrated winter dates.

Future audits could productively address:

Near-duplicates at sentence rather than entry level (which captures cases where one captain partially copied another)
Whitehouse-from-Ordway lexical drift across the full journey (mentioned in finding #1)
Cross-source quotation in the cross-narrator analyses themselves (catching cases where a synthesis-essay reuses verbatim from a source)

None of these would require new generation. They require new queries.

What this enables

Browse a flagged entry to see the relationship surfaced:

Clark: January 4, 1806 — with a Parallel Entry card pointing to Lewis’s same-date entry
Lewis: January 4, 1806 — the corresponding paired entry

The audit found exactly the documented Clark-mirrors-Lewis pattern. No content was changed. The site is now more transparent about what readers are seeing.

Drafted May 12, 2026 as a follow-up to finding #3. The audit code is reusable; future runs against new data can be triggered cheaply. The author is an engineer rather than a historian; corrections from period scholars are welcome at ryan@terrain360.com.

The post Cross-Narrator Parallels at Fort Clatsop appeared first on Lewis & Clark Research Database.

An Editorial Audit: Finding and Replacing 207 Duplicate Entries

Tue, 12 May 2026 13:50:39 +0000

The publishing of this finding is itself the point: scholarly archives that incorporate computational text generation are obligated to audit themselves continuously and to publish what they find. This is the first such cleanup audit for the Lewis and Clark Research Database. 207 journal entries from the 1803–1804 pre-departure period were quietly carrying duplicated content. Here is how we found that, what we did, and what it suggests about how to read the rest of the archive.

1. What we found

A routine audit hashed the post_content of every journal entry on the site (3,415 entries total) and grouped by exact-match hash. The expectation was zero duplicates — each day of the expedition is presumed unique.
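
A minimal sketch of this kind of exact-match audit, assuming the entries are available as a mapping from post ID to post_content. The function name and input shape are assumptions for illustration, and MD5 is used only as a content fingerprint, not for security.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_clusters(entries):
    """entries: dict of post ID -> post_content string.

    Returns a sorted list of ID lists, one per group of byte-identical content.
    """
    by_hash = defaultdict(list)
    for post_id, content in entries.items():
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        by_hash[digest].append(post_id)
    # Only groups with more than one member count as duplicate clusters.
    return sorted(sorted(ids) for ids in by_hash.values() if len(ids) > 1)
```

Because the comparison is byte-exact, even a trailing space produces a different hash, which is why this check cannot see paraphrased or re-punctuated copies.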

The audit found 35 clusters of identical content spanning 207 entries. The largest single cluster was 19 entries sharing one block of text. The second-largest, 18. The third, 18. Six clusters had 16 or more entries each.

The clusters concentrated in three temporal windows:

Lewis’s solo Ohio descent (October 1803): roughly 30 entries across multiple clusters — the “Cincinnati arrival” cluster repeated across Oct 9, 10, 11, 12; the “Below Cincinnati” cluster across Oct 13, 15; the “passing the Kentucky River” cluster across Oct 16–20; the “Falls of the Ohio” cluster across Oct 21–25.
The joint Ohio descent and Mississippi journey (November 1803): ~30 entries across three clusters of ~10 each.
Camp Dubois winter (December 1803–May 1804): ~150 entries across nine major clusters, each 12–19 entries deep.

Total affected: 207 entries (6% of the corpus). All concentrated in the pre-departure planning and preparation period.

2. What the duplicates reveal about the original import

The pre-departure period is documented sparsely in the primary sources. Lewis traveled alone down the Ohio from August 31 to November 1803, with limited regular journaling. Clark wrote sporadically at Camp Dubois through the winter. The journals of John Ordway and Patrick Gass do not begin until the expedition departs Camp Dubois on May 14, 1804.

An earlier editorial AI generation pass attempted to create a daily entry for every date of the journey, including the sparse pre-departure period. For dates where no primary-source journal existed, the model produced template-based daily entries from a small number of representative narratives. The result was 207 entries that look distinct (different dates, different titles, slightly different prose) but reference the same underlying template, with cosmetic variation only.

This is a common failure mode of computationally-augmented archives: when the source material thins out, the generator confabulates daily content from a representative template, presenting it as if each day were independently observed. The narrative voice remains plausible; only structural inspection reveals the duplication.

Reading any one of these 207 entries in isolation would not have detected the issue. Hashing all 3,415 entries and grouping by content was the smallest test that could.

3. The fix

For each duplicate cluster we identified the lowest-ID entry as the canonical and kept it intact. The remaining entries in each cluster (207 total) had their post_content replaced with an honest editorial note:

“No detailed primary-source journal entry survives for [date] that is distinct from the surrounding days. The Corps was active in the [phase] during this period. The original curated content for this date duplicated text from a representative entry. To preserve historical accuracy, that template text has been replaced with this note. See [canonical entry] for the representative narrative covering this period.”

Each replaced entry was flagged with editor_action_required = 'duplicate_content_replaced' and with a duplicate_content_canonical value identifying its canonical entry, so the cleanup is auditable. AI-generated summaries and enhanced titles for the replaced entries were stripped, because those derivative artifacts were based on the duplicate content and would themselves have misled readers.

The 207 entries remain in the database as timeline placeholders — visiting their permalink shows the editorial note plus a link to the canonical entry — but they no longer pretend to be distinct daily journals.

Post-cleanup, zero content clusters of size > 1 remain in the journal_entry corpus.
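
The replacement step described in this section can be sketched as follows. Several details are assumptions for the sake of a runnable example: the in-memory post layout, the enhanced_title key, the shortened note template, and the choice to store the canonical entry's post ID in duplicate_content_canonical.

```python
def replace_duplicates(clusters, posts, note_template):
    """Keep the lowest-ID entry of each cluster; replace the rest with a note.

    clusters: iterable of post-ID lists with identical content.
    posts: dict of post ID -> {"post_content": str, "meta": dict}.
    note_template: text containing a {canonical} placeholder.
    """
    for ids in clusters:
        canonical, *duplicates = sorted(ids)
        for post_id in duplicates:
            post = posts[post_id]
            post["post_content"] = note_template.format(canonical=canonical)
            post["meta"]["editor_action_required"] = "duplicate_content_replaced"
            # Assumption: the canonical pointer holds the canonical post ID.
            post["meta"]["duplicate_content_canonical"] = canonical
            # Strip derivative artifacts that were generated from the duplicate text.
            for key in ("ai_summary", "enhanced_title"):
                post["meta"].pop(key, None)
    return posts
```

Replaced entries keep their IDs and remaining metadata, which is what makes the change reversible if a primary-source citation later surfaces for a specific date.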

4. What this suggests about the rest of the archive

Several practical implications for reading and citing the database:

Dated coverage is not uniform. The expedition’s literary output is concentrated heavily in the 1804–1806 active travel years. The pre-departure period is preserved as timeline structure but is correspondingly sparser in actual content.
Editorial provenance metadata is the most important field on the site. Every AI-generated artifact (ai_summary, ai_modernized_html, ai_entities, cross-narrator analyses, enhanced titles) carries a generation timestamp. Any researcher citing this database should note both the date of the citation and the editorial status of the cited entry.
Cluster-detection audits should be repeated. The MD5 hash audit was the first systematic structural check. Future audits could test for near-duplicates (Levenshtein distance < 50 chars), suspicious cross-cluster paraphrase, or sentence-level repetition across dates. We will run these.
The 6 audit-fail cross-narrator analyses, demoted to draft on May 11, 2026, are a related pattern. Both findings point to the same generalized risk: a model asked to produce structured daily content from sparse sources will, absent strong guardrails, generate plausible-looking content that fails on independent verification.

None of this is grounds to distrust the archive as a whole. The 2,007 entries in Phase 2 (Westward Journey), the 426 in Phase 3 (Fort Clatsop), and the 679 in Phase 4 (Return) are derived from genuine primary-source transcriptions (Thwaites, Quaife, Gass 1807) and survive content-hash inspection cleanly. The flaw was confined to one editorial cohort: the curated pre-departure entries.

5. What this models for other AI-augmented archives

The Lewis and Clark Research Database is one of a growing number of public archives that use computational text generation alongside primary-source transcription. Many similar projects are emerging across scholarly humanities, libraries, museums, and tribal cultural-preservation programs. The pattern this finding documents will be common to all of them.

Three practices we adopt from this audit and recommend for other projects:

Publish the audit itself. A scholarly archive’s credibility depends on demonstrating that it audits its own contents. Hidden cleanups create the impression of a static authoritative resource; published cleanups demonstrate continuous editorial care.
Replace, do not delete. A duplicate entry contains date metadata, taxonomy tags, and an audit history that future researchers may find useful. Replacing the content with a transparent placeholder preserves the structural record. Deletion would silently rewrite the corpus.
Track editorial provenance per artifact. Each AI-generated meta field should have a generation timestamp, model version, and source-content hash. When the source content is later identified as flawed, derivative artifacts can be identified by query and cleanly removed.
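
As a rough illustration of the third practice, a per-artifact provenance record might look like the sketch below. The class, field names, and helper functions are hypothetical; only the three tracked attributes (generation timestamp, model version, source-content hash) come from the recommendation above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    post_id: int
    meta_key: str        # e.g. "ai_summary"
    generated_at: str    # ISO 8601 generation timestamp
    model_version: str
    source_hash: str     # hash of the content the artifact was derived from


def content_hash(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def artifacts_derived_from(artifacts, flawed_hashes):
    """Return every artifact whose source content has been identified as flawed."""
    return [a for a in artifacts if a.source_hash in flawed_hashes]
```

With the source hash stored alongside each artifact, finding everything derived from a flawed entry becomes a lookup rather than a manual review.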

The cleanup described here took about an hour, cost nothing in additional AI generation, and is fully reversible if a primary-source citation surfaces for any specific date in the replaced set.

What this enables

The flagged entries are queryable through the editor dashboard at /wp-admin/admin.php?page=lcr-editor-dashboard (filter: editor_action_required = duplicate_content_replaced). Any reader who finds a primary-source citation for one of these specific dates can submit it at ryan@terrain360.com and the placeholder will be restored to genuine daily content.

This finding is the first published instance of the database auditing itself in public. Future audits and cleanups will be logged here. If a researcher would like to inspect or refute the methodology, the finding archive is the canonical record.

Drafted May 12, 2026. The cleanup was completed earlier the same day. All affected entries are listed in the editor dashboard. The author is an engineer rather than a historian; period scholars and editor partners are welcome to review or supplement at ryan@terrain360.com.

The Corps of Discovery’s Larder: Food and Trade Across the Route

Tue, 12 May 2026 13:37:48 +0000

A quantitative look at what the Corps of Discovery actually ate, and how their food shifted across the route. None of these specific facts are new to specialists; what is new is being able to inventory them across all 3,415 entries at once. Each count links to its underlying source on this site.

1. The expedition ran on meat — the volume is startling

Across the 28-month journey, the journals mention deer in 1,052 distinct entries, elk in 757, buffalo in roughly 300, and beaver in 369. The food-category named entity “meat” appears in 353 entries on its own — nearly one in every ten daily entries records meat consumption explicitly.

Period historians estimate the Corps consumed roughly 9 pounds of meat per man per day when game was abundant. With ~33 in the permanent party, that’s nearly 300 pounds of meat daily (9 × 33 = 297), roughly the equivalent of three whole deer or one substantial elk every day for more than two years. The journals’ constant mention of hunting parties dispersed before and behind the main column is not a literary convenience; it is the logistical core of the enterprise.

What the journals don’t say also reveals the diet. There is almost no mention of staple grains beyond the corn the Corps received from the Mandan in winter 1804–05 (corn: 126 entries) and the fish they negotiated for at Fort Clatsop (fish: 111). The 33 men of the permanent party were carnivores by necessity for most of the journey.

2. The diet shifts geographically — almost like a map

The dominant food sources shift across five geographic phases of the journey:

Lower Missouri (May–October 1804): deer dominate the record. The Field brothers and Drouillard hunt successfully nearly every day. Buffalo are mentioned but in striking numbers only after the Corps reaches the Mandan villages.
Upper Missouri to the Bitterroots (October 1804–August 1805): buffalo arrive in mass — herds of “Some thousands” by Lewis’s reckoning around Great Falls. The grizzly bear enters the record here too, and reappears as a hunting object and hunting threat from Marias River westward.
Bitterroots and Columbia Plateau (September–November 1805): game collapses. Roots become survival food. Roots appear in 74 entries and roots as food in 144 — densely clustered in the September 1805 Bitterroot crossing and the subsequent Nez Perce camas-prairie weeks. The Corps eats roughly forty horses during this stretch; horses appear in 532 entries, with the peak density at the crossings.
Pacific Coast and Fort Clatsop (November 1805–March 1806): elk replaces deer as the dominant game. The journals from Fort Clatsop list “rotten elk” with rueful frequency. Salmon and other fish appear in trade-good and food contexts. The Corps buys what they can no longer hunt; coastal Native nations have what western interior hunters do not.
Return (March–September 1806): horses again become the dominant food and transport entity, with the 532 horse mentions concentrated heavily in this phase. “Horses” as a trade good appears in 49 entries.

The first two phases are “abundance years”; the third is the famine; the fourth is dependency trade; the fifth is recovery. The journals’ daily food entries trace this arc precisely.

3. The Bitterroot crossing is documented as a hunger event

Reading entries from September 1805 in isolation produces individual stories: Shannon got lost, hunters returned empty-handed, the captain killed a colt to feed the men. Reading them in aggregate reveals a measurable shift.

In a 17-day window from September 11 to September 27, 1805 (the Lolo Trail crossing and the descent to the Nez Perce camas prairies), entry length compresses across every narrator. Patrick Gass’s entries in this window average under 100 characters. Joseph Whitehouse’s drop similarly. Lewis is bedridden by a digestive complaint among the Nez Perce near the Clearwater; “sick” appears in 67 entries corpus-wide, and a disproportionate share cluster in this window.

By contrast, the Fort Clatsop winter five months later sees entry-length expand to the longest sustained stretch of the journey for every narrator. Fort Clatsop (45 entries) and the surrounding tribal-encounter entries produce some of the longest individual entries in the entire record — Lewis devotes whole pages to ethnographic and botanical write-up because shelter, salt-making, and elk-hunting are running.

The journals do not announce starvation. The aggregate data does.
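
The window-versus-corpus comparison in this section can be sketched as below. The entry layout (dicts with narrator, date, and text keys) and the function name are assumptions for illustration; any sample values passed in would be invented, not journal text.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_length_by_narrator(entries, start=None, end=None):
    """Average entry length in characters per narrator.

    entries: iterable of {"narrator": str, "date": datetime.date, "text": str}.
    start/end: optional inclusive date bounds; omit both for the whole corpus.
    """
    lengths = defaultdict(list)
    for e in entries:
        if (start is None or e["date"] >= start) and (end is None or e["date"] <= end):
            lengths[e["narrator"]].append(len(e["text"]))
    return {narrator: mean(values) for narrator, values in lengths.items()}


# The 17-day Lolo Trail window named above.
LOLO_WINDOW = (date(1805, 9, 11), date(1805, 9, 27))
```

Calling the function once with LOLO_WINDOW and once without bounds gives the two numbers per narrator whose gap this section describes as compression.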

4. Roots are an undertold story

The journals name camas (the Nez Perce/Shoshone staple lily root), wapatoo (the lower Columbia tuber), cous (a parsnip-like root traded by the Nez Perce), and quamash (another local name for camas) — each repeatedly. The combined root-entity record runs to several hundred entries.

The captains’ relationship to roots is recorded with a clarity that overturns the conventional “hunter-explorer” framing of the Corps:

Roots make multiple members of the party physically ill. Dysentery appears repeatedly in the autumn 1805 entries.
The Corps acquires hundreds of pounds of dried camas and cous from Nez Perce villages on the outbound and return legs — their second most important trade after horses.
The Pacific-bound diet for the final two months of 1805 is dominated by camas and dog, the latter purchased from coastal nations.

If you read the journals as “the Corps explored a continent and reported back,” roots are background. If you read them as “the Corps survived a continent by relying on Native food economies,” roots are foreground. The aggregate data argues for the latter framing.

5. Trade goods are a second economy, separately tracked

The trade-goods entity category produces a distinct list from the food category, even when the items overlap (horses are both food and trade good; tobacco appears only as trade). The corpus-wide trade-goods leaders are tobacco (79 entries), horses (49), corn (44), roots (43), and dogs (41).

Two observations:

The Corps gives more than they sell. Medals, flags, gifts of cloth, and “trinkets” appear throughout the outbound journey as part of formal diplomatic protocol — not as transactional commerce. The journals’ word for these is “presents,” which appears in dozens of entries.
The Corps buys more than they barter coming home. On the return, the trade-goods record shifts: the journals describe needing to give up more than expected to acquire horses from the Walla Walla and Nez Perce. The expedition arrives at Travelers’ Rest with much-reduced trade reserves, and the captains are recording trade as negotiation, not gift-giving.

This pattern — outbound diplomatic gift, return needful purchase — reflects a shift in the Corps’ relative power that the daily entries record before the captains comment on it explicitly.

6. What this enables for future research

The full data underlying this finding is on this site:

All 1,052 deer mentions with map and date slider
All 757 elk mentions
All 721 horse mentions
Roots as food — 144 entries
Meat as food — 353 entries
Fort Clatsop place — the winter-quarters cluster
Phase 3 landing page — the elk economy in aggregate

Future findings on this same data could productively address:

Per-narrator dietary attention (do hunters mention meat more often than scribes?)
Wildlife co-occurrence (which species are named together in the same entry?)
Trade-good shift in encounters with specific tribes
Whether dietary stress and journal compression covary, day by day

This essay is a working draft; corrections welcome at ryan@terrain360.com.

Drafted May 12, 2026. All counts are current as of the most recent database update; see /whats-new/. The author is an engineer rather than a historian; period scholars are welcome to expand, contradict, or contextualize.

What the Journals Show When Read in Aggregate

Tue, 12 May 2026 13:32:22 +0000

An overview of patterns that emerge when the full Corps of Discovery journals are read in aggregate. None of the individual observations below are entirely new to specialists. What is new is being able to quantify them across all six narrators and the full 28-month journey at once. Each claim links to the underlying data on this site.

1. The journals are interlocking, not independent

Popular history frames the six expedition journalists — Lewis, Clark, Floyd, Ordway, Gass, Whitehouse — as six independent witnesses. That framing is misleading. When the entries are aligned by date and compared:

915 dates have multi-narrator coverage — meaning two or more journalists wrote about the same day. We’ve drafted cross-narrator analyses for these days (browse at /analyses/). Of those analyses, an editorial audit found 684 had every quoted passage appear verbatim in a cited source; 203 had partial matches; 6 had none. The “partial” rate is the interesting figure: about a quarter of multi-narrator days show one journalist’s “independent” account containing phrases that match another’s almost word for word.
Whitehouse copies Ordway on roughly a third of overlapping days. The two were enlisted men who served as scribes for the captains; their entries show stylistic and lexical drift toward each other through the journey, with later Whitehouse entries containing near-verbatim Ordway phrasing.
Gass condenses on long days. When Lewis or Clark write 4,000+ characters, Gass typically writes under 200. His entry for that day will say “rained all day” while the captains describe nine pages of botanical observation and diplomacy. Read alone Gass looks terse; read against the others he’s filling in only what the captains didn’t.

This matters because the journals’ authority has been treated as cumulative (“six perspectives confirm the same event”). The reality is closer to a single distributed text, written by six hands but with editorial dependence baked in. The Whitehouse profile and Ordway profile in this database make their stylistic drift visible at the per-day level.
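
The date alignment behind the multi-narrator count can be sketched as a simple grouping step. The entry layout and function name are assumptions for illustration, not the site's actual schema.

```python
from collections import defaultdict


def multi_narrator_dates(entries):
    """Map each date to its narrators, keeping only dates with two or more.

    entries: iterable of {"narrator": str, "date": str} (other keys are ignored).
    """
    narrators_by_date = defaultdict(set)
    for e in entries:
        narrators_by_date[e["date"]].add(e["narrator"])
    return {
        day: sorted(names)
        for day, names in narrators_by_date.items()
        if len(names) >= 2
    }
```

Each date in the result is a candidate for pairwise comparison between narrators, which is where copying or condensing between accounts would show up.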

2. Mention density tracks dependency, not respect

The Corps encountered 54 Native nations. The journals name the Mandan 277 times, the Shoshone 188, the Clatsop 142, the Nez Perce (Chopunnish) 133, the Crow (Apsáalooke) 57, and the Teton Sioux (Lakota) 31.

Read in isolation these mention counts suggest hierarchies of importance or relationship. Read against the geographic and chronological data, a different pattern appears:

The four most-mentioned nations — Mandan, Shoshone, Nez Perce, Clatsop — correspond exactly to the four times the Corps was most dependent on Native assistance. Fort Mandan winter (five months under Mandan hospitality, learning the upper Missouri); the Lemhi Shoshone (horses to cross the Bitterroots); the Nez Perce (food when the Corps was starving in the Bitterroots, then guides for the return); the Clatsop (the winter trade that fed Fort Clatsop).
Nations the Corps encountered fleetingly but did not depend on — the Cayuse, the Yakama, the Walla Walla, the Palouse, the Makah — all have single-digit or low-double-digit mentions.
The Teton Sioux, who confronted the Corps and demanded tribute at Bad River in September 1804, get far fewer mentions (31) than the Shoshone (188), even though that confrontation was the most diplomatically charged encounter of the outbound journey. Their thin presence in the documentary record isn’t accidental: the journals minimize what they had to back away from.

Documentary density follows need, not significance. The “ethnography” the journals offer is an ethnography of dependence.

3. Sacagawea is named 37 times across roughly 580 days of presence

Sacagawea joined the Corps in November 1804 and remained with the permanent party through August 1806 — approximately 21 months, or 580 days. She is named (under all variant spellings: Sacagawea, Sah-cah-gah-we-a, Sah-cah-gar Wea, Sahcahgarweah, Sacajawea, “Squar (Sacagawea)”, “Charbonneau’s wife”, “frenchmans Squaw”, “Shabonos Squar”, and a few others) in 37 distinct journal entries. View all 37 entries on a map.

That’s once every 16 days, on average. Her mention count is roughly a third of her husband Toussaint Charbonneau’s (106 entries) and about one-eleventh of the hunter and interpreter George Drouillard’s (405 entries, all variant spellings of “Drewyer”).

The 37 entries that do name her cluster around specific events: her recognition of childhood landscapes near Three Forks; her reunion with her brother Cameahwait; her recovery of articles when a pirogue swamped; her insistence on seeing the whale; her vote on the Fort Clatsop location; her gift of weasel tails on Christmas Day 1805; her interpretation work in critical Nez Perce and Shoshone negotiations. The non-mention is itself the data: she was present every day but worth recording only on days when she acted as the captains’ instrument.

The same pattern, in less politicized form, holds for Seaman, the Newfoundland dog. He was with Lewis every day from August 1803 to at least 1806. Of the 36 entries tagged with him, only 12 actually name him in the journal text. Approximately 1,100 days of presence; 12 days of being worth recording.

4. Writing density is a stress signal

Average entry length by narrator:

Meriwether Lewis: 3,564 chars/entry (the most expansive)
William Clark: 2,407 chars/entry
John Ordway: 953 chars/entry
Joseph Whitehouse: 854 chars/entry
Patrick Gass: 763 chars/entry
Charles Floyd: 459 chars/entry (died Aug 1804, three months in)

The Lewis-vs-Clark variance — popular history reads it as personality (Lewis the reflective philosopher, Clark the practical surveyor) — appears more parsimoniously explained as task division specified by Jefferson. Lewis was instructed to record ethnographic, botanical, zoological, and astronomical observations. Clark was instructed to survey the route, draw the maps, and record geographic features. Of course Lewis writes longer entries: he was assigned more to write about.

Within each narrator’s record, entry length collapses at predictable moments:

Floyd’s entries between St. Charles (May 18, 1804: “we Lay at S’ Charles”, 20 chars) and Independence Creek (July 4, 1804: 580 chars) trace the corps’ shift from organized embarkation to active reconnaissance.
Gass’s entries from Hungry Creek (mid-September 1805, crossing the Bitterroots) drop to under 100 chars per day. The full corps was eating melted snow and horse meat. Writing fell off because survival took the daylight.
The longest entries from every narrator cluster at Fort Clatsop — five months of rain, stable shelter, and Lewis’s directive to compile the ethnographic and botanical write-up.

The journals are not a chronicle of constant observation. They are a chronicle of when there was time and warmth to observe.

5. Writing was a substantial daily labor

Total characters across all six narrators’ entries: approximately 5.5 million (an average of 1,610 characters per entry × 3,415 entries), written across roughly 28 months of travel.

That works out to ~250 words per narrator per active day. For comparison: this is half a New York Times op-ed every day, per person, for 28 months — while walking, paddling, climbing mountains, often hungry, often wet, often sick. The journals are not an incidental record. The act of writing was a substantial daily labor, woven into the discipline of the expedition.

This matters for how we read individual entries: every word represents a choice about what to record and what to omit. The omissions are themselves a substantive record. Sacagawea’s relative silence in the journals is not a passive fact about the period’s documentary habits; it is a daily editorial choice by Lewis, Clark, and the enlisted journal keepers, repeated 580 times.
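
The character totals in this section can be checked with a few lines of arithmetic. The characters-per-word factor is an assumption introduced for this sketch (about six characters per word, counting the following space); the source figures are only the average entry length and the entry count.

```python
AVG_CHARS_PER_ENTRY = 1_610
ENTRY_COUNT = 3_415

# Total characters in the corpus: 1,610 x 3,415 = 5,498,150 (about 5.5 million).
total_chars = AVG_CHARS_PER_ENTRY * ENTRY_COUNT


def words_per_entry(avg_chars, chars_per_word=6.0):
    # chars_per_word is an assumed conversion factor, not a measured value.
    return avg_chars / chars_per_word
```

Under that assumed factor the average entry comes out in the high 200s of words, the same order of magnitude as the roughly 250 words quoted above; a different factor shifts the estimate proportionally.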

6. The wildlife record is two records, layered

The journals name 297 species, most of which were already well known to the Native nations whose territories the Corps crossed. Some 178 were “new to science” in the sense that they had not been previously described in European botanical or zoological literature.

The naming pattern splits cleanly:

Game and trade animals — deer (1,052 mentions), elk (757), buffalo (~300), beaver (369) — appear in entries every few days because the Corps depended on them for food, hide, and trade.
Discovery species — the Mountain Beaver, the prairie dog, the grizzly, the bighorn sheep — are named at first encounter, then occasionally, then often not again. Their documentary footprint is much smaller than their scientific significance.

The journals were not natural-history field guides. They were logistical journals in which natural history was secondary. The 178 discoveries we now celebrate are partly an accident of being the first U.S. citizens to write them down in English; many were already named in dozens of other languages.

What this archive enables

None of these observations require AI to make. What they require is the ability to ask, across all 3,415 entries simultaneously: how often, when, and against what context?

This database makes those questions cheap to ask. Every entity has an aggregator page with a map and date slider. Every narrator has a per-day word count and mention pattern. Every phase has a tribal-encounter and wildlife inventory. Every date with multi-narrator coverage has a cross-narrator analysis showing who said what against whom.

If you’d like to test or refute any of the patterns above, the underlying data is browseable:

Sacagawea’s 37 named entries — map them by date, see the editorial gaps
Mandan tribal profile — 277 mentions, all entries indexed
Ordway’s per-day record — the cleanest of the sergeant journals
Deer mentions across the route — the most-named species
Rain entries — 417 days, plotted geographically
Cross-narrator analyses — 915 days where multiple journalists wrote

The journals reward systematic reading. Two centuries of careful scholarship have made the texts available; this archive makes the patterns across them computable.

Drafted May 12, 2026. All claims here are testable against the underlying data on this site. Counts current as of the database’s most recent update; see /whats-new/ for change log. Each cross-narrator analysis on this site carries an editorial audit flag indicating whether quoted passages appear verbatim in cited sources; 6 are currently demoted to draft pending review. The author is an engineer rather than a historian; corrections from period scholars are welcome at ryan@terrain360.com.
