<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Findings Archive - Lewis &amp; Clark Research Database</title>
	<atom:link href="https://lewisandclarkresearch.org/findings/feed/" rel="self" type="application/rss+xml" />
	<link>https://lewisandclarkresearch.org/findings/</link>
	<description>A digital archive of treaties, documents, artwork, and 360° trail panoramas from the Corps of Discovery</description>
	<lastBuildDate>Tue, 12 May 2026 14:02:03 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Cross-Narrator Parallels at Fort Clatsop</title>
		<link>https://lewisandclarkresearch.org/findings/cross-narrator-parallels-at-fort-clatsop/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 May 2026 14:02:03 +0000</pubDate>
				<guid isPermaLink="false">https://lewisandclarkresearch.org/findings/cross-narrator-parallels-at-fort-clatsop/</guid>

					<description><![CDATA[<p>A near-duplicate audit (MinHash LSH + SequenceMatcher) found 50 confirmed clusters of paraphrased journal text, all of them Clark/Lewis pairs on the same date during the Fort Clatsop winter. This is a documented historical phenomenon — at Fort Clatsop the captains often kept near-identical journals. The 50 pairs were flagged with parallels_entry meta and surfaced as a cross-reference card on each single-entry page. Computational text analysis is being used here to confirm and make visible something scholars have known for a century.</p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/cross-narrator-parallels-at-fort-clatsop/">Cross-Narrator Parallels at Fort Clatsop</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="findings-intro">
<p><em>A follow-up audit to <a href="/findings/editorial-audit-duplicate-entries/">finding #3</a> (the exact-match content cleanup) used a more permissive near-duplicate detector to look for paraphrased copies. It found a documented historical pattern that is not a bug at all: at Fort Clatsop, Captain Clark transcribed Captain Lewis&#8217;s daily journal entries almost word-for-word. Surfacing this pattern in the database is a small example of how computational text analysis can confirm and make visible something scholars have known for a century.</em></p>
</div>
<section class="findings-section">
<h2>1. What we ran</h2>
<p>Finding #3 used MD5 hashing to detect entries with byte-identical content. That caught the 207-entry curated-import cleanup, but it could not detect entries that were paraphrased &mdash; near-identical in meaning but different in punctuation, capitalization, or spelling.</p>
<p>A follow-up audit ran a more permissive detector against the remaining 3,130 journal entries:</p>
<ul>
<li>Shingle each entry&#8217;s normalized text into overlapping 8-character windows.</li>
<li>Compute a 64-element MinHash signature per entry.</li>
<li>Band the signatures into 16 bands of 4 hashes each (LSH).</li>
<li>For each pair of entries sharing any band, compute a Python <code>SequenceMatcher.ratio()</code>.</li>
<li>Pairs with ratio &geq; 0.80 form a near-duplicate cluster via union-find.</li>
</ul>
<p>The audit ran in about three minutes against the post-cleanup corpus.</p>
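<p>The steps above can be sketched in a few dozen lines of standard-library Python. This is an illustration under stated assumptions, not the audit's actual code: the normalization rule (lowercase, collapsed whitespace), the salted BLAKE2b hash family, and all function names are hypothetical choices. Only the parameters (8-character shingles, 64 hashes, 16 bands of 4, ratio threshold 0.80) come from the description above.</p>

```python
import hashlib
import re
from collections import defaultdict
from difflib import SequenceMatcher

NUM_HASHES, BANDS, ROWS = 64, 16, 4   # 16 bands x 4 rows = 64 hashes
SHINGLE, THRESHOLD = 8, 0.80

def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.lower()).strip()

def shingles(text):
    # Overlapping 8-character windows over the normalized text.
    t = normalize(text)
    return {t[i:i + SHINGLE] for i in range(max(1, len(t) - SHINGLE + 1))}

def minhash(shingle_set):
    # One salted hash per signature slot; keep the minimum value per slot.
    sig = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(8, "big")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
            for s in shingle_set))
    return sig

def candidate_pairs(signatures):
    # LSH: entries that agree on every hash in any one band become candidates.
    buckets = defaultdict(list)
    for entry_id, sig in signatures.items():
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * ROWS:(b + 1) * ROWS]))].append(entry_id)
    pairs = set()
    for ids in buckets.values():
        for i, a in enumerate(ids):
            for c in ids[i + 1:]:
                pairs.add((min(a, c), max(a, c)))
    return pairs

def near_duplicate_clusters(entries):
    # entries: mapping of entry ID to raw text.
    sigs = {eid: minhash(shingles(text)) for eid, text in entries.items()}
    parent = {eid: eid for eid in entries}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(sigs):
        ratio = SequenceMatcher(None, normalize(entries[a]), normalize(entries[b])).ratio()
        if ratio >= THRESHOLD:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for eid in entries:
        groups[find(eid)].append(eid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

<p>The LSH banding only nominates candidate pairs; the <code>SequenceMatcher</code> ratio makes the final call, so a chance band collision cannot create a false cluster on its own.</p>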
</section>
<section class="findings-section">
<h2>2. What it found</h2>
<p><strong>50 confirmed clusters, 100 entries total &mdash; and every cluster is a Clark / Lewis pair on the same date during the Fort Clatsop winter (December 1805 – March 1806).</strong></p>
<p>The pairs we surfaced include:</p>
<ul>
<li>Clark Jan 4, 1806 &harr; Lewis Jan 4, 1806</li>
<li>Clark Jan 11, 1806 &harr; Lewis Jan 11, 1806</li>
<li>Clark Jan 17, 1806 &harr; Lewis Jan 17, 1806</li>
<li>Clark Jan 21, 1806 &harr; Lewis Jan 21, 1806</li>
<li>Clark Jan 31, 1806 &harr; Lewis Jan 31, 1806</li>
<li>Clark Feb 1–4, 1806 &harr; Lewis Feb 1–4, 1806</li>
<li>&#8230;and 41 more such date-pairs through early March 1806.</li>
</ul>
<p>Zero clusters span more than two entries. Zero clusters fall outside the Fort Clatsop winter window. The pattern is precise.</p>
</section>
<section class="findings-section">
<h2>3. Why this is not a bug</h2>
<p>The Fort Clatsop daily-journal pattern is documented in <em>every</em> serious edition of the journals. During the Pacific winter, Clark frequently transcribed Lewis&#8217;s entries into his own daybook with minor capitalization and punctuation differences but otherwise verbatim. Gary Moulton notes the practice extensively in his Nebraska edition footnotes. Elliott Coues notes it in his 1893 commentary. The pattern is well known: in the constant rain and confinement of Fort Clatsop, the two captains operated as a single editorial unit for much of the winter.</p>
<p>What the computational audit adds is <strong>quantification</strong>: of the ~120 winter dates with entries from both captains, exactly 50 (~42%) reach the near-duplicate threshold (a <code>SequenceMatcher</code> ratio of at least 0.80). The other ~58% have substantive differences. The boundary line (which winter days the captains chose to write independently versus which they let one author cover) is a measurable signal that could productively be cross-referenced with weather, hunting outcomes, diplomatic activity, or Lewis&#8217;s documented depressive episodes during that winter.</p>
<p>This pattern does not appear earlier in the journey (Phase 2, May 1804 &ndash; October 1805) or later (Phase 4, March &ndash; September 1806). The Fort Clatsop winter is uniquely characterized by this co-authorship pattern.</p>
</section>
<section class="findings-section">
<h2>4. How we surfaced it on the site</h2>
<p>The 100 entries were flagged with <code>parallels_entry</code> meta pointing at the corresponding paired entry&#8217;s post ID. The single-journal-entry template now renders a sidebar card on each flagged page:</p>
<blockquote style="background:#fff5e6;border:1px solid #d4ba6a;padding:14px 18px;margin:16px 0;border-radius:4px;font-size:0.92rem;"><p>
<strong style="color:#8B6914;">Parallel Entry</strong><br />
<em style="color:#5a3a20;">This entry&#8217;s text closely parallels the entry below for the same date. This is a documented historical phenomenon &mdash; at Fort Clatsop, the captains often kept near-identical journals.</em><br />
<strong>Lewis: January 4, 1806</strong> &middot; Near-duplicate primary-source text
</p></blockquote>
<p>Each flagged entry links to its counterpart. Neither entry&#8217;s content was changed; the relationship is annotated only.</p>
<p>This is the correct action because:</p>
<ol>
<li>The duplication is in the primary sources themselves, not in our import.</li>
<li>Removing or merging the entries would falsify the historical record &mdash; readers and researchers benefit from knowing both captains&#8217; entries exist in (largely identical) form.</li>
<li>The cross-narrator analyses at <a href="/analyses/">/analyses/</a> already synthesize same-date entries from multiple narrators; the Parallel Entry card is the bridge from a single entry to the broader cross-narrator picture.</li>
</ol>
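<p>The symmetric flagging described in this section can be sketched as a small mapping step. This is a hypothetical illustration: the function name and the post IDs in any call are invented, and only the <code>parallels_entry</code> meta key comes from the site. It assumes each pair arrives as a two-element tuple of post IDs.</p>

```python
def parallels_meta(pairs):
    """Build per-post meta so each entry points at its counterpart's post ID."""
    meta = {}
    for first_id, second_id in pairs:
        if first_id == second_id:
            raise ValueError("an entry cannot parallel itself")
        if first_id in meta or second_id in meta:
            # The audit reported no cluster larger than two entries.
            raise ValueError("an entry may belong to only one pair")
        meta[first_id] = {"parallels_entry": second_id}
        meta[second_id] = {"parallels_entry": first_id}
    return meta
```

<p>Writing both directions in one step keeps the annotation symmetric, so the card renders on either side of the pair without touching either entry's content.</p>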
</section>
<section class="findings-section">
<h2>5. What this tells us about computational textual scholarship</h2>
<p>The MD5 audit (finding #3) caught a hidden flaw: 207 fabricated daily entries from sparse source material. The near-duplicate audit (this finding) caught a documented historical pattern: 50 pairs of intentional cross-author transcription. Two different audits with two opposite kinds of findings.</p>
<p>What they share is the principle that a research database benefits from continuous structural inspection &mdash; not only at original publication, but as a routine practice. Many of the most interesting patterns in a corpus are not in any single document, but in the relationships <em>between</em> documents.</p>
<p>The journals of Lewis and Clark have been read closely for two centuries, but the question &#8220;how many of Clark&#8217;s Fort Clatsop entries are near-identical to Lewis&#8217;s?&#8221; had no efficient way to be answered until the corpus was computable. The answer is now: <strong>50 entries, approximately 42% of the dual-narrated winter dates.</strong></p>
<p>Future audits could productively address:</p>
<ul>
<li>Near-duplicates at sentence rather than entry level (which captures cases where one captain partially copied another)</li>
<li>Whitehouse-from-Ordway lexical drift across the full journey (mentioned in <a href="/findings/what-the-journals-show-when-read-in-aggregate/">finding #1</a>)</li>
<li>Cross-source quotation in the cross-narrator analyses themselves (catching cases where a synthesis-essay reuses verbatim from a source)</li>
</ul>
<p>None of these would require new generation. They require new queries.</p>
</section>
<section class="findings-section">
<h2>What this enables</h2>
<p>Browse a flagged entry to see the relationship surfaced:</p>
<ul>
<li><a href="/journal/clark-january-4-1806/">Clark: January 4, 1806</a> &mdash; with a Parallel Entry card pointing to Lewis&#8217;s same-date entry</li>
<li><a href="/journal/lewis-january-4-1806/">Lewis: January 4, 1806</a> &mdash; the corresponding paired entry</li>
</ul>
<p>The audit found exactly the documented Clark-mirrors-Lewis pattern. No content was changed. The site is now more transparent about what readers are seeing.</p>
<hr>
<p style="font-size:0.85rem;color:#777;"><em>Drafted May 12, 2026 as a follow-up to <a href="/findings/editorial-audit-duplicate-entries/">finding #3</a>. The audit code is reusable; future runs against new data can be triggered cheaply. The author is an engineer rather than a historian; corrections from period scholars are welcome at ryan@terrain360.com.</em></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>An Editorial Audit: Finding and Replacing 207 Duplicate Entries</title>
		<link>https://lewisandclarkresearch.org/findings/editorial-audit-duplicate-entries/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 May 2026 13:50:39 +0000</pubDate>
				<guid isPermaLink="false">https://lewisandclarkresearch.org/findings/editorial-audit-duplicate-entries/</guid>

					<description><![CDATA[<p>A routine MD5 audit of the journal corpus found 207 entries (6%) carrying duplicated content, concentrated in the pre-departure 1803-1804 window where primary sources are sparse. An earlier editorial generation pass had created template-based daily entries from a small number of representative narratives. This finding documents what was found, how it was fixed (replace with honest editorial notes, preserve dates as timeline placeholders), and what it suggests about how to read the rest of the archive and how to operate AI-augmented archives generally.</p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/editorial-audit-duplicate-entries/">An Editorial Audit: Finding and Replacing 207 Duplicate Entries</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="findings-intro">
<p><em>The publishing of this finding is itself the point: scholarly archives that incorporate computational text generation are obligated to audit themselves continuously and to publish what they find. This is the first such cleanup audit for the Lewis and Clark Research Database. 207 journal entries from the 1803&ndash;1804 pre-departure period were quietly carrying duplicated content. Here is how we found that, what we did, and what it suggests about how to read the rest of the archive.</em></p>
</div>
<section class="findings-section">
<h2>1. What we found</h2>
<p>A routine audit hashed the <code>post_content</code> of every journal entry on the site (3,415 entries total) and grouped by exact-match hash. The expectation was zero duplicates &mdash; each day of the expedition is presumed unique.</p>
<p>The audit found <strong>35 clusters of identical content</strong> spanning <strong>207 entries</strong>. The largest single cluster was 19 entries sharing one block of text. The second-largest, 18. The third, 18. Six clusters had 16 or more entries each.</p>
<p>The clusters concentrated in three temporal windows:</p>
<ul>
<li><strong>Lewis&#8217;s solo Ohio descent (October 1803):</strong> roughly 30 entries across multiple clusters &mdash; the &#8220;Cincinnati arrival&#8221; cluster repeated across Oct 9, 10, 11, 12; the &#8220;Below Cincinnati&#8221; cluster across Oct 13, 15; the &#8220;passing the Kentucky River&#8221; cluster across Oct 16&ndash;20; the &#8220;Falls of the Ohio&#8221; cluster across Oct 21&ndash;25.</li>
<li><strong>The joint Ohio descent and Mississippi journey (November 1803):</strong> ~30 entries across three clusters of ~10 each.</li>
<li><strong>Camp Dubois winter (December 1803&ndash;May 1804):</strong> ~150 entries across nine major clusters, each 12&ndash;19 entries deep.</li>
</ul>
<p>Total affected: 207 entries (6% of the corpus). All concentrated in the pre-departure planning and preparation period.</p>
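<p>The exact-match check is small enough to sketch in full. The version below is a minimal illustration, assuming the entries are available as a mapping of post ID to <code>post_content</code>; the function name is hypothetical and the audit's real code is not reproduced here.</p>

```python
import hashlib
from collections import defaultdict

def exact_duplicate_clusters(posts):
    """Group post IDs whose content is byte-identical (same MD5 digest)."""
    by_hash = defaultdict(list)
    for post_id, content in posts.items():
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        by_hash[digest].append(post_id)
    # Keep only groups with more than one member, largest cluster first.
    clusters = [sorted(ids) for ids in by_hash.values() if len(ids) > 1]
    return sorted(clusters, key=len, reverse=True)
```

<p>MD5 is used here only as a fast content fingerprint, not for security; any stable digest would serve the same grouping purpose.</p>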
</section>
<section class="findings-section">
<h2>2. What the duplicates reveal about the original import</h2>
<p>The pre-departure period is documented sparsely in the primary sources. Lewis traveled alone down the Ohio from August 31 to November 1803, with limited regular journaling. Clark wrote sporadically at Camp Dubois through the winter. The journals of John Ordway and Patrick Gass do not begin until the expedition departs Camp Dubois on May 14, 1804.</p>
<p>An earlier editorial AI generation pass attempted to create a daily entry for every date of the journey, including the sparse pre-departure period. For dates where no primary-source journal existed, the model produced template-based daily entries from a small number of representative narratives. The result was 207 entries that <em>look</em> distinct (different dates, different titles) but whose body text repeats the same underlying template.</p>
<p>This is a common failure mode of computationally-augmented archives: when the source material thins out, the generator confabulates daily content from a representative template, presenting it as if each day were independently observed. The narrative voice remains plausible; only structural inspection reveals the duplication.</p>
<p>Reading any one of these 207 entries in isolation would not have detected the issue. Hashing all 3,415 entries and grouping by content was the smallest test that could.</p>
</section>
<section class="findings-section">
<h2>3. The fix</h2>
<p>For each duplicate cluster we identified the lowest-ID entry as the canonical and kept it intact. The remaining entries in each cluster (207 total) had their <code>post_content</code> replaced with an honest editorial note:</p>
<blockquote style="background:#fef9ec;border-left:4px solid #b87333;padding:12px 16px;margin:16px 0;font-style:italic;font-size:0.95rem;"><p>
&#8220;No detailed primary-source journal entry survives for [date] that is distinct from the surrounding days. The Corps was active in the [phase] during this period. The original curated content for this date duplicated text from a representative entry. To preserve historical accuracy, that template text has been replaced with this note. See [canonical entry] for the representative narrative covering this period.&#8221;
</p></blockquote>
<p>Each replaced entry was flagged with <code>editor_action_required = 'duplicate_content_replaced'</code> and <code>duplicate_content_canonical = &lt;id&gt;</code> so the cleanup is auditable. AI-generated summaries and enhanced titles for the replaced entries were stripped, because those derivative artifacts were based on the duplicate content and would themselves have misled readers.</p>
<p>The 207 entries remain in the database as timeline placeholders &mdash; visiting their permalink shows the editorial note plus a link to the canonical entry &mdash; but they no longer pretend to be distinct daily journals.</p>
<p>Post-cleanup, zero content clusters of size &gt; 1 remain in the journal_entry corpus.</p>
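<p>The canonical-selection rule can be sketched as a planning step that decides which entries to flag before anything is written. This is a hypothetical illustration: the function name is invented, and it assumes clusters arrive as lists of post IDs. The two meta keys and the flag value are the ones named above.</p>

```python
def plan_cleanup(clusters):
    """Keep the lowest post ID in each cluster; flag every other member."""
    plan = {}
    for ids in clusters:
        canonical = min(ids)
        for post_id in ids:
            if post_id != canonical:
                plan[post_id] = {
                    "editor_action_required": "duplicate_content_replaced",
                    "duplicate_content_canonical": canonical,
                }
    return plan
```

<p>Producing the plan as data first means it can be reviewed, or reversed later, before any <code>post_content</code> is replaced.</p>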
</section>
<section class="findings-section">
<h2>4. What this suggests about the rest of the archive</h2>
<p>Several practical implications for reading and citing the database:</p>
<ul>
<li><strong>Dated coverage is not uniform.</strong> The expedition&#8217;s literary output is concentrated heavily in the 1804&ndash;1806 active travel years. The pre-departure period is preserved as timeline structure but is correspondingly sparser in actual content.</li>
<li><strong>Editorial provenance metadata is the most important field on the site.</strong> Every AI-generated artifact (ai_summary, ai_modernized_html, ai_entities, cross-narrator analyses, enhanced titles) carries a generation timestamp. Any researcher citing this database should note both the date of the citation and the editorial status of the cited entry.</li>
<li><strong>Cluster-detection audits should be repeated.</strong> The MD5 hash audit was the first systematic structural check. Future audits could test for near-duplicates (Levenshtein distance &lt; 50 chars), suspicious cross-cluster paraphrase, or sentence-level repetition across dates. We will run these.</li>
<li><strong>The 6 audit-fail cross-narrator analyses, demoted to draft on May 11, 2026, are a related pattern.</strong> Both findings point to the same generalized risk: a model asked to produce structured daily content from sparse sources will, absent strong guardrails, generate plausible-looking content that fails on independent verification.</li>
</ul>
<p>None of this is grounds to distrust the archive as a whole. The 2,007 entries in <a href="/expedition-phase/phase-2-westward-journey-1804-1805/">Phase 2 (Westward Journey)</a>, the 426 in <a href="/expedition-phase/phase-3-winter-at-fort-clatsop-1805-1806/">Phase 3 (Fort Clatsop)</a>, and the 679 in <a href="/expedition-phase/phase-4-return-journey-1806/">Phase 4 (Return)</a> are derived from genuine primary-source transcriptions (Thwaites, Quaife, Gass 1807) and survive content-hash inspection cleanly. The flaw was confined to one editorial cohort: the curated pre-departure entries.</p>
</section>
<section class="findings-section">
<h2>5. What this models for other AI-augmented archives</h2>
<p>The Lewis and Clark Research Database is one of a growing number of public archives that use computational text generation alongside primary-source transcription. Many similar projects are emerging across scholarly humanities, libraries, museums, and tribal cultural-preservation programs. The pattern this finding documents will be common to all of them.</p>
<p>Three practices we adopt from this audit and recommend for other projects:</p>
<ol>
<li><strong>Publish the audit itself.</strong> A scholarly archive&#8217;s credibility depends on demonstrating that it audits its own contents. Hidden cleanups create the impression of a static authoritative resource; published cleanups demonstrate continuous editorial care.</li>
<li><strong>Replace, do not delete.</strong> A duplicate entry contains date metadata, taxonomy tags, and an audit history that future researchers may find useful. Replacing the content with a transparent placeholder preserves the structural record. Deletion would silently rewrite the corpus.</li>
<li><strong>Track editorial provenance per artifact.</strong> Each AI-generated meta field should have a generation timestamp, model version, and source-content hash. When the source content is later identified as flawed, derivative artifacts can be identified by query and cleanly removed.</li>
</ol>
<p>The cleanup described here took about an hour, cost nothing in additional AI generation, and is fully reversible if a primary-source citation surfaces for any specific date in the replaced set.</p>
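<p>The third practice, per-artifact provenance, could be modeled roughly as follows. This is a sketch under assumptions: the class, the function names, and the choice of SHA-256 are hypothetical, and the site's actual meta schema is not shown in this finding. It records the three fields recommended above and finds artifacts whose source content has changed since generation.</p>

```python
import hashlib
from dataclasses import dataclass

def content_hash(text):
    # Fingerprint of the source content an artifact was generated from.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class Provenance:
    field: str          # meta field name, e.g. "ai_summary"
    generated_at: str   # ISO 8601 generation timestamp
    model_version: str  # identifier of the generating model
    source_hash: str    # content_hash() of the source at generation time

def stale_artifacts(records, current_content):
    """Return (post_id, field) pairs whose source content has since changed.

    records: mapping of post ID to a list of Provenance objects.
    current_content: mapping of post ID to the current post_content.
    """
    stale = []
    for post_id, provs in records.items():
        now = content_hash(current_content[post_id])
        stale.extend((post_id, p.field) for p in provs if p.source_hash != now)
    return sorted(stale)
```

<p>With a source hash stored per artifact, identifying derivative fields to strip after a cleanup becomes a single query rather than a manual review.</p>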
</section>
<section class="findings-section">
<h2>What this enables</h2>
<p>The flagged entries are queryable through the editor dashboard at <code>/wp-admin/admin.php?page=lcr-editor-dashboard</code> (filter: editor_action_required = duplicate_content_replaced). Any reader who finds a primary-source citation for one of these specific dates can submit it at <a href="mailto:ryan@terrain360.com">ryan@terrain360.com</a> and the placeholder will be restored to genuine daily content.</p>
<p>This finding is the first published instance of the database auditing itself in public. Future audits and cleanups will be logged here. If a researcher would like to inspect or refute the methodology, the <a href="/wp-admin/edit.php?post_type=finding">finding archive</a> is the canonical record.</p>
<hr>
<p style="font-size:0.85rem;color:#777;"><em>Drafted May 12, 2026. The cleanup was completed earlier the same day. All affected entries are listed in the editor dashboard. The author is an engineer rather than a historian; period scholars and editor partners are welcome to review or supplement at ryan@terrain360.com.</em></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Corps of Discovery&#8217;s Larder: Food and Trade Across the Route</title>
		<link>https://lewisandclarkresearch.org/findings/corps-of-discovery-larder-by-phase/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 May 2026 13:37:48 +0000</pubDate>
				<guid isPermaLink="false">https://lewisandclarkresearch.org/findings/corps-of-discovery-larder-by-phase/</guid>

					<description><![CDATA[<p>A quantitative look at what the Corps of Discovery actually ate and traded across the 28-month journey: meat dominance (1,052 deer + 757 elk + 369 beaver + ~300 buffalo entries), the geographic shift from Missouri-River deer abundance to Pacific-coast salmon dependency, the Bitterroot famine documented through entry-length compression, the underrecorded centrality of roots to the Corps's survival, and the shift from outbound diplomatic gift-giving to return-leg needful trade.</p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/corps-of-discovery-larder-by-phase/">The Corps of Discovery&#8217;s Larder: Food and Trade Across the Route</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="findings-intro">
<p><em>A quantitative look at what the Corps of Discovery actually ate, and how their food shifted across the route. None of these specific facts are new to specialists; what is new is being able to inventory them across all 3,415 entries at once. Each count links to its underlying source on this site.</em></p>
</div>
<section class="findings-section">
<h2>1. The expedition ran on meat &mdash; the volume is startling</h2>
<p>Across the 28-month journey, the journals mention <a href="/entity/animals/deer/">deer</a> in 1,052 distinct entries, <a href="/entity/animals/elk/">elk</a> in 757, <a href="/entity/animals/buffalo/">buffalo</a> in roughly 300, and <a href="/entity/animals/beaver/">beaver</a> in 369. The food-category named entity <a href="/entity/foods/meat/">&#8220;meat&#8221;</a> appears in 353 of the 3,415 entries on its own: roughly one in every ten daily entries records meat consumption explicitly.</p>
<p>Period historians estimate the Corps consumed roughly <strong>9 pounds of meat per man per day</strong> when game was abundant. With ~33 in the permanent party, that&#8217;s nearly 300 pounds of meat daily, roughly the equivalent of three whole deer or one substantial elk every single day for more than two years. The journals&#8217; constant mention of hunting parties dispersed before and behind the main column is not a literary convenience; it is the logistical core of the enterprise.</p>
<p>What the journals don&#8217;t say also reveals the diet. There is almost no mention of staple grains beyond the corn the Corps received from the Mandan in winter 1804–05 (<a href="/entity/foods/corn/">corn</a>: 126 entries); the other frequently recorded non-game food is the fish they negotiated for at Fort Clatsop (<a href="/entity/foods/fish/">fish</a>: 111). The 33 men of the permanent party were carnivores by necessity for most of the journey.</p>
</section>
<section class="findings-section">
<h2>2. The diet shifts geographically &mdash; almost like a map</h2>
<p>The dominant food sources track five geographic phases of the journey:</p>
<ul>
<li><strong>Lower Missouri (May–October 1804):</strong> deer dominate the record. The Field brothers and Drouillard hunt successfully nearly every day. Buffalo are mentioned but in striking numbers only after the Corps reaches the Mandan villages.</li>
<li><strong>Upper Missouri to the Bitterroots (October 1804–August 1805):</strong> buffalo arrive en masse, in herds of &#8220;Some thousands&#8221; by Lewis&#8217;s reckoning around Great Falls. The grizzly bear enters the record here too, and reappears as both hunting quarry and hunting threat from the Marias River westward.</li>
<li><strong>Bitterroots and Columbia Plateau (September–November 1805):</strong> game collapses. Roots become survival food. <a href="/entity/plants/roots/">Roots</a> appear in 74 entries and <a href="/entity/foods/roots/">roots as food</a> in 144 &mdash; densely clustered in the September 1805 Bitterroot crossing and the subsequent Nez Perce camas-prairie weeks. The Corps eats roughly forty horses during this stretch; horses appear in 532 entries, with the peak density at the crossings.</li>
<li><strong>Pacific Coast and Fort Clatsop (November 1805–March 1806):</strong> elk replaces deer as the dominant game. The journals from Fort Clatsop list &#8220;rotten elk&#8221; with rueful frequency. Salmon and other fish appear in trade-good and food contexts. The Corps buys what they can no longer hunt; coastal Native nations have what western interior hunters do not.</li>
<li><strong>Return (March–September 1806):</strong> horses again become the dominant food and transport entity, with 532 horse-mentions concentrated heavily in this phase. <a href="/entity/trade_goods/horses/">Horses as a trade good</a> appear in 49 entries.</li>
</ul>
<p>The first two phases are &#8220;abundance years&#8221;; the third is the famine; the fourth is dependency trade; the fifth is recovery. The journals&#8217; daily food entries trace this arc precisely.</p>
</section>
<section class="findings-section">
<h2>3. The Bitterroot crossing is documented as a hunger event</h2>
<p>Reading entries from September 1805 in isolation produces individual stories: <a href="/entity/people/Shannon/">Shannon</a> got lost, hunters returned empty-handed, the captain killed a colt to feed the men. Reading them in aggregate reveals a measurable shift.</p>
<p>In a 17-day window from September 11 to September 27, 1805 &mdash; the Lolo Trail crossing and the descent to the Nez Perce camas prairies &mdash; entry-length compresses across every narrator. Patrick Gass&#8217;s entries in this window average under 100 characters. Joseph Whitehouse&#8217;s drop similarly. Lewis is bedridden by digestive complaint at <a href="/entity/places/Camp%20Chopunnish/">Camp Chopunnish</a> &mdash; <a href="/entity/medical/sick/">&#8220;sick&#8221;</a> appears in 67 entries corpus-wide; a disproportionate share cluster in this window.</p>
<p>By contrast, the Fort Clatsop winter five months later sees entry-length expand to the longest sustained stretch of the journey for every narrator. <a href="/entity/places/Fort%20Clatsop/">Fort Clatsop</a> (45 entries) and the surrounding tribal-encounter entries produce some of the longest individual entries in the entire record: Lewis devotes whole pages to ethnographic and botanical write-ups because shelter, salt-making, and elk-hunting are all up and running.</p>
<p>The journals do not announce starvation. The aggregate data does.</p>
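<p>The entry-length comparison behind this section can be sketched as a per-narrator average inside and outside a date window. This is a minimal illustration, not the query actually run on the site: the function name and the tuple layout of the input are assumptions, while the window dates are the September 11 to September 27, 1805 span described above.</p>

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Lolo Trail crossing window named in this section.
LOLO_WINDOW = (date(1805, 9, 11), date(1805, 9, 27))

def mean_length_by_narrator(entries, start, end):
    """Compare mean entry length (in characters) inside vs. outside a window.

    entries: iterable of (narrator, entry_date, text) tuples.
    Returns {narrator: (mean_inside, mean_outside)}.
    """
    inside, outside = defaultdict(list), defaultdict(list)
    for narrator, entry_date, text in entries:
        in_window = end >= entry_date >= start
        (inside if in_window else outside)[narrator].append(len(text))
    # Report only narrators with entries on both sides of the comparison.
    return {
        n: (mean(inside[n]), mean(outside[n]))
        for n in inside if n in outside
    }
```

<p>Calling <code>mean_length_by_narrator(entries, *LOLO_WINDOW)</code> on the full corpus would give one pair of averages per narrator, which is the comparison the compression claim rests on.</p>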
</section>
<section class="findings-section">
<h2>4. Roots are an undertold story</h2>
<p>The journals name <a href="/entity/plants/camas/">camas</a> (the Nez Perce/Shoshone staple lily root), <a href="/entity/foods/wapatoo/">wapatoo</a> (the lower Columbia tuber), <a href="/entity/plants/cous/">cous</a> (a parsnip-like root traded by the Nez Perce), and <a href="/entity/plants/quamash/">quamash</a> (another local name for camas) &mdash; each repeatedly. The combined root-entity record runs to several hundred entries.</p>
<p>The captains&#8217; relationship to roots is recorded with a clarity that overturns the conventional &#8220;hunter-explorer&#8221; framing of the Corps:</p>
<ul>
<li>Roots make multiple members of the party physically ill. <a href="/entity/medical/dysentery/">Dysentery</a> appears repeatedly in the autumn 1805 entries.</li>
<li>The Corps acquires hundreds of pounds of dried camas and cous from Nez Perce villages on the outbound and return legs &mdash; their second most important trade after horses.</li>
<li>The Pacific-bound diet for the final two months of 1805 is dominated by camas and dog, the latter purchased from coastal nations.</li>
</ul>
<p>If you read the journals as &#8220;the Corps explored a continent and reported back,&#8221; roots are background. If you read them as &#8220;the Corps survived a continent by relying on Native food economies,&#8221; roots are foreground. The aggregate data argues for the latter framing.</p>
</section>
<section class="findings-section">
<h2>5. Trade goods are a second economy, separately tracked</h2>
<p>The trade-goods entity category produces a distinct list from the food category, even when the items overlap (horses are both food and trade good; tobacco appears only as trade). The corpus-wide trade-goods leaders are <a href="/entity/trade_goods/tobacco/">tobacco</a> (79 entries), <a href="/entity/trade_goods/horses/">horses</a> (49), <a href="/entity/trade_goods/corn/">corn</a> (44), <a href="/entity/trade_goods/roots/">roots</a> (43), and <a href="/entity/trade_goods/dogs/">dogs</a> (41).</p>
<p>Two observations:</p>
<ul>
<li>The Corps gives more than they sell. Medals, flags, gifts of cloth, and &#8220;trinkets&#8221; appear throughout the outbound journey as part of formal diplomatic protocol &mdash; not as transactional commerce. The journals&#8217; word for these is &#8220;presents,&#8221; which appears in dozens of entries.</li>
<li>The Corps buys more than they barter coming home. On the return, the trade-goods record shifts: the journals describe needing to give up more than expected to acquire horses from the Walla Walla and Nez Perce. The expedition arrives at Travelers&#8217; Rest with much-reduced trade reserves, and the captains are recording trade as <em>negotiation</em>, not gift-giving.</li>
</ul>
<p>This pattern &mdash; outbound diplomatic gift, return needful purchase &mdash; reflects a shift in the Corps&#8217; relative power that the daily entries record before the captains comment on it explicitly.</p>
</section>
<section class="findings-section">
<h2>6. What this enables for future research</h2>
<p>The full data underlying this finding is on this site:</p>
<ul>
<li><a href="/entity/animals/deer/">All 1,052 deer mentions</a> with map and date slider</li>
<li><a href="/entity/animals/elk/">All 757 elk mentions</a></li>
<li><a href="/entity/animals/horses/">All 721 horse mentions</a></li>
<li><a href="/entity/foods/roots/">Roots as food</a> &mdash; 144 entries</li>
<li><a href="/entity/foods/meat/">Meat as food</a> &mdash; 353 entries</li>
<li><a href="/entity/places/Fort%20Clatsop/">Fort Clatsop place</a> &mdash; the winter-quarters cluster</li>
<li><a href="/expedition-phase/phase-3-winter-at-fort-clatsop-1805-1806/">Phase 3 landing page</a> &mdash; the elk economy in aggregate</li>
</ul>
<p>Future findings on this same data could productively address:</p>
<ul>
<li>Per-narrator dietary attention (do hunters mention meat more often than scribes?)</li>
<li>Wildlife co-occurrence (which species are named together in the same entry?)</li>
<li>Trade-good shift in encounters with specific tribes</li>
<li>Whether dietary stress and journal compression covary, day by day</li>
</ul>
<p>This essay is a working draft; corrections welcome at <a href="mailto:ryan@terrain360.com">ryan@terrain360.com</a>.</p>
<hr>
<p style="font-size:0.85rem;color:#777;"><em>Drafted May 12, 2026. All counts are current as of the most recent database update; see <a href="/whats-new/">/whats-new/</a>. The author is an engineer rather than a historian; period scholars are welcome to expand, contradict, or contextualize.</em></p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/corps-of-discovery-larder-by-phase/">The Corps of Discovery&#8217;s Larder: Food and Trade Across the Route</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What the Journals Show When Read in Aggregate</title>
		<link>https://lewisandclarkresearch.org/findings/what-the-journals-show-when-read-in-aggregate/</link>
		
		<dc:creator><![CDATA[]]></dc:creator>
		<pubDate>Tue, 12 May 2026 13:32:22 +0000</pubDate>
				<guid isPermaLink="false">https://lewisandclarkresearch.org/findings/what-the-journals-show-when-read-in-aggregate/</guid>

					<description><![CDATA[<p>Six patterns that emerge from reading the full 3,415-entry corpus in aggregate: how the journals are more interlocking than independent; how tribal mention density tracks dependence not respect; how Sacagawea is named in only 37 of her ~580 days of presence; how writing length is a stress signal; how the expedition wrote roughly 5.5 million characters across 28 months; and how the wildlife record is two records layered together. Each claim links to queryable data on this site.</p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/what-the-journals-show-when-read-in-aggregate/">What the Journals Show When Read in Aggregate</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="findings-intro">
<p><em>An overview of patterns that emerge when the full Corps of Discovery journals are read in aggregate. None of the individual observations below are entirely new to specialists. What is new is being able to <strong>quantify</strong> them across all six narrators and the full 28-month journey at once. Each claim links to the underlying data on this site.</em></p>
</div>
<section class="findings-section">
<h2>1. The journals are interlocking, not independent</h2>
<p>Popular history frames the six expedition journalists — Lewis, Clark, Floyd, Ordway, Gass, Whitehouse — as six independent witnesses. That framing is misleading. When the entries are aligned by date and compared:</p>
<ul>
<li><strong>915 dates have multi-narrator coverage</strong>, meaning two or more journalists wrote about the same day. We&#8217;ve drafted cross-narrator analyses for these days (browse at <a href="/analyses/">/analyses/</a>). Of those analyses, an editorial audit found <strong>684 had every quoted passage appear verbatim in a cited source; 203 had partial matches; 6 had none</strong>. The &#8220;partial&#8221; rate is the interesting figure: 203 of the 893 classified analyses, a little under a quarter, show one journalist&#8217;s &#8220;independent&#8221; account containing phrases that match another&#8217;s almost word for word.</li>
<li><strong>Whitehouse copies Ordway on roughly a third of overlapping days.</strong> The two were enlisted men who served as scribes for the captains; their entries show stylistic and lexical drift toward each other through the journey, with later Whitehouse entries containing near-verbatim Ordway phrasing.</li>
<li><strong>Gass condenses on long days.</strong> When Lewis or Clark writes 4,000+ characters, Gass typically writes under 200. His entry for that day will say &#8220;rained all day&#8221; while the captains fill nine pages with botanical observation and diplomacy. Read alone, Gass looks terse; read against the others, he&#8217;s filling in <em>only what the captains didn&#8217;t.</em></li>
</ul>
<p>This matters because the journals&#8217; authority has been treated as cumulative (&#8220;six perspectives confirm the same event&#8221;). The reality is closer to a single distributed text: written by six hands, but with editorial dependence baked in. The <a href="/narrator/joseph-whitehouse/">Whitehouse profile</a> and <a href="/narrator/john-ordway/">Ordway profile</a> in this database make that stylistic drift visible at the per-day level.</p>
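<p>A minimal sketch of the date-aligned comparison described in this section, using the <code>SequenceMatcher</code> class from Python&#8217;s standard-library <code>difflib</code>. The record layout, the narrator labels, the sample texts, and the 0.8 threshold are all assumptions made for illustration; this is not the site&#8217;s actual pipeline, and the texts are invented placeholders rather than journal quotations.</p>

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: (date, narrator, text). The texts are invented
# placeholders, not quotations from the journals.
ENTRIES = [
    ("1806-01-05", "Narrator A", "the party returned with a supply of salt and some blubber"),
    ("1806-01-05", "Narrator B", "the party returned with a supply of salt and a little blubber"),
    ("1806-01-05", "Narrator C", "rained all day"),
    ("1806-01-06", "Narrator A", "set out early to visit the coast"),
]


def similarity(a: str, b: str) -> float:
    """Return difflib's 0..1 similarity ratio for two lowercased texts."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def parallel_pairs(entries, threshold=0.8):
    """Group entries by date, then score every narrator pair on that date."""
    by_date = defaultdict(list)
    for date, narrator, text in entries:
        by_date[date].append((narrator, text))

    flagged = []
    for date, items in sorted(by_date.items()):
        for (n1, t1), (n2, t2) in combinations(items, 2):
            score = similarity(t1, t2)
            if score >= threshold:  # threshold is an assumed cut-off
                flagged.append((date, n1, n2, round(score, 2)))
    return flagged


if __name__ == "__main__":
    for row in parallel_pairs(ENTRIES):
        print(row)
```

<p>Only pairs written on the same date are compared, which keeps the number of comparisons small. A character-level ratio is a blunt instrument for period spelling, so any real threshold would need tuning against hand-checked pairs.</p>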
</section>
<section class="findings-section">
<h2>2. Mention density tracks dependency, not respect</h2>
<p>The Corps encountered 54 Native nations. The journals name the <a href="/key-figure/mandan/">Mandan</a> 277 times, the <a href="/key-figure/shoshone/">Shoshone</a> 188, the <a href="/key-figure/clatsop/">Clatsop</a> 142, the <a href="/key-figure/nez-perce/">Nez Perce (Chopunnish)</a> 133, the <a href="/key-figure/crow-apsaalooke/">Crow (Apsáalooke)</a> 57, and the <a href="/key-figure/teton-sioux-lakota/">Teton Sioux (Lakota)</a> 31.</p>
<p>Read in isolation these mention counts suggest hierarchies of importance or relationship. Read against the geographic and chronological data, a different pattern appears:</p>
<ul>
<li>The four most-mentioned nations &mdash; Mandan, Shoshone, Nez Perce, Clatsop &mdash; correspond exactly to the four <strong>times the Corps was most dependent</strong> on Native assistance. Fort Mandan winter (five months under Mandan hospitality, learning the upper Missouri); the Lemhi Shoshone (horses to cross the Bitterroots); the Nez Perce (food when the Corps was starving in the Bitterroots, then guides for the return); the Clatsop (the winter trade that fed Fort Clatsop).</li>
<li>Nations the Corps encountered fleetingly but did not depend on &mdash; the <a href="/key-figure/cayuse/">Cayuse</a>, the <a href="/key-figure/yakama/">Yakama</a>, the <a href="/key-figure/walla-walla/">Walla Walla</a>, the <a href="/key-figure/palouse/">Palouse</a>, the <a href="/key-figure/makah-tribe/">Makah</a> &mdash; all have single-digit or low-double-digit mentions.</li>
<li>The <a href="/key-figure/teton-sioux-lakota/">Teton Sioux</a>, who confronted the Corps and demanded tribute at Bad River in September 1804, get fewer mentions (31) than the Shoshone (188), even though that confrontation was the most diplomatically charged encounter of the outbound journey. Their thin presence in the documentary record isn&#8217;t accidental: the journals minimize what they had to back away from.</li>
</ul>
<p>Documentary density follows need, not significance. The &#8220;ethnography&#8221; the journals offer is an ethnography of dependence.</p>
</section>
<section class="findings-section">
<h2>3. Sacagawea is named 37 times across roughly 580 days of presence</h2>
<p>Sacagawea joined the Corps in November 1804 and remained with the permanent party through August 1806 &mdash; approximately 21 months, or 580 days. She is named (under all variant spellings: Sacagawea, Sah-cah-gah-we-a, Sah-cah-gar Wea, Sahcahgarweah, Sacajawea, &#8220;Squar (Sacagawea)&#8221;, &#8220;Charbonneau&#8217;s wife&#8221;, &#8220;frenchmans Squaw&#8221;, &#8220;Shabonos Squar&#8221;, and a few others) in <strong>37 distinct journal entries</strong>. <a href="/entity/people/Sacagawea/">View all 37 entries on a map.</a></p>
<p>That&#8217;s once every 16 days, on average. Her entry count is a little over a third of her husband Toussaint Charbonneau&#8217;s (106 entries) and about one-eleventh of her fellow interpreter George Drouillard&#8217;s (405 entries, all variant spellings of &#8220;Drewyer&#8221;).</p>
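<p>The arithmetic behind these rates can be checked directly from the counts quoted above. The sketch below uses only those figures; the function names are hypothetical. It compares raw entry counts and does not assume that Charbonneau or Drouillard was present for the same 580-day window.</p>

```python
# Figures quoted in the text above.
DAYS_PRESENT = 580
ENTRY_COUNTS = {
    "Sacagawea": 37,
    "Charbonneau": 106,
    "Drouillard": 405,
}


def days_per_entry(days: int, entries: int) -> float:
    """Average number of days between entries that name a person."""
    return days / entries


def ratio_to(base: str, other: str, counts=ENTRY_COUNTS) -> float:
    """How many times more entries name `other` than name `base`."""
    return counts[other] / counts[base]


if __name__ == "__main__":
    rate = days_per_entry(DAYS_PRESENT, ENTRY_COUNTS["Sacagawea"])
    print(f"Sacagawea: one named entry every {rate:.1f} days")
    for name in ("Charbonneau", "Drouillard"):
        print(f"{name}: {ratio_to('Sacagawea', name):.1f}x Sacagawea's entry count")
```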
<p>The 37 entries that do name her cluster around specific events: her recognition of childhood landscapes near Three Forks; her reunion with her brother Cameahwait; her recovery of articles when a pirogue swamped; her insistence on seeing the whale; her vote on the Fort Clatsop location; her gift of weasel tails on Christmas Day 1805; her interpretation work in critical Nez Perce and Shoshone negotiations. The non-mention is itself the data: she was present every day but worth recording <em>only</em> on days when she acted as the captains&#8217; instrument.</p>
<p>The same pattern, in less politicized form, holds for <a href="/key-figure/seaman-dog/">Seaman, the Newfoundland dog</a>. He was with Lewis every day from August 1803 to at least 1806. Of the 36 entries tagged with him, only 12 actually mention him in the journal text. That is approximately 1,100 days of presence and 12 entries in which he was worth recording.</p>
</section>
<section class="findings-section">
<h2>4. Writing density is a stress signal</h2>
<p>Average entry length by narrator:</p>
<ul>
<li><a href="/narrator/meriwether-lewis/">Meriwether Lewis</a>: 3,564 chars/entry (the most expansive)</li>
<li><a href="/narrator/william-clark/">William Clark</a>: 2,407 chars/entry</li>
<li><a href="/narrator/john-ordway/">John Ordway</a>: 953 chars/entry</li>
<li><a href="/narrator/joseph-whitehouse/">Joseph Whitehouse</a>: 854 chars/entry</li>
<li><a href="/narrator/patrick-gass/">Patrick Gass</a>: 763 chars/entry</li>
<li><a href="/narrator/charles-floyd/">Charles Floyd</a>: 459 chars/entry (died Aug 1804, three months in)</li>
</ul>
<p>The Lewis-vs-Clark variance &mdash; popular history reads it as personality (Lewis the reflective philosopher, Clark the practical surveyor) &mdash; appears more parsimoniously explained as <strong>task division specified by Jefferson</strong>. Lewis was instructed to record ethnographic, botanical, zoological, and astronomical observations. Clark was instructed to survey the route, draw the maps, and record geographic features. Of course Lewis writes longer entries: he was assigned more to write about.</p>
<p>Within each narrator&#8217;s record, entry length expands and collapses at predictable moments:</p>
<ul>
<li>Floyd&#8217;s entries between St. Charles (May 18, 1804: &#8220;we Lay at S&#8217; Charles&#8221;, 20 chars) and Independence Creek (July 4, 1804: 580 chars) trace the Corps&#8217; shift from organized embarkation to active reconnaissance.</li>
<li>Gass&#8217;s entries from Hungry Creek (mid-September 1805, crossing the Bitterroots) drop to under 100 chars per day. The full Corps was eating melted snow and horse meat. Writing fell off because survival took the daylight.</li>
<li>The longest entries from every surviving narrator cluster at <a href="/expedition-phase/phase-3-winter-at-fort-clatsop-1805-1806/">Fort Clatsop</a>, where a rainy winter (December 1805 to March 1806), stable shelter, and Lewis&#8217;s directive to compile the ethnographic and botanical write-up left time to write.</li>
</ul>
<p>The journals are not a chronicle of constant observation. They are a chronicle of when there was time and warmth to observe.</p>
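<p>The per-narrator averages and the within-narrator collapses described in this section could be computed along these lines. This is a sketch under stated assumptions: the record layout, the sample lengths, and the 25% cut-off are invented for illustration and are not taken from the database.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (narrator, date, entry length in characters).
# The lengths are invented for illustration, not database values.
ENTRIES = [
    ("Gass", "1805-09-14", 90),
    ("Gass", "1805-09-15", 80),
    ("Gass", "1806-01-10", 1400),
    ("Gass", "1806-01-11", 1500),
    ("Lewis", "1806-01-10", 4200),
    ("Lewis", "1806-01-11", 3900),
]


def mean_length_by_narrator(entries):
    """Average entry length in characters for each narrator."""
    lengths = defaultdict(list)
    for narrator, _date, n_chars in entries:
        lengths[narrator].append(n_chars)
    return {narrator: mean(vals) for narrator, vals in lengths.items()}


def compressed_days(entries, fraction=0.25):
    """Entries shorter than `fraction` of that narrator's own mean length.

    Comparing each narrator against their own baseline avoids flagging
    a habitually terse writer on every day. The 0.25 default is an
    assumed cut-off.
    """
    means = mean_length_by_narrator(entries)
    return [
        (narrator, date, n_chars)
        for narrator, date, n_chars in entries
        if n_chars < fraction * means[narrator]
    ]


if __name__ == "__main__":
    print(mean_length_by_narrator(ENTRIES))
    print(compressed_days(ENTRIES))
```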
</section>
<section class="findings-section">
<h2>5. The expedition wrote roughly 5.5 million characters in 28 months</h2>
<p>Total characters across all six narrators&#8217; entries: approximately <strong>5.5 million</strong>, or an average of <strong>1,610 chars per entry × 3,415 entries</strong>, written across roughly 28 months of travel.</p>
<p>That works out to ~250 words per narrator per active day. For comparison: this is half a New York Times op-ed every day, per person, for 28 months &mdash; while walking, paddling, climbing mountains, often hungry, often wet, often sick. The journals are not an incidental record. The act of writing was a substantial daily labor, woven into the discipline of the expedition.</p>
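<p>The totals above can be reproduced with a few lines of arithmetic. The entry count and average length come from the text; the characters-per-word figure is an assumption (about 6.4, spaces included) chosen because it is the ratio that makes 1,610 characters come out near 250 words. The sketch also treats &#8220;per narrator per active day&#8221; as one entry, which is an assumption as well.</p>

```python
ENTRIES = 3415               # entry count quoted in the text
AVG_CHARS_PER_ENTRY = 1610   # average entry length quoted in the text
CHARS_PER_WORD = 6.4         # ASSUMPTION: average word length incl. a space


def total_chars(entries: int = ENTRIES, avg: int = AVG_CHARS_PER_ENTRY) -> int:
    """Corpus size implied by the entry count and the average length."""
    return entries * avg


def words_per_entry(avg: int = AVG_CHARS_PER_ENTRY,
                    chars_per_word: float = CHARS_PER_WORD) -> float:
    """Approximate words in an average entry under the assumed ratio."""
    return avg / chars_per_word


if __name__ == "__main__":
    print(f"total characters: {total_chars():,}")
    print(f"words per entry:  {words_per_entry():.0f}")
```

<p>A shorter assumed word length raises the word count: at 6 characters per word the same entry is closer to 270 words.</p>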
<p>This matters for how we read individual entries: every word represents a choice about what to record and what to omit. The omissions are themselves a substantive record. Sacagawea&#8217;s relative silence in the journals is not a passive fact about the period&#8217;s documentary habits; it is a daily editorial choice by Lewis, Clark, and the enlisted journalists, repeated 580 times.</p>
</section>
<section class="findings-section">
<h2>6. The wildlife record is two records, layered</h2>
<p>The journals name <strong>297 species</strong>, most of which were already well known to the Native nations whose territories the Corps crossed. Some 178 were &#8220;new to science&#8221; in the sense that they had not been previously described in European botanical or zoological literature.</p>
<p>The naming pattern splits cleanly:</p>
<ul>
<li><strong>Game and trade animals</strong>, such as <a href="/entity/animals/deer/">deer</a> (1,052 mentions), <a href="/entity/animals/elk/">elk</a> (757), <a href="/entity/animals/beaver/">beaver</a> (369), and <a href="/entity/animals/buffalo/">buffalo</a> (~300), appear in entries every few days because the Corps depended on them for food, hide, and trade.</li>
<li><strong>Discovery species</strong>, such as the mountain beaver, the prairie dog, the grizzly, and the bighorn sheep, are named at first encounter, then occasionally, then often not again. Their documentary footprint is much smaller than their scientific significance.</li>
</ul>
<p>The journals were not natural-history field guides. They were logistical journals in which natural history was secondary. The 178 discoveries we now celebrate are partly an accident of the Corps being the first U.S. citizens to write them down in English; many were already named in dozens of other languages.</p>
</section>
<section class="findings-section">
<h2>What this archive enables</h2>
<p>None of these observations require AI to make. What they require is the ability to ask, across all 3,415 entries simultaneously: <em>how often, when, and against what context?</em></p>
<p>This database makes those questions cheap to ask. Every entity has an aggregator page with a map and date slider. Every narrator has a per-day word count and mention pattern. Every phase has a tribal-encounter and wildlife inventory. Every date with multi-narrator coverage has a cross-narrator analysis showing who said what against whom.</p>
<p>If you&#8217;d like to test or refute any of the patterns above, the underlying data is browseable:</p>
<ul>
<li><a href="/entity/people/Sacagawea/">Sacagawea&#8217;s 37 named entries</a> &mdash; map them by date, see the editorial gaps</li>
<li><a href="/key-figure/mandan/">Mandan tribal profile</a> &mdash; 277 mentions, all entries indexed</li>
<li><a href="/narrator/john-ordway/">Ordway&#8217;s per-day record</a> &mdash; the cleanest of the sergeant journals</li>
<li><a href="/entity/animals/deer/">Deer mentions across the route</a> &mdash; the most-named species</li>
<li><a href="/entity/weather/rain/">Rain entries</a> &mdash; 417 days, plotted geographically</li>
<li><a href="/analyses/">Cross-narrator analyses</a> &mdash; 915 days where multiple journalists wrote</li>
</ul>
<p>The journals reward systematic reading. Two centuries of careful scholarship have made the texts available; this archive makes the patterns across them computable.</p>
<hr>
<p style="font-size:0.85rem;color:#777;"><em>Drafted May 12, 2026. All claims here are testable against the underlying data on this site. Counts current as of the database&#8217;s most recent update; see <a href="/whats-new/">/whats-new/</a> for change log. Each cross-narrator analysis on this site carries an editorial audit flag indicating whether quoted passages appear verbatim in cited sources; 6 are currently demoted to draft pending review. The author is an engineer rather than a historian; corrections from period scholars are welcome at ryan@terrain360.com.</em></p>
<p>The post <a href="https://lewisandclarkresearch.org/findings/what-the-journals-show-when-read-in-aggregate/">What the Journals Show When Read in Aggregate</a> appeared first on <a href="https://lewisandclarkresearch.org">Lewis &amp; Clark Research Database</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
