Text Analysis of The Journals of Lewis and Clark

Sources

The Project Gutenberg eBook of The Journals of Lewis and Clark, 1804-1806. It may be copied or shared under the terms of the Project Gutenberg License included with this ebook or online here. Downloadable .html and .txt files are available at the bottom of this page.

Released on July 1, 2005 [eBook #8419]; most recently updated on January 31, 2013.


Introduction

William Clark authored 1,177 entries, while Meriwether Lewis authored just 427 entries. They collaborated on 4 entries, and at one point, some ‘Whitehouse’ co-authored 1 entry with Clark (September 9, 1804).

Within these journals are countless stories (well, textual analysis tells us there are precisely 1,609 entries between the two men) which occurred from 1804 to 1806. 
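The counts above came from Voyant, but a tally like this can also be sketched in a few lines of Python. This is only an illustration, not the method used for this project: it assumes each entry in the .txt file begins with a bracketed header such as "[Clark, May 14, 1804]", and the sample text is invented placeholder content rather than quoted journal text. The last lines simply check that the four reported counts add up to the reported total.

```python
import re
from collections import Counter

# Assumption: each entry starts with a header line like "[Author, Month Day, Year]".
HEADER = re.compile(r"^\[([^,\]]+),\s*[^\]]*\]\s*$", re.MULTILINE)

def count_entries_by_author(text):
    """Count entry headers per author label (co-authored labels stay as one label)."""
    return Counter(match.group(1).strip() for match in HEADER.finditer(text))

# Invented placeholder text, only to show the expected shape of the input.
sample = """[Clark, May 14, 1804]
placeholder entry text

[Lewis, May 15, 1804]
placeholder entry text

[Clark, May 15, 1804]
placeholder entry text
"""

counts = count_entries_by_author(sample)
print(counts)

# Sanity check on the figures reported above: the four counts sum to the total.
reported = {"Clark": 1177, "Lewis": 427, "Lewis and Clark": 4, "Clark and Whitehouse": 1}
reported_total = sum(reported.values())
print(reported_total)
```

If the real headers are formatted differently, only the HEADER pattern would need to change.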


Meriwether Lewis and William Clark meticulously recorded their expedition between 1804 and 1806 in a series of journal entries. Here is what a textual analysis of their work has to offer:

There was slightly more killing (1,433x) than there was partying (1,170x), but still much more partying than I was really expecting.

Spelling differences made textual analysis of these journals quite difficult. These are just some of the most frequently misspelled words.

The misspellings, although confusing at times, are an authentic part of these works. Some may reflect the conventions of the time, while others reflect the authors' personalities. The transcriber and editor of this Project Gutenberg eBook also remarks that “most of the misspellings are almost 200 years old,” and that “misspellings, inventive punctuation and lack of punctuation along with variable capitalization, and not entirely clear abbreviations have been left as is,” in an effort to preserve the true nature of the documents.


I searched for ‘negative’ words to see what prompted different feelings over the course of the journey. It seems that the words most heavily associated with being afraid had to do with things they didn’t know about, like indigenous people and the ‘wild’, whereas the things they found disagreeable were things they did know about, notably the weather.
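For anyone who wants to try a similar search outside of Voyant, here is a rough Python sketch of the idea: count the words that appear within a few words of a target word. It is a simplified stand-in, not the tool I used. The function name, the window size, and the sample sentence are all made up for illustration, and because it matches exact spellings only, it would miss the spelling variants discussed above.

```python
import re
from collections import Counter

def collocates(text, target, window=5):
    """Count words appearing within `window` tokens on either side of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        # Count every neighbor in the window except the target occurrence itself.
        counts.update(t for j, t in enumerate(tokens[start:end], start=start) if j != i)
    return counts

# Invented sample sentence, not a quotation from the journals.
sample = "the weather was disagreeable and the wind was disagreeable"
nearby = collocates(sample, "disagreeable", window=2)
print(nearby.most_common())
```

A larger window catches more context but also more noise, so the window size is a judgment call.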


In this Cirrus word cloud of the most frequently used words, we can see that there is a difference between the words they used then and the words that would be considered appropriate now, including derogatory terms for indigenous people.

We can also see that the majority of their journals described the land, weather, and features around them, using words like river, place, creek, and wind, and that they discussed the people they met along the way, using words like village and party.
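A word cloud like this is built from simple word counts. As a hedged sketch of that step (again, not the Voyant implementation), the Python below counts the most frequent words after dropping a short list of common words. The stopword list here is a tiny made-up example, and the sample sentence is invented rather than quoted from the journals.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a much longer one.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "we", "on", "at", "is", "was"}

def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text, ignoring case and stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords).most_common(n)

# Invented sample sentence, only to show the function in use.
sample = "The river was high and the wind was hard on the river"
result = top_words(sample, n=2)
print(result)
```

Which words count as stopwords changes the picture quite a bit, so that list is part of the interpretation and not a neutral setting.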


Process

I decided to do textual analysis for this data set because I was curious about Lewis and Clark’s language choices and whether they reflect the time in history, some biases, or the purpose of the expedition. There does seem to be some bias against indigenous people in the text, based on the words that were associated with those terms.

I also think the text reveals that their mission was to describe the land and what was going on around them. The government did not know much about the undeveloped west at this point in time, and the journal reflects that these were the most important details.

Presentation

I chose to embed many of the Voyant tools I used to draw my conclusions because, as we have discussed in class, one of the best ways to communicate is by showing. I included captions and discussions below my embedded visuals to make my conclusions clear, but the visuals and the text are meant to work together, not alone.

Significance

I think that a textual analysis inherently reveals patterns and insights about a given document or data set. Although I didn’t read the entire journal, I was able to reveal that there were patterns in how certain words were used and how the authors thought about weather or people they encountered along the way.

This pertains to Digital Arts and Humanities, as opposed to data science in particular, because the document isn’t a quantitative data set. It is a compiled set of journal entries that can be analyzed together to extract patterns and observations. This project is subjective in that the conclusions I draw are a result of the techniques and tools I used rather than some clear-cut conclusion found within a data set.


Downloadable Sources