How to Determine the Commonality of English Words: Tools and Techniques

Introduction

For English learners and educators, understanding the commonality and frequency of words is crucial. There are several tools and methods that can help you determine how common a particular word is in English. In this article, we will explore these tools and techniques, focusing on the Google Ngram Viewer and the Longman Dictionary of Contemporary English. Additionally, we will discuss a more manual approach using a text file and scripting.

The Longman Dictionary of Contemporary English (LDOCE)

The Longman Dictionary of Contemporary English (LDOCE) is a highly regarded resource for English learners and enthusiasts. It labels the most frequent words to show how common they are in spoken (S) and written (W) English. S1 or W1 means the word is among the 1,000 most common words. S2 or W2 means it is among the next 1,000, and S3 or W3 among the 1,000 after that. For instance, "common" is marked S1 and W1, indicating that it is very frequent in both spoken and written English. By contrast, "frequent" is marked only W3. This suggests it is used mainly in written English and is much less common in speech.
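To make the banding concrete, here is a minimal Python sketch that decodes these labels. It assumes the three-band scheme described above. The labels are typed in by hand rather than fetched from the dictionary. The helper names describe_label and describe_entry are hypothetical and are not part of any Longman product or API.

```python
# Hypothetical helpers: translate Longman frequency labels such as "S1" or "W3"
# into readable descriptions. Assumes three bands of 1,000 words each:
# band 1 = ranks 1-1000, band 2 = 1001-2000, band 3 = 2001-3000.

MODES = {"S": "spoken", "W": "written"}
BANDS = {"1": (1, 1000), "2": (1001, 2000), "3": (2001, 3000)}


def describe_label(label):
    """Return e.g. 'spoken: rank 1-1000' for the label 'S1'."""
    label = label.strip().upper()
    if len(label) != 2 or label[0] not in MODES or label[1] not in BANDS:
        raise ValueError(f"unrecognised frequency label: {label!r}")
    low, high = BANDS[label[1]]
    return f"{MODES[label[0]]}: rank {low}-{high}"


def describe_entry(word, labels):
    """Summarise a word's labels; a missing S label means it is outside the spoken top 3000."""
    parts = [describe_label(lab) for lab in labels]
    if not any(lab.strip().upper().startswith("S") for lab in labels):
        parts.append("spoken: outside the top 3000")
    return f"{word}: " + "; ".join(parts)


print(describe_entry("common", ["S1", "W1"]))
# prints "common: spoken: rank 1-1000; written: rank 1-1000"
print(describe_entry("frequent", ["W3"]))
# prints "frequent: written: rank 2001-3000; spoken: outside the top 3000"
```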

The Google Ngram Viewer

The Google Ngram Viewer is a powerful tool that charts how often words and phrases appear in a large corpus of digitized books over time. To demonstrate, compare the usage of "computer", "phone", and "television". Entering them as a comma-separated list (computer,phone,television) plots all three on the same chart, showing how their popularity has changed over the years. This tool is particularly useful for researchers and linguists who want to track the evolution of language and specific terminology.
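If you want to save or share a comparison, you can build the chart link programmatically. Below is a minimal Python sketch. It assumes the query parameters content, year_start, year_end, and smoothing that appear in the Viewer's own chart URLs. These are not a documented, stable API and may change. The default years and smoothing value are illustrative choices only, and the function name ngram_url is hypothetical.

```python
from urllib.parse import urlencode

# Address of the Google Ngram Viewer chart page.
NGRAM_BASE = "https://books.google.com/ngrams/graph"


def ngram_url(terms, year_start=1800, year_end=2019, smoothing=3):
    """Build a hypothetical Ngram Viewer link comparing several terms on one chart.

    Assumption: the parameter names mirror links generated by the Viewer;
    they are not an official API.
    """
    params = {
        "content": ",".join(t.strip() for t in terms),  # comma-separated queries
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    # urlencode percent-encodes the commas as %2C.
    return f"{NGRAM_BASE}?{urlencode(params)}"


print(ngram_url(["computer", "phone", "television"]))
# prints "https://books.google.com/ngrams/graph?content=computer%2Cphone%2Ctelevision&year_start=1800&year_end=2019&smoothing=3"
```

Open the printed link in a browser to see the same chart you would get by typing the terms into the Viewer by hand.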

A Manual Approach Using Text Files and Scripting

The Google Ngram Viewer is convenient, but it only covers its own book corpus. To measure word frequency in a specific text file of your own, you can follow a more manual process. It is simple, if somewhat tedious:

Step 1: Save the book you are interested in as a plain text file (e.g., book.txt) on a machine running Linux, or Windows with Cygwin.
Step 2: Open the file in vi and replace every space with a line break, so that each word sits on its own line. In Vim, the command is :%s/ /\r/g. The % applies the substitution to every line of the file; a range of 1 would change only the first line. Alternatively, the shell command tr ' ' '\n' < book.txt > words.txt does the same job.
Step 3: Save the result as words.txt. Then sort it (for example with sort words.txt > sorted.txt) and save the sorted output as sorted.txt.
Step 4: Use an awk script (not provided in this example) to collapse each run of identical words into a single line. Each line should contain the number of occurrences followed by the word itself. Save the output to awk_output.txt. The standard uniq -c command produces the same format: uniq -c sorted.txt > awk_output.txt.
Step 5: Sort awk_output.txt numerically in reverse order (sort -rn awk_output.txt > result.txt) to get result.txt. The first line of this file is the most common word, the second line is the second most common, and so on.
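The same pipeline can be sketched in Python, which is handy if vi, sort, and awk are not available. This minimal sketch mirrors steps 2 to 5 one for one and uses itertools.groupby in place of the awk script. The function name word_frequencies is hypothetical. Like the manual method, it treats "The" and "the" as different words and leaves punctuation attached to words.

```python
import sys
from itertools import groupby


def word_frequencies(path):
    """Mirror the manual steps: split, sort, count runs, sort by count."""
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Step 2: one word per entry. split() breaks on any whitespace,
    # so tabs, repeated spaces and existing line breaks are handled too.
    words = text.split()

    # Step 3: sort so identical words end up next to each other.
    words.sort()

    # Step 4: collapse each run of identical words into (count, word),
    # the job done by the awk script or `uniq -c`.
    counted = [(len(list(group)), word) for word, group in groupby(words)]

    # Step 5: highest count first; ties are broken alphabetically.
    counted.sort(key=lambda pair: (-pair[0], pair[1]))
    return counted


if __name__ == "__main__" and len(sys.argv) > 1:
    for count, word in word_frequencies(sys.argv[1]):
        print(count, word)
```

Run it as python word_freq.py book.txt (the script name is arbitrary) to print the counts with the most common word first.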

Note: Depending on the size of your text file, this process may take several minutes. For a quicker overview of how common a word is in general, consider using the Google Ngram Viewer.
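If speed is the concern, a different technique avoids sorting every word in the text: count words in a hash table in a single pass. The sketch below uses Python's collections.Counter and reads the file one line at a time. By default it folds case so that "The" and "the" are counted together. The function name top_words and its parameters are hypothetical. Unlike the sort-based method, it ranks only the distinct words, and most_common(n) returns just the top n.

```python
import sys
from collections import Counter


def top_words(path, n=10, ignore_case=True):
    """Count words in one pass with a Counter, reading the file line by line."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:  # only one line is held in memory at a time
            if ignore_case:
                line = line.lower()
            counts.update(line.split())  # add 1 for each word on the line
    return counts.most_common(n)  # list of (word, count), highest count first


if __name__ == "__main__" and len(sys.argv) > 1:
    for word, count in top_words(sys.argv[1]):
        print(count, word)
```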

Conclusion

The Google Ngram Viewer is the most straightforward and efficient tool for determining how common an English word is. It provides historical data in a visual form that is easy to interpret. For a more in-depth analysis of a particular text, or when online access is limited, the manual method using text files and scripting is a useful alternative. Whether you are an English learner, a researcher, or a linguist, these tools and techniques will help you understand the usage and frequency of English words.