The king rules the kingdom. The queen does not rule the kingdom.

In the morning many people like to drink coffee or tea with breakfast. A warm cup of coffee on the table is a familiar part of the morning, and a warm cup of tea is just as common. Some people prefer strong coffee before class, while others prefer mild tea during breakfast. In a small cafe near the station, students often order coffee, tea, and other hot drinks. The cafe serves each drink in a cup, and the smell of coffee in the morning fills the room. Tea is also a popular morning drink, especially when breakfast is quiet and the weather is cold. Because coffee and tea appear in similar situations, they are often described with similar words such as cup, drink, breakfast, cafe, and morning.

A student may drink coffee before studying mathematics, and another student may drink tea before studying language. In the library, a book lies on the table while a student reads a story for class. The teacher explains the lesson, and the student studies in the library with a book nearby. Sometimes the teacher reads a book, and sometimes the student borrows a book from the library for study. A story in a book may be long or short, but the book, the student, the teacher, and the library often appear together in the same academic context. In this way, words such as student, teacher, study, book, and library become related through repeated use.

In another setting, the king lives in the palace and rules the kingdom. The queen also lives in the palace and governs the kingdom beside the king. A prince grows up in the royal family and may later become king, while a princess may later become queen. The palace, the kingdom, the prince, the princess, the king, and the queen all belong to the same royal context. A young prince may admire the king, and a young princess may admire the queen. When people describe royal life, they often speak about the palace, the royal family, and the kingdom together. Because of these repeated patterns, the words king and queen, as well as prince and princess, occur in strongly related contexts.

Animals form another group of related words. The dog is a loyal animal, and the cat is a small animal often kept as a pet. A dog may guard the house, chase a ball, or run through the garden. A cat may sleep on a chair, watch a bird, or hide under the table. The dog and the cat both live in the house and both belong to the family of common pets. Although a dog and a cat behave differently, the words dog, cat, pet, animal, family, and house often appear near one another. Because of that, dog and cat share part of the same contextual environment even when their actions are not identical.

Vehicles also appear in their own cluster of meaning. A car moves on the road, a bus moves on the road, and a train moves on the track. People drive a car to the city, ride a bus to the city, or take a train to the station. The fast car, the large bus, and the long train are different kinds of vehicles, but they often appear together in descriptions of travel and movement. A bus arrives at the station, a train arrives at the station, and a car stops near the station. The road, the city, the vehicle, the station, and the journey connect these words. When a corpus contains many sentences like these, the words car, bus, train, road, station, and city begin to form a coherent semantic group.

Some relationships are based not only on words that appear together, but also on words that can replace one another in the same position. Good news and bad news are a simple example. Good weather improves the day, while bad weather ruins the day. A good result brings joy, and a bad result brings disappointment. A good meal tastes delicious, but a bad meal tastes awful. Good music improves the mood, while bad music ruins the mood. In these examples, good and bad modify the same nouns: news, weather, result, meal, and music. Because the two words appear in similar grammatical positions and share neighboring words, they show a paradigmatic relationship: one can substitute for the other even though their meanings are opposite. This is also why antonyms often receive similar vectors in distributional models.
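The substitution idea can be made concrete with a small sketch. It assumes a toy tokenizer (lowercase, alphabetic tokens only) and treats only the single word immediately to the right as context, which is a deliberate simplification; the helper names `tokenize`, `right_neighbors`, and `jaccard` are hypothetical. It collects the words that follow good and bad in the sentences of this paragraph and measures the overlap of the two sets with the Jaccard index.

```python
import re

# The good/bad example sentences from this paragraph.
SENTENCES = [
    "Good news and bad news are a simple example.",
    "Good weather improves the day, while bad weather ruins the day.",
    "A good result brings joy, and a bad result brings disappointment.",
    "A good meal tastes delicious, but a bad meal tastes awful.",
    "Good music improves the mood, while bad music ruins the mood.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def right_neighbors(sentences: list[str], target: str) -> set[str]:
    """Return the set of words that appear immediately after `target`."""
    neighbors = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - 1):
            if tokens[i] == target:
                neighbors.add(tokens[i + 1])
    return neighbors


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


good = right_neighbors(SENTENCES, "good")
bad = right_neighbors(SENTENCES, "bad")
print(sorted(good & bad))  # ['meal', 'music', 'news', 'result', 'weather']
print(jaccard(good, bad))  # 1.0
print(jaccard(good, right_neighbors(SENTENCES, "improves")))  # 0.0
```

Good and bad share every right neighbor here, while a word such as improves, which occurs alongside them rather than in their place, shares none.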

The same principle applies to coffee and tea. In the sentence “In the morning I drink coffee,” the word coffee can be replaced by tea without changing the structure of the sentence. Both words fit naturally with drink, morning, breakfast, cup, and cafe. In contrast, a word like book does not fit as naturally in that position. A book may lie on the table in the morning, but people do not drink a book from a cup at breakfast. This difference in surrounding context is exactly what distributional models attempt to capture. Words that appear in similar contexts tend to receive similar vector representations, while unrelated words tend to be placed farther apart in the vector space.

A project about word embeddings often begins with a text corpus. The corpus contains many sentences, and within each sentence every word appears near other words. From this corpus, a vocabulary of distinct words can be built, and each word in the vocabulary can later be mapped to a vector. A model then learns from the neighboring words inside a context window, that is, the words within a fixed number of positions to the left and right of each target word. The training process adjusts the word vectors so that words with similar contexts receive similar representations. The model does not learn meaning from dictionary definitions; it learns from patterns in text. Because of that, the corpus plays a central role in the quality of the resulting embeddings.
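A minimal sketch of these first steps, assuming a toy tokenizer and a symmetric window. The names `build_vocabulary` and `context_pairs` are hypothetical, ids are simply assigned in order of first appearance, and the window is applied within each sentence so that pairs never cross sentence boundaries.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_vocabulary(sentences: list[list[str]]) -> dict[str, int]:
    """Assign each distinct word an integer id in order of first appearance."""
    vocab: dict[str, int] = {}
    for tokens in sentences:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def context_pairs(tokens: list[str], window: int) -> list[tuple[str, str]]:
    """Return (target, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


corpus = ["In the morning I drink coffee.", "In the morning I drink tea."]
sentences = [tokenize(s) for s in corpus]
vocab = build_vocabulary(sentences)
print(vocab)
# {'in': 0, 'the': 1, 'morning': 2, 'i': 3, 'drink': 4, 'coffee': 5, 'tea': 6}
print(context_pairs(sentences[0], window=1)[:3])
# [('in', 'the'), ('the', 'in'), ('the', 'morning')]
```

Widening the window from 1 to 2 adds more distant neighbors to every target, which trades sharper syntactic context for broader topical context.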

A simple vector space model may begin with co-occurrence counts: each word is represented by how often every other word appears near it. If the word coffee appears often near cup, morning, and drink, then the vector for coffee will reflect those context features. If tea also appears near cup, morning, and drink, then the vector for tea may point in a similar direction. A book, however, may appear more often near library, story, teacher, and student. In that case, the vector for book will differ from the vector for coffee or tea. Even with simple counts, a computer can begin to represent semantic relationships through geometry: words become points in a vector space, and similar words lie closer to one another than unrelated words.
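The following sketch builds such count vectors from a four-sentence toy corpus adapted from the examples in this text, assuming a window of two words on each side. The corpus and the helper `cooccurrence_counts` are illustrative, not part of any library.

```python
import re
from collections import Counter, defaultdict

CORPUS = [
    "In the morning I drink coffee from a cup.",
    "In the morning I drink tea from a cup.",
    "The student reads a book in the library.",
    "The teacher reads a story from a book.",
]


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cooccurrence_counts(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """For each word, count how often every other word occurs within `window` positions."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


counts = cooccurrence_counts(CORPUS)
# Fix one column order so every word becomes a dense vector of the same length.
features = sorted({context for row in counts.values() for context in row})
vectors = {word: [row[f] for f in features] for word, row in counts.items()}

print(dict(counts["coffee"]))  # {'i': 1, 'drink': 1, 'from': 1, 'a': 1}
print(counts["coffee"] == counts["tea"])  # True
print(dict(counts["book"]))  # {'reads': 1, 'a': 2, 'in': 1, 'the': 1, 'from': 1}
```

With this window, coffee and tea collect exactly the same context counts, while book picks up reads, in, and the. Real corpora give noisier counts, which is why closeness is usually measured with a similarity score rather than exact equality.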

Cosine similarity is often used to measure this closeness. It is the dot product of two vectors divided by the product of their lengths, so it depends on the direction of the vectors rather than their size: a frequent word and a rare word with the same proportions of context counts receive the same score. If two vectors point in a similar direction, the cosine similarity between them is close to 1. If they point in very different directions, it is lower, and for vectors of non-negative counts it cannot fall below 0. This means that the vectors for coffee and tea may have a high similarity because both words occur with cup, morning, breakfast, and drink. The vectors for king and queen may also have a high similarity because both occur with palace, kingdom, and royal family. In contrast, the vector for dog may be less similar to the vector for train because dog belongs to the context of pet, animal, and house, while train belongs to the context of vehicle, station, and city.
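A small sketch of cosine similarity over sparse vectors stored as dictionaries. The context features are hand-picked from words this text associates with each target, each with an assumed count of 1, so the numbers illustrate the geometry rather than measure any real corpus; `cosine` and `binary_vector` are illustrative helper names.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: value} dicts."""
    dot = sum(value * v.get(feature, 0.0) for feature, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


def binary_vector(contexts: list[str]) -> dict[str, float]:
    """Assumed toy encoding: each listed context word gets a count of 1."""
    return {c: 1.0 for c in contexts}


vectors = {
    "coffee": binary_vector(["cup", "morning", "breakfast", "drink", "table"]),
    "tea": binary_vector(["cup", "morning", "breakfast", "drink"]),
    "book": binary_vector(["library", "story", "table", "morning"]),
    "dog": binary_vector(["pet", "animal", "house"]),
    "train": binary_vector(["vehicle", "station", "city"]),
}

for a, b in [("coffee", "tea"), ("coffee", "book"), ("dog", "train")]:
    print(a, b, round(cosine(vectors[a], vectors[b]), 3))
# coffee tea 0.894
# coffee book 0.447
# dog train 0.0
```

Coffee and tea share four features, coffee and book share only morning and table, and dog and train share none.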

The distributional hypothesis states that words with similar meanings tend to occur in similar contexts. This idea is often summarized by J. R. Firth's remark, “You shall know a word by the company it keeps.” A distributional model does not capture meaning in a human or philosophical sense. Instead, it captures the statistical structure of usage in text. If a corpus repeatedly places coffee and tea in similar local environments, then the model can infer that they are related. If the corpus repeatedly places king and queen in similar royal contexts, then the model can also infer a relationship there. Meaning, in this framework, emerges from context.

Before training, the dimensionality of the embedding space must be chosen as a hyperparameter. A small number of dimensions may be enough for a simple corpus, while a larger number may capture more subtle relationships in a larger dataset. However, more dimensions also require more computation and more data. A larger corpus usually improves the quality of embeddings because it provides more examples of how words are used. If the corpus is too small, the learned vectors may be noisy or unstable; if it is well designed, even a small model can capture useful semantic patterns.
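A sketch of what the dimensionality choice means in practice, assuming a skip-gram style model that stores two tables (word vectors and context vectors), each with one `dim`-sized vector per vocabulary word. The vocabulary size of 100 is a hypothetical value, and `init_embeddings` and `parameter_count` are illustrative helpers.

```python
import random


def init_embeddings(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Return a vocab_size-by-dim table of small random values in [-0.5/dim, 0.5/dim)."""
    rng = random.Random(seed)
    # Scaling by 1/dim keeps the initial dot products between vectors small.
    return [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(vocab_size)]


def parameter_count(vocab_size: int, dim: int, tables: int = 2) -> int:
    """Trainable weights when each of `tables` tables holds one dim-sized vector per word."""
    return tables * vocab_size * dim


embeddings = init_embeddings(vocab_size=100, dim=50)
print(len(embeddings), len(embeddings[0]))  # 100 50
for dim in (10, 50, 300):
    print(dim, parameter_count(vocab_size=100, dim=dim))
# 10 2000
# 50 10000
# 300 60000
```

The number of weights grows linearly with the dimension, and every weight must be estimated from the same finite corpus, which is one reason larger dimensions call for more data.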

When students implement such a model from scratch, they often gain a better understanding of the underlying algorithm. Instead of relying on external machine learning libraries, they work directly with the mathematical operations used during training. The project then becomes not only an exercise in programming, but also an exercise in understanding how a model learns from data. By reading the corpus, constructing the vocabulary, defining the context window, updating the vectors, and measuring cosine similarity, students see how semantic structure can emerge from repeated word usage.
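These steps can be combined in a compact from-scratch sketch that uses only the Python standard library. It implements skip-gram with negative sampling trained by plain stochastic gradient descent, one common choice for this kind of project rather than the method of any particular implementation. It simplifies by drawing negative samples uniformly from the vocabulary instead of from a frequency-based distribution, and all hyperparameter values (`dim`, `window`, `negatives`, `epochs`, `lr`) are assumptions. On a corpus this small the similarity values depend on the seed and settings, so no fixed output is shown; the expectation, not a guarantee, is that coffee and tea end up closer than coffee and book.

```python
import math
import random
import re


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def sigmoid(x: float) -> float:
    """Logistic function, clamped so math.exp cannot overflow."""
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def train_sgns(sentences, dim=16, window=2, negatives=3, epochs=100, lr=0.05, seed=0):
    """Skip-gram with negative sampling trained by plain SGD; returns (vocab, word vectors)."""
    rng = random.Random(seed)
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    # Word vectors start small and random; context vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for tokens in tokenized:
            for i, target in enumerate(tokens):
                t = index[target]
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j == i:
                        continue
                    # One observed context (label 1) plus uniformly drawn negatives (label 0).
                    samples = [(index[tokens[j]], 1.0)]
                    samples += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    grad_in = [0.0] * dim
                    for c, label in samples:
                        # Gradient of the logistic loss with respect to the score w_in[t] . w_out[c].
                        g = sigmoid(dot(w_in[t], w_out[c])) - label
                        for k in range(dim):
                            grad_in[k] += g * w_out[c][k]  # uses w_out before its update
                            w_out[c][k] -= lr * g * w_in[t][k]
                    for k in range(dim):
                        w_in[t][k] -= lr * grad_in[k]
    return vocab, w_in


corpus = [
    "In the morning I drink coffee from a cup.",
    "In the morning I drink tea from a cup.",
    "The student reads a book in the library.",
    "The teacher reads a book in the library.",
]
vocab, vectors = train_sgns(corpus)
idx = {w: i for i, w in enumerate(vocab)}
print("coffee~tea ", round(cosine(vectors[idx["coffee"]], vectors[idx["tea"]]), 3))
print("coffee~book", round(cosine(vectors[idx["coffee"]], vectors[idx["book"]]), 3))
```

Keeping separate word and context tables mirrors the skip-gram formulation; only the word table is returned, since it is the one usually used as the embedding.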

The most surprising part of this process is that meaningful structure can arise from such simple observations. A model sees only words and contexts, yet it can still discover that coffee and tea are related drinks, that king and queen are related royal words, that dog and cat are related animals, and that car, bus, and train are related vehicles. It can also observe that good and bad behave similarly in phrases such as good news and bad news. These patterns do not come from hand-written semantic rules. They come from repeated structure inside the corpus itself.

For this reason, the design of the corpus matters greatly in a small project. If the text repeatedly places related words in overlapping contexts, the model has a chance to learn meaningful embeddings. If the text is too small or too inconsistent, the vectors may fail to reflect the intended relationships. A carefully written paragraph-style corpus therefore offers a good compromise: it reads like natural text, yet it still contains enough repeated structure to demonstrate how word embeddings work in practice.

In the morning the student drinks coffee before class and later reads a book in the library. At breakfast another student drinks tea and studies language before meeting the teacher. In the city a bus stops at the station while a train arrives on the track and a car waits near the road. In the palace the king speaks with the queen while the prince and the princess walk through the royal hall. At home the dog rests by the family while the cat sleeps under the table. Good news improves the mood, but bad news can change the whole day. Across all of these paragraphs, words gain meaning from their neighbors, and the corpus becomes a small demonstration of distributional semantics in action.