Nearly any data model of non-negligible complexity will include at least one (if not several) many-to-many relationships. This relationship emerges when, given two tables A and B, instances of A could have many instances of B, and instances of B could have many instances of A. Successfully implementing this relationship is a bit of stumbling block for people new to working with or building databases. However, the strategy for implementing this relationship in relational databases is important to understand. Believe it or not, it’s also not all that complicated.
Once you’ve implemented this relationship a few times, the strategy becomes rather intuitive. However, I remember this concept being one of the hardest ones for me and my classmates to finally understand when it was introduced in second year of undergrad. Hopefully, this article will help you internalize the many-to-many relationship like I eventually did.
What is the Many-to-Many Relationship?
The many-to-many (m2m) relationship is a relationship that occurs in data modeling when two entities are related but neither belong to the other and both can have many instances of each other. Here are some examples of many-to-many relationships:
- a pet can have many owners, and owners can have many pets
- a language can be spoken by many people, and a person can speak many languages
- a class can have many students, and a student can be in many classes
- a venue can have many visitors, and a visitor can visit many venues
To better understand the many-to-many relationship, let’s briefly talk about the one-to-many relationship to see how the m2m differs:
- a country can have many cities, but a city can only be in one country
- a car company can make many car models, but each car model is made by only one company
In each of these examples, note that there is a sense of one entity belonging to another.
Modeling One-to-Many in the Database
One-to-many relationships are easy to model in the database. For example, consider the country-city example. The country India has many cities such as Mumbai, Hyderabad, and Kolkata, but Kolkata only belongs to the country of India.
So, in the database, we probably have a Country
table that looks like this:
And the City
table plus relationship to the Country
table would look like this:
This is simple enough, and we can query for all cities in a country by simply writing something like SELECT City.Name FROM City JOIN Country ON Country.Id = City.CountryId WHERE Country.Name = 'India'
and we’d get a result set including Mumbai, Hyderabad, Kolkata, and more.
But this kind of modeling is easy and we’ve only used it as an example to better understand our next section: modeling the many-to-many relationship.
Modeling Many-to-Many in the Database
Now let’s consider the more complex many-to-many relationship. Specifically, let’s consider the instance of many people speaking many languages. For example, we might have a Person table and Language table like so:
Now, suppose we have two people: Erik and Lina. Erik speaks English and Spanish, Lina speaks English and Arabic. This is what we mean by many-to-many relationships; Lina speaks many languages–English and Arabic–and English is spoken by many people–Erik and Lina. So how do we relate the Person
and Language
tables to list out all of the languages a person speaks, or all of the people that speak a language?
The answer is to create a new table, the associative table. The associative table sits between the two entities in a many-to-many relationship and actually has a one-to-many relationship with each. That is, given tables A and B and an associative table C, A has many Bs through C and B has many As through C; put another way, A has many Cs, B has many Cs, but C has one A and one B.
Note: The associative table are also sometimes called association, bridge, intermediary, join, junction, crosswalk, or linking table (and there’s probably many more names for it). I learned the term “associative” table in undergrad, and I think that’s its official name in relational theory, so that’s what we’ll stick with in this article.
Let’s make this more concrete by creating an associative table for the Person
and Language
relationship. It’s common practice to name association tables as an amalgamation of the tables they join, so we’ll call our associative table PersonLanguage
. I’d create such a table with the following SQL:
CREATE TABLE PersonLanguage(
Id INT IDENTITY(1,1) PRIMARY KEY,
PersonId INT FOREIGN KEY REFERENCES Person(Id),
LanguageId INT FOREIGN KEY REFERENCES Language(Id)
)
Notice that we specified two foreign keys on this table, PersonId
and LanguageId
, pointing to the Person
and Language
tables respectively. The table structure now looks like this:
Now, there will be a record in the PersonLanguage
table for every language an individual Person
speaks. As a result, there will also be a record in the PersonLanguage
table for every person that speaks a Language
. If we want to see the list of languages Lina speaks, we could run the following query:
SELECT l.name
FROM PersonLanguage pl
JOIN Person p ON p.id = pl.PersonId
JOIN Language l ON l.Id = pl.LanguageId
WHERE p.name = 'Lina'
and we’d get results like:
Likewise, if we want a list of all the people that speak English, we could run a query like this:
SELECT p.name
FROM PersonLanguage pl
JOIN Person p ON p.id = pl.PersonId
JOIN Language l ON l.Id = pl.LanguageId
WHERE l.name = 'English'
This would give us the following results:
Extending the Associative Table
The above examples show the most basic implementation of building associative tables to break up many-to-many relationships. The example PersonLanguage
associative table above only has two columns (besides its own ID column) for pointing to Person
and Language
records.
We can actually add columns to this table as well and capture interesting information. For example, what if we wanted to record which language was a person’s mother tongue? In other words, if Erik grew up speaking English and learned Spanish later, but Lina grew up speaking Arabic and learned English later, how would we capture that?
Thanks to the associative table, the answer is pretty easy. We just have to add a column to PersonLanguage
that indicates if the Language
record captured is the Person
record’s primary language. In T-SQL, I’d run the following ALTER
statement:
ALTER TABLE PersonLanguage ADD PrimaryLanguage BIT
and set the PrimaryLanguage
value to 1
for languages the person learned from birth. By capturing this data on the associative table, we can now write a query to generate a report of every person and their mother tongue like so:
SELECT p.name AS 'Person', l.Name AS 'Primary Language'
FROM PersonLanguage pl
JOIN Person p ON p.id = pl.PersonId
JOIN Language l ON l.Id = pl.LanguageId
WHERE pl.PrimaryLanguage = 1
and we’d get results like:
That is just one example of how to extend the functionality of our database with associative tables.
Conclusion
In this brief article, we learned how to break up many-to-many relationships in relational databases. The general approach is to place an associative table between the two tables and have a one-to-many relationship with them. Then we saw how we can use associative tables to capture nuanced data and create interesting reports. This is an important concept to understand in any kind of software or data engineering work that uses relational databases. As always, I hope this was helpful to you and don’t be afraid to contact me with any questions you might have!