In this lesson we will be getting an overview of database design. Designing databases, or data modeling is a highly sought out profession and many organizations that value data have one or many resident data modelers. Through this lesson you’ll learn why!
There is a great amount of effort and planning for databases relative to other areas of development. One of the main reasons for this is that compared to other areas of software development, databases require more effort to modify. There of course are newer databases tools and technologies that makes databases more flexible than before, and many teams still implement agile methodologies, but just keep in mind that in many cases, database development tends to lean towards a slightly more step-by-step process compared to fully agile development projects. Below are some common stages for a basic database design.
Let’s walk through these steps and get a feeling of how each step will look like.
Requirements Collection
Requirements collection is the initial step of database design where the data modelers will get together with the stake holders and discuss the requirements. In many cases, we will also involve the developers of the technologies or platforms that will be accessing the database. The requirements that need to be fleshed out and documented will be more evident as we go further on this journey together.
The requirements gathering could start from high level questions such as:
You could notice that some of these questions will lead towards insights on what kind of technology will be the best fit, and some questions could lead us down a rabbit hole to unearth further details.
From high level questions as above, the requirements gathering will then go into very detailed questions such as:
As you could assume, a great amount of details from these discussions should be carefully documented. We will try to get enough of the details aligned to get started on the next steps, but we will still need continuous efforts to clarify further requirements as the database design moves forward.
Conceptual Design
Through the requirement gathering phase, we now have a decent understanding of the database we need to create. Through consistent discussion and alignment, we will work with the stakeholders to create a diagram that at very high level shows “in concept” the data that we wish to capture and utilize in the database.
At this point, we don’t need to focus too much on the symbols or notations that we use. We just want something high-level to align with the stakeholders that gives a good enough model to work further into the details. We need to keep in mind that this and other models and diagrams are all tools for communication. Whether it’s a stakeholder that have no technical background, or developers creating a webpage that will use this database, all models and diagrams that we create will be used to communicate.
Logical Design
In this stage, we will create an diagram that uses specific notations to detail out necessary properties of the database. An expert that does this work is a Data Modeler. When the project team is small and the task isn’t too complex, developers, database administrators, or sometimes business analysts could play the role of the data modeler. For larger projects and organizations, there could be a separate data molder role. This role consists of creating conceptual or logical designs so that anyone, regardless of expertise and role, could understand what they need in relation to data that necessary for their role.
It sounds like a daunting job, but that’s why there are specific diagram constructs and tools that will help the data modeler digest the requirements, create a data model that could be used to confirm and discuss the data requirements with the stakeholders. The data model will also become a blueprint and continuous reference for all developers. We will go into the diagrams and symbols used by data modelers in the next section.
Physical Design
This is the stage where a database developer will look at the artifacts created so far and start to code (create!) the database implementation. The database developer will most likely look at the conceptual design to get the lay of the land, then start carefully interpreting the logical design and writing code to implement it. The developer will give attention to all the details and properties given in the logical design, but sometime have to discuss further with the data modeler, or refer to the requirements document if needed.
Now that we’ve walked through the stages of database design, let’s take a deeper look into the symbols, terms and notations that are used in data models. As you might have noticed in the database design stages, as developers, we will be working a lot in reading data models and coding the implementation.
Logical models in most cases are Entity Relationship Diagrams (“ER Diagrams”). They are tools to communicate the relationship between entities. Then what are entities? Entities, as the word entails, are “things.” To a stakeholder, this would be a thing that has information that needs to be stored. For a developer these “things” could manifest as a table in SQL. Physical entities could be things such as teachers, cars, phones, etc. These entities, according to the database, will embody different types of information to meet it’s purpose and they will have different relationships with each other. Some logical entities could be “things” that is not a physical object, but still embody information that we hope to store and/or utilize. Some examples of logical entities could be account, order or course.
Entities are in a lot of cases the star of ER Diagrams and they are commonly depicted as rectangles. Sometimes they could be as simple as just the entity name with a list of fields associated with it, or it could who a bit more pertinent information such as what the keys are, and what types of data it will hold. Keys are what will be used to identify that entity within the database. Usually it’s an identifier of some sort that could be assigned and doesn’t always have to be human readable.
Let’s say that we are creating a data model for a car dealership. To begin with, some obvious entities that we will need is a Dealer entity and a Car entity. And let’s say that different dealers are assigned cars to sell. For the car entity, we could include the necessary fields, and according to the purpose of the model (especially if it’s for developers) it could also include the type and key information. PK stands for primary key. This is the key that will used to identify this entity. FK stands for foreign key. This is a key that is not from this entity, but from a different entity. In this case, it is the ID for the dealer.
So now we have a dealer entity and a car entity. Another important portion of the ER Diagram is the relationships between the entities (hence the “R” in “ER”).
You might see some variances, but above is an example of commonly used symbols to describe the relationship between entities. You could notice that a perpendicular line means that it’s mandatory, an empty circle means that it’s optional, and the crow’s feet denotes that there could be multiples. The relationship of how many could exist on each side (one-to-one, one-to-many, many-many) is referred to as cardinality.
Let’s take a look at the example above and try interpreting the relationship between the dealer entity and the car entity. For car, dealer is mandatory and it should be connected to one dealer. For the dealer there could be multiple cars, but having a car connected is not mandatory. This means that a dealer could be assigned to multiple cars that the dealer is selling, but there could be dealers that don’t have any cars that they are currently selling. The ER Diagram is also showing that one car can only be assigned to one dealer, meaning several dealers could not be working on selling the same car. Also every car needs a dealer. They can’t be just cars lying around that doesn’t have a dealer assigned to it.
As you could notice, there are lot of details in this diagram that will have a significant impact on how the database is built. Additionally, these are critical requirements to be aligned with the stakeholders.
Now that we’ve went through an example of how databases are designed, we will move forward in working on implementing the databases that were designed through code.
You can watch a video of a brief walk through of this topic here.