Best practices
Your data model depends on your application's data access patterns, so structure your data to match the ways your application queries and updates it. A well-designed data model is critical to the performance and scalability of your application. Below are some best practices you can use while designing your data model.
- Data that is accessed together should be stored together. Note that this is not the same as saying that related data should be stored together. If you frequently retrieve or update pieces of data together, you should probably store them together. Data is commonly stored together by embedding related information in sub-model objects or object lists.
- Avoid model objects that grow without bound. If your model allows objects to grow in size continuously, take steps to prevent this, because it can degrade database and disk I/O performance. Instead, aim for a data model design in which a single model object is at most a few megabytes. Ideally, for faster query performance, keep each model object to a few kilobytes at most.
- Minimize object references as much as possible; instead, denormalize (duplicate) data in the models that need it. For example, you might have two models, one for books and the other for authors. If you want access to the author name from the book model, you have two options: create an object reference to the author in the book model, or duplicate the author name in the book model. If the author name is stored in the book model, you can access it much faster. Otherwise, you need to join (look up) the author model to fetch the author name, which slows down query performance. If you retrieve data from multiple referenced models and join a large amount of data, you have to call the database several times to get all the necessary data. If your application relies heavily on object references, denormalizing them makes more sense: you can use sub-model objects or object lists to get all the required data in a single query.
- Data model design comes down to two options for every piece of data: you can either embed that data directly in a model or reference it. Favor embedding unless there is a compelling reason not to. It is good practice to follow the rules below when designing model relations and deciding between embedding and referencing.
  - One-to-One - Prefer key-value pairs within the model object
  - One-to-Few (fewer than a few hundred) - Prefer embedding
  - One-to-Many - Prefer referencing
  - Many-to-Many - Prefer referencing
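
The book and author example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `CountingStore`, the `books` and `authors` collections, and field names such as `author_id` and `author_name` are hypothetical stand-ins, not part of any real database API. The sketch counts lookups to show why a referenced author costs an extra round trip compared with a denormalized author name.

```python
class CountingStore:
    """Hypothetical in-memory store that counts lookups (stand-in for database calls)."""

    def __init__(self, collections):
        self.collections = collections
        self.lookups = 0

    def get(self, model, object_id):
        # Every call represents one round trip to the database.
        self.lookups += 1
        return self.collections[model][object_id]


def author_name_via_reference(store, book_id):
    """Referencing: fetch the book, then follow the reference to the author."""
    book = store.get("books", book_id)
    author = store.get("authors", book["author_id"])
    return author["name"]


def author_name_denormalized(store, book_id):
    """Denormalizing: the author name is duplicated in the book itself."""
    return store.get("books", book_id)["author_name"]


# Placeholder data, invented purely for illustration.
authors = {"a1": {"name": "Jane Doe"}}

referenced = CountingStore({
    "authors": authors,
    "books": {"b1": {"title": "Example Book", "author_id": "a1"}},
})

denormalized = CountingStore({
    "authors": authors,
    "books": {"b1": {"title": "Example Book", "author_id": "a1",
                     "author_name": "Jane Doe"}},
})

print(author_name_via_reference(referenced, "b1"), referenced.lookups)
print(author_name_denormalized(denormalized, "b1"), denormalized.lookups)
```

Both functions return the same name, but the referenced version needs two lookups while the denormalized version needs one. The trade-off is that a duplicated value has to be updated in every copy whenever the author's name changes, which is one reason to keep referencing for one-to-many and many-to-many relations.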