Database Design: The Case for Co-evolution of Applications and Data

The traditional wisdom for performing logical database design can be found in any DBMS textbook.

Basically, the process is:

Form an entity-relationship (E-R) model of your data

When you are satisfied with your E-R model, push a button, which executes an E-R to third normal form (3NF) translation algorithm

Create the 3NF schema and code the application logic for this schema

When business conditions change (and they do at least once a quarter):

Update the E-R model

Update the schema

Evolve the data to the new schema

Do application maintenance

Applying these principles guarantees that the schema is always in 3NF and is thereby a “good” schema. However, extensive application maintenance may be required. Also, repeated patching of application code may cause the application to “decay,” i.e., become more convoluted and harder to maintain. In other words, the traditional wisdom ensures no database decay, but at the cost of potentially large application decay.

In the real world, NO SERIOUS DEVELOPERS use the traditional wisdom. Some may use it for initial development, but none use it for evolution. Specifically, developers are almost always interested in minimizing application maintenance, and will endure large amounts of data decay to achieve this goal. To minimize application maintenance, they change the schema as little as possible (preferably not at all) and accommodate new requirements by introducing data redundancy instead. Hence, data is allowed to decay to minimize application decay. Sooner or later, the applications (or the database) become so degraded that a complete project re-implementation is required.
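As an illustration of how redundancy lets data decay, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and values are hypothetical and are not taken from any real schema. The sketch assumes a developer copied the customer's city into the orders table so that existing order queries need no join, which takes the schema out of 3NF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    -- Hypothetical shortcut: customer_city duplicates customer.city so that
    -- existing order queries keep working without a join.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        customer_city TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Rio de Janeiro');
    INSERT INTO orders VALUES (100, 1, 'Rio de Janeiro');
""")

# The customer moves; only the authoritative row is updated.
conn.execute("UPDATE customer SET city = 'Sao Paulo' WHERE id = 1")

# Find orders whose redundant copy no longer matches the customer row.
stale = conn.execute("""
    SELECT o.id, o.customer_city, c.city
    FROM orders AS o JOIN customer AS c ON c.id = o.customer_id
    WHERE o.customer_city <> c.city
""").fetchall()
print(stale)
```

The application code stayed untouched, but the two copies of the city now disagree, and nothing in the schema prevents that.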

The thesis of this project is that one should perform co-evolution of code and data. Specifically, one should use a holistic metric that combines data decay and code decay, and choose the evolution that minimizes it. Sometimes one should focus on application decay and sometimes on data decay. We have obtained six years of application and data evolution from B2W, a very large Brazilian e-tailer. Our data encompasses about 70 iterations of their checkout software. Our initial analysis supports our thesis, because B2W appears to sometimes evolve application code, sometimes the DBMS schema, and sometimes both.
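To make the idea of a holistic metric concrete, the following is a minimal sketch, not the project's actual metric. It assumes each candidate evolution tactic comes with an estimated data decay and code decay between 0 and 1, and it combines them with a simple weighted sum. The names Tactic, combined_decay, and pick_tactic, the weighting scheme, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tactic:
    """One candidate way to absorb a business change (hypothetical)."""
    name: str
    data_decay: float  # estimated decay added to the database, 0..1
    code_decay: float  # estimated decay added to the application, 0..1


def combined_decay(tactic: Tactic, data_weight: float = 0.5) -> float:
    """Weighted sum of the two decay estimates; the weighting is an assumption."""
    if not 0.0 <= data_weight <= 1.0:
        raise ValueError("data_weight must be between 0 and 1")
    return data_weight * tactic.data_decay + (1.0 - data_weight) * tactic.code_decay


def pick_tactic(tactics: list[Tactic], data_weight: float = 0.5) -> Tactic:
    """Return the candidate with the lowest combined decay."""
    if not tactics:
        raise ValueError("need at least one candidate tactic")
    return min(tactics, key=lambda t: combined_decay(t, data_weight))


# Illustrative numbers only; they are not measurements from B2W.
candidates = [
    Tactic("change schema, rewrite application", data_decay=0.0, code_decay=0.6),
    Tactic("keep schema, add redundant data", data_decay=0.7, code_decay=0.1),
    Tactic("small schema change, small code change", data_decay=0.2, code_decay=0.3),
]

best = pick_tactic(candidates)
print(best.name)
```

With equal weights the mixed tactic wins. Setting data_weight to 0.0 reproduces the developer habit of protecting only the application, and setting it to 1.0 reproduces the textbook habit of protecting only the schema.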

We have largely completed an evolution tool that quantifies the decay, in each dimension, that a proposed modification will entail. On top of this tool, we hope to build a machine-learning-based recommendation engine that will suggest which evolution tactic to use.

Citations

Michael L. Brodie, Michael Stonebraker, Ricardo Mayerhofer and Jialing Pei. 2018. The Case for the Co-evolution of Applications and Data, North East Database Day 2018 (NEDS 2018), January 19, 2018.

Michael Stonebraker, Dong Deng and Michael L. Brodie. 2017. Application-Database Co-Evolution: A New Design and Development Paradigm. New England Database Day (pp. 1–3), January 2017.

Michael Stonebraker, Dong Deng and Michael L. Brodie. 2016. Database Decay and How to Avoid It. Proceedings of the IEEE International Conference on Big Data (pp. 1–10), Washington, DC, December 2016.