THIS IS A DRAFT. The LIBRARIAN EXPLAINER Series Part 2
In the previous section, we learned that data resides in a central repository—the “library.” The idea is simple: everyone knows where the data is, how to access it, and hopefully understands its purpose. But what happens when you take a book out of the library, and why do we do it in the first place?
When we borrow a book from the library, it’s not to prevent others from borrowing it. This library operates differently—everyone can access the same book simultaneously without limiting anyone else’s ability to do the same. We borrow books because we need to work with the information in them: to read, analyze, and compare their contents with other books.
Now, imagine a library where, after borrowing a book, you’re allowed to amend its contents and inform the library of the change. The library’s copy is then permanently updated, so everyone who borrows it in the future gets the latest version. This isn’t a typical library, but it illustrates a core principle in data management: when we change data, the change needs to be made at the source so that everyone else can access the most current information.
Moreover, just like in this hypothetical library, users can donate new books (data) or remove outdated ones, ensuring that the library remains relevant and accurate. But access to these capabilities depends on your permission level: not just anyone can delete or update everything in the library. Permissions maintain security and structure within the system.
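To make the permission idea concrete, here is a minimal sketch in Python. The three levels (READ, CONTRIBUTE, ADMIN) and the `Library` class are hypothetical names chosen for illustration; a real system would define its own roles and rules.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical permission levels; higher values include lower ones."""
    READ = 1        # may borrow (read) books
    CONTRIBUTE = 2  # may also donate new books and amend existing ones
    ADMIN = 3       # may also remove outdated books


class Library:
    """A toy central repository that checks permissions before each change."""

    def __init__(self):
        self._books = {}

    def borrow(self, title):
        # Reading is open to every permission level.
        return self._books[title]

    def add_or_update(self, user_level, title, contents):
        self._require(user_level, Permission.CONTRIBUTE)
        self._books[title] = contents

    def remove(self, user_level, title):
        self._require(user_level, Permission.ADMIN)
        del self._books[title]

    @staticmethod
    def _require(user_level, needed):
        if user_level < needed:
            raise PermissionError(f"this action requires {needed.name} permission")
```

In this sketch a contributor can donate or amend a book, but an attempt to remove one raises `PermissionError` unless the caller holds the ADMIN level.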
Why We Borrow Books
We borrow books (extract data) because we need to analyze the content, compare it with other sources, and sometimes trigger further actions. For example, comparing two books might lead to borrowing a third book or updating an existing one. In the context of data, these actions are crucial for making informed decisions.
The Problem: Multiple Versions of the Truth
However, a major issue arises when we fail to update the library (the central repository). Instead of returning the updated book (data) to the library, we often keep changes in our local copy. This means that when others borrow the same book, they get outdated information. Over time, this creates multiple versions of the truth, which can be disastrous in a business environment.
For example, consider customer orders or financial records. There should only be one authoritative version of these records. If people start working on local copies without updating the central data, discrepancies emerge, leading to confusion and errors.
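The drift between a local copy and the central record can be illustrated with a short Python sketch. The order ID, field names, and values below are invented for the example, and `find_discrepancies` is a hypothetical helper, not part of any particular tool.

```python
# The authoritative record held in the central repository (illustrative data).
central_orders = {"ORD-1001": {"status": "shipped", "total": 250.0}}

# A local copy exported earlier and never refreshed (illustrative data).
local_copy = {"ORD-1001": {"status": "pending", "total": 250.0}}


def find_discrepancies(central, local):
    """Return {record_id: {field: (central_value, local_value)}} for each mismatch."""
    diffs = {}
    for record_id, local_record in local.items():
        central_record = central.get(record_id, {})
        changed = {
            field: (central_record.get(field), value)
            for field, value in local_record.items()
            if central_record.get(field) != value
        }
        if changed:
            diffs[record_id] = changed
    return diffs


discrepancies = find_discrepancies(central_orders, local_copy)
print(discrepancies)
```

Here the local copy still says the order is pending while the central record says it has shipped. Anyone working from the local file is acting on a second version of the truth.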
The Alternative Ecosystem
Worse still, a new ecosystem forms around these local copies. People share these personal versions of books (data) with others, bypassing the library altogether. As a result, the original system, designed to be the authoritative source, becomes overshadowed by a chaotic network of unofficial, outdated, or incorrect versions.
This practice is prevalent in many organizations, where data is often exported to spreadsheets, edited, and shared without the changes ever reaching the central database. This leads to siloed processes, fragmented data, and inefficiency. It’s not how data management is supposed to work.
How It Should Work
When we extract data to work on it, the result of our analysis should feed back into the central system. This ensures that the next person or process that needs the data gets the most accurate and up-to-date information. Whether it’s about customer orders, supply chain management, or financial records, the state of play must always be reflected in the main library.
Satellites—temporary spreadsheets or applications used for analysis—are necessary, but they should always be connected back to the central repository. They should never function as independent data sources. The key is ensuring that after working with the data, the results update the official system, eliminating the risk of multiple versions of the truth.
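One way to keep a satellite connected to the central repository is to record which version of a record it was based on and to refuse write-backs from stale copies. The sketch below assumes a simple version counter per record; `CentralRepository`, `check_out`, `check_in`, and `StaleCopyError` are hypothetical names, and the article does not prescribe this particular mechanism.

```python
class StaleCopyError(Exception):
    """Raised when a working copy is based on an outdated version."""


class CentralRepository:
    """Toy repository that stores (version, data) per record."""

    def __init__(self):
        self._records = {}

    def add(self, record_id, data):
        self._records[record_id] = (1, dict(data))

    def check_out(self, record_id):
        version, data = self._records[record_id]
        # Hand out a copy so a satellite cannot change central data directly.
        return version, dict(data)

    def check_in(self, record_id, based_on_version, data):
        current_version, _ = self._records[record_id]
        if based_on_version != current_version:
            raise StaleCopyError(
                f"{record_id}: copy is based on version {based_on_version}, "
                f"but the repository is at version {current_version}"
            )
        new_version = current_version + 1
        self._records[record_id] = (new_version, dict(data))
        return new_version


repo = CentralRepository()
repo.add("ORD-1001", {"status": "pending"})

# A satellite borrows the record, works on it, and returns the result.
version, working_copy = repo.check_out("ORD-1001")
working_copy["status"] = "shipped"
new_version = repo.check_in("ORD-1001", version, working_copy)
```

After the check-in, the next person to check out the record receives the shipped status. A satellite that still holds the old version is rejected and has to borrow the current record before writing back.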
The Real-Life Example
A case study from 2022 demonstrates this principle. A company called Casters, which surveys hospital roofs, was facing a mess of misfiled, mislabeled spreadsheets. Each surveyor used a template to record data, but the files were often saved in the wrong folders, with incorrect labels, creating chaos. As the company took on more clients, the system became unmanageable.
The solution was simple: instead of relying on individual spreadsheet files, a central cloud-based data warehouse was created. Surveyors uploaded their data directly to the cloud, where it was properly labeled and immediately available for reporting. This streamlined the entire process, allowing administrators to access real-time, accurate data through Power BI.
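The case study does not describe how the upload was implemented, so the following is only an illustrative sketch of the idea: every upload must carry its labels, and the data lands in one shared table that reports can query. An in-memory SQLite database stands in for the cloud warehouse, and the table name, label fields, and `upload_survey` function are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for the central cloud warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE survey_readings (
        client TEXT NOT NULL,
        site TEXT NOT NULL,
        surveyor TEXT NOT NULL,
        survey_date TEXT NOT NULL,
        finding TEXT NOT NULL
    )
    """
)

# Hypothetical labels that every upload must supply.
REQUIRED_LABELS = ("client", "site", "surveyor", "survey_date")


def upload_survey(conn, labels, findings):
    """Reject unlabeled uploads, then store each finding with its labels."""
    missing = [name for name in REQUIRED_LABELS if not labels.get(name)]
    if missing:
        raise ValueError(f"missing labels: {', '.join(missing)}")
    rows = [
        (labels["client"], labels["site"], labels["surveyor"],
         labels["survey_date"], finding)
        for finding in findings
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO survey_readings VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Because the labels are validated at upload time, a survey cannot end up in the wrong place or under the wrong name, and a reporting tool can read the table as soon as the upload commits.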
Conclusion
The analogy of borrowing books from a library highlights the importance of managing data in a central, reliable repository. When organizations allow data to be worked on locally without updating the central system, they create inefficiency, errors, and confusion. By ensuring that all updates are reflected in the main repository, businesses can maintain a single source of truth, improving decision-making and operational efficiency.
This is the fundamental role of data in business processes: to act as a reliable, up-to-date source that everyone can access. Failing to respect this principle leads to chaos, fragmentation, and a loss of control over the business’s most valuable asset—its information.