January 25, 2019
Six Rules for Enterprise Data Management
Most organizations face data management challenges: multiple databases with different content, duplicate content coming from various data sources, and so on.
Organizations struggle to derive value and analytics from their data, to search properly across multiple content sets, or to understand content dependencies. For example: if event "X" happened, who and what is affected, and what action should we take? Or, if our organization needs to develop a new capability, how can we find which technologies are relevant and what other departments have already developed that we might leverage?
Data management depends on technology factors (database, storage, search engine, authentication, backup/redundancy, high availability, speed, etc.). This post does not address these technology factors; it focuses on the higher-level principles of data strategy and data management. Here are six foundational aspects of any data management initiative:
1. Information Model
You need to model your data. Define the objects/entities you care about, the relationships between those entities, and any associated attributes such as topics/themes. For example, in supplier data, a supplier/vendor would be a type of entity, with relationships to industries, to products, or to specific geographic areas of operation.
In the auto industry, for example, Brembo is a supplier to Tesla (a supplier relationship). Brembo in turn has relationships to a geography (Italy), an industry (Automotive), a person (Chairman of the Board Alberto Bombassei), etc.
Following a well-defined ontology will reduce data silos and ensure your data is more integrated/connected.
Make sure to define relationships that link objects across content sets (e.g. a Person entity has an expertise relationship to cardiac medicine). Relationships can be defined using a scalable, dynamic, schema-less approach (Linked Data, http://linkeddata.org/) or using a more rigid schema definition.
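As a rough illustration of such a model, here is a minimal Python sketch that stores entities and the relationships between them as subject/relation/object triples, using the Brembo example. The class names, the identifier scheme (e.g. "org:brembo") and the relation names (e.g. "supplier_to") are assumptions made for this example only, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str    # hypothetical identifier scheme, e.g. "org:brembo"
    type: str  # entity type from your information model
    name: str


@dataclass
class InformationModel:
    entities: dict = field(default_factory=dict)       # id -> Entity
    relationships: set = field(default_factory=set)    # (subject, relation, object)

    def add(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, subject_id: str, relation: str, object_id: str) -> None:
        # Only allow relationships between entities that are already modeled.
        for entity_id in (subject_id, object_id):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.relationships.add((subject_id, relation, object_id))

    def related(self, subject_id: str, relation: str) -> list:
        # Names of all entities linked from subject_id by the given relation.
        return sorted(
            self.entities[obj].name
            for subj, rel, obj in self.relationships
            if subj == subject_id and rel == relation
        )


model = InformationModel()
for entity in [
    Entity("org:brembo", "Organization", "Brembo"),
    Entity("org:tesla", "Organization", "Tesla"),
    Entity("geo:italy", "Geography", "Italy"),
    Entity("ind:automotive", "Industry", "Automotive"),
    Entity("per:bombassei", "Person", "Alberto Bombassei"),
]:
    model.add(entity)

# Relation names below are illustrative assumptions.
model.relate("org:brembo", "supplier_to", "org:tesla")
model.relate("org:brembo", "located_in", "geo:italy")
model.relate("org:brembo", "in_industry", "ind:automotive")
model.relate("org:brembo", "has_chairman", "per:bombassei")

print(model.related("org:brembo", "supplier_to"))
```

Keeping relationships as plain triples is close in spirit to the schema-less approach; a more rigid schema would restrict which relation names are allowed between which entity types.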
2. Single source of truth
Similar or overlapping data can reside in different databases. Which data is correct? Which database/master should you link or refer to?
If the same person name/entity appears in more than one source, and those sources/content sets are not linked, then you have a problem.
A preferred way is to have one single people master, where all the different sources that hold some data on people refer to the same person entity in that master. Examples of multiple data sources for a person: credit card data, website visit stats, mobile phone location data, medical record data. All those data sets need to refer to the same person in the one single people master, the single source of truth.
There are different methods to collect and cross-reference (concord) data from various sources.
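One simple way to picture a people master with cross-references is a lookup table that maps each source's local identifier to a single master identifier. The sketch below assumes this design; the class, the source names and all identifiers are hypothetical, and real concordance usually also involves fuzzy matching and human review, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class PeopleMaster:
    people: dict = field(default_factory=dict)  # master_id -> canonical name
    xref: dict = field(default_factory=dict)    # (source, local_id) -> master_id

    def register(self, master_id: str, name: str) -> None:
        self.people[master_id] = name

    def link(self, source: str, local_id: str, master_id: str) -> None:
        # A source record may only point at a person that exists in the master.
        if master_id not in self.people:
            raise KeyError(f"unknown master id: {master_id}")
        self.xref[(source, local_id)] = master_id

    def resolve(self, source: str, local_id: str):
        # Returns the master id, or None if the source record is not linked yet.
        return self.xref.get((source, local_id))

    def sources_for(self, master_id: str) -> list:
        return sorted(src for (src, _), mid in self.xref.items() if mid == master_id)


master = PeopleMaster()
master.register("P-001", "Jane Doe")  # made-up person for illustration

# Each data set keeps its own local id but refers to the same master entity.
master.link("credit_card", "cc-9", "P-001")
master.link("web_stats", "visitor-77", "P-001")
master.link("medical_records", "mrn-5", "P-001")

print(master.resolve("web_stats", "visitor-77"))
print(master.sources_for("P-001"))
```

With this layout, any question about a person starts from the master id, and unlinked source records (where `resolve` returns `None`) become a visible work queue rather than a hidden inconsistency.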
3. Industry standards
When defining your information model, ontology, taxonomies, etc., it is recommended to use industry standards as your baseline. On top of this you can add your own specific relationship types, taxonomies, etc.
Examples: taxonomies in the biomedical industry; the FIBO ontology in financial services; and, in editorial journalism, taxonomies of news themes/topics.
4. Machine-readable metadata
Every data item needs machine-readable metadata to make it findable, searchable and useful. There are many technologies and workflows for applying such metadata, ranging from fully manual through semi-automated to fully automated. This is a practice in itself.
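To make "machine-readable" concrete, here is a small Python sketch of a metadata record serialized as JSON, with a check for required fields. The field names and the list of required fields are assumptions chosen for this example; in practice they would come from your information model and the industry standards you adopt.

```python
import json

# Hypothetical set of fields every record must carry.
REQUIRED_FIELDS = ("id", "type", "title", "topics", "source")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "id": "doc-0001",  # made-up identifier
    "type": "Article",
    "title": "Six Rules for Enterprise Data Management",
    "topics": ["data management", "ontology"],
    "source": "blog",
}

# JSON is one common machine-readable serialization; others would work too.
serialized = json.dumps(record, sort_keys=True)

print(missing_fields(record))
print(serialized)
```

A check like `missing_fields` can run at ingestion time, whether the metadata was applied manually or by an automated tagger.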
5. Content collection and automation
Collecting content, validating, curating and mastering the data (adding, deleting or updating data in the one single master) are all part of data management. For example, sourcing image data involves properly applying tags and descriptions to make the images searchable; sourcing medical records from various physicians involves matching them to the right patient identity and classifying the main symptoms, the diagnosis and the treatments given.
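The validate-then-master flow for the image example could look roughly like the Python sketch below. It is a simplified illustration under stated assumptions: the validation rules, the record fields and the in-memory dictionary standing in for the master are all hypothetical.

```python
def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    if not record.get("tags"):
        problems.append("no tags")
    if not (record.get("description") or "").strip():
        problems.append("empty description")
    return problems


def upsert(master: dict, record: dict) -> str:
    """Add or update a validated record in the master; reject invalid ones."""
    if validate(record):
        return "rejected"
    action = "updated" if record["id"] in master else "added"
    master[record["id"]] = record
    return action


image_master = {}  # stands in for the single master store
incoming = [
    {"id": "img-1", "tags": ["bridge"], "description": "Bridge at dusk"},
    {"id": "img-2", "tags": [], "description": "Untitled"},  # no tags: not searchable
    {"id": "img-1", "tags": ["bridge", "sunset"], "description": "Bridge at sunset"},
]

results = [upsert(image_master, record) for record in incoming]
print(results)
```

Rejected records would normally be routed to a curation queue rather than dropped, so that a person or a tagging service can complete them.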
6. Organizational management and the human factor
Getting your organization's buy-in is important for success. Moving toward a proper data management culture requires the participation of the entire organization and, most importantly, management backing. This requires hard decisions on organizational priorities and a long-term vision.
There are many other aspects of Data Management:
* Data governance
* Central directory of data items
* Maintaining provenance and priority of data sources
* Entitlements
* User quality feedback into masters
* Data discovery workflows
* Intelligent search
* Subject-matter/content experts
* Ontologies
* Taxonomists
and more.
Data management has to start with the organization's top management and be seen as a long-term investment. It is something every organization will have to go through sooner or later to stay in business.