Data Mapping and Strong Data Integrity
I’m sure you’ve heard the saying “garbage in, garbage out.” Nowhere does it apply more than to the data needs of a project. Getting clean, concise data into and out of a system is one of the most important tasks, and if it isn’t approached thoughtfully from the start, it can lead to disastrous results later. Chasing data bugs is frustrating and can cost you significant time and budget. Ensuring the data integrity of a system requires proper planning and execution, and one of the best tools an architect has for that is the data mapping process.
Stated simply, data mapping is the process of creating data element mappings between two distinct data models: a source data model and a destination data model.
With any data mapping exercise, you want to understand each system involved. You’ll need to identify the data sources and destinations, as well as any transformations that need to happen to get that data into the correct format and location.
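To make this concrete, here is a minimal sketch of a field-level mapping with a transformation step. The record shape and field names are hypothetical, invented for illustration; they stand in for whatever your source and destination systems actually use.

```python
def transform_customer(source: dict) -> dict:
    """Map a hypothetical source customer record to a destination format."""
    return {
        # Direct one-to-one mapping, renamed to the destination field ID
        "customer_id": source["id"],
        # Two source fields combined into one destination field
        "full_name": f"{source['first_name']} {source['last_name']}",
        # Type conversion: this source stores the balance as a string
        "balance": float(source["balance"]),
    }

record = {"id": 42, "first_name": "Ada", "last_name": "Lovelace", "balance": "19.95"}
print(transform_customer(record))
```

Even in this tiny example, all three concerns appear: identifying the source fields, naming the destination fields, and transforming the values (renaming, combining, and casting) along the way.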
The ultimate goal is to prepare a standardized data-mapping template for the project to use across all data integrations and interfaces. This template helps facilitate the discovery process and keeps documentation consistent.
The best approach I have found is to use an Excel document with all the fields identified and defined for that particular integration. Depending on the number of systems involved, these data mapping documents can get complex, so I like to keep them simple and concise by starting with these basic elements and adding more where necessary for clarity.
Destination: This is the “object” receiving the data. It may be a data table, data object, endpoint parameters of a web service, or a field in an Excel file.
Destination Field ID: This is the field receiving the data. You want to identify the ID of that table column or entity property.
Destination Data Type: This is the data type of the destination field (i.e. String, Boolean, Integer, etc.). It’s important to know the data type to avoid discrepancies between the sending and receiving systems. You’ll need to account for data type mismatches and handle them in the transformation process to get the data into the correct format. You’ll also need to define the size of the destination data type to account for overflow issues with numbers or truncation with string and large text values.
Required: Is this field required in the destination table?
Sample Data: I like to see a sample of how the data should appear. This is especially important with data types like real numbers and date/time fields where the formatting can be quite different between systems.
Origin: This is the origin of the data. It could be a single element like a database table, or a combination of elements like joined tables, chained functions, or concatenated strings.
Origin Field ID: This is the origination data field. There may be a one-to-one correlation between your origination field and your destination field, or you may need to combine or enrich fields from your origination data sources to get your destination data value.
Origin Data Type: This refers to how the data is stored in the original system (i.e. is it an integer, string, or real number). If the origination field and the destination field don’t have matching data types, you will need to account for this conversion in your transformation process by casting or formatting the value.
Data Transformation: This is a brief description of what kind of data transformation needs to happen to get the origination data into the destination data correctly. This may require combining data sources, concatenation of data elements, casting of data fields, system functions or even hard coding values if need be.
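The elements above can be sketched in code as rows of a mapping table, with each row carrying its own transformation. Everything here is hypothetical, invented to illustrate the template’s columns; the tables, field IDs, and formats stand in for whatever your real systems define.

```python
# Each row mirrors the template: destination, field ID, data type, required
# flag, sample, origin, origin field, origin type, and a transformation.
MAPPING = [
    {
        "destination": "billing.customers",
        "dest_field": "email",
        "dest_type": "string(100)",
        "required": True,
        "sample": "ada@example.com",
        "origin": "crm.contacts",
        "origin_field": "email_addr",
        "origin_type": "string(255)",
        # Truncate to the destination size to avoid overflow on insert
        "transform": lambda v: str(v)[:100],
    },
    {
        "destination": "billing.customers",
        "dest_field": "signup_date",
        "dest_type": "date (YYYY-MM-DD)",
        "required": True,
        "sample": "2023-07-04",
        "origin": "crm.contacts",
        "origin_field": "created",
        "origin_type": "string (MM/DD/YYYY)",
        # Reformat the date string to match the destination system
        "transform": lambda v: f"{v[6:10]}-{v[0:2]}-{v[3:5]}",
    },
]

def apply_mapping(source_row: dict) -> dict:
    """Apply every mapping row to one source record."""
    out = {}
    for m in MAPPING:
        value = source_row.get(m["origin_field"])
        if value is None and m["required"]:
            raise ValueError(f"missing required field {m['origin_field']}")
        out[m["dest_field"]] = m["transform"](value)
    return out

print(apply_mapping({"email_addr": "ada@example.com", "created": "07/04/2023"}))
```

The point is not the code itself but that every column of the template answers a question the transformation logic will eventually have to ask: where the value comes from, what shape it must take, whether it can be missing, and how to convert it.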
Data mapping documents are a crucial part of any project dealing with multiple systems. Designing and implementing them doesn’t have to be a laborious task involving complex and unwieldy documentation. The elements outlined here will get you most of the way toward understanding the data in each of your systems and how they will integrate.