Data integration is a data preprocessing technique that combines data from multiple sources, such as relational and non-relational databases, data cubes, and files, and gives users a unified view of that data. This unified view provides a more complete picture of key performance indicators (KPIs), customer journeys, market opportunities, and so on.
The data sources can be homogeneous or heterogeneous. The data obtained from the sources can be structured, unstructured, or semi-structured in format.
Data integration combines data from multiple sources, but which data do we mean? Modern companies, even small ones, have adopted numerous digital tools to support their day-to-day operations, ranging from marketing and sales tools to logistics and transaction-processing tools. Even a small team with basic operational needs uses several of them. Each tool produces its own data, and without an integration process that data ends up in detrimental data silos.
There are two major approaches to data integration:
1. Tight Coupling:
- Here, a data warehouse is treated as an information retrieval component.
- In this coupling, data is combined from different sources into a single physical location through the process of ETL – Extraction, Transformation, and Loading.
2. Loose Coupling:
- Here, an interface is provided that takes the query from the user, transforms it in a way the source database can understand, and then sends the query directly to the source database to obtain the result.
- The data remains only in the source databases; nothing is copied to a central store.
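
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production design: the two in-memory SQLite databases (`crm` and `shop`), their tables and columns, and the function names `etl_into_warehouse`, `warehouse_totals`, and `federated_totals` are all hypothetical. The tightly coupled version copies rows from both sources into one warehouse table through extract, transform, and load steps. The loosely coupled version rewrites the same request into a query each source understands, runs it against the source in place, and merges the answers.

```python
import sqlite3

# Two hypothetical, independent sources with different schemas and units.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
crm.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 5.5)])

shop = sqlite3.connect(":memory:")
shop.execute("CREATE TABLE orders (cust INTEGER, total_cents INTEGER)")
shop.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 250), (3, 1200)])


def etl_into_warehouse():
    """Tight coupling: copy data from every source into one physical store."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_sales (customer_id INTEGER, amount REAL)"
    )
    # Extract: read the raw rows from each source.
    crm_rows = crm.execute("SELECT customer_id, amount FROM sales").fetchall()
    shop_rows = shop.execute("SELECT cust, total_cents FROM orders").fetchall()
    # Transform: bring the shop rows to the common unit (currency, not cents).
    shop_rows = [(cust, cents / 100) for cust, cents in shop_rows]
    # Load: write everything into the single warehouse table.
    warehouse.executemany(
        "INSERT INTO fact_sales VALUES (?, ?)", crm_rows + shop_rows
    )
    return warehouse


def warehouse_totals(warehouse):
    """Answer the user's question from the warehouse alone."""
    cursor = warehouse.execute(
        "SELECT customer_id, SUM(amount) FROM fact_sales GROUP BY customer_id"
    )
    return dict(cursor.fetchall())


def federated_totals():
    """Loose coupling: query each source in place and merge the results."""
    # The same request, rewritten into SQL that each source understands.
    per_source_queries = [
        (crm, "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"),
        (shop, "SELECT cust, SUM(total_cents) / 100.0 FROM orders GROUP BY cust"),
    ]
    totals = {}
    for conn, sql in per_source_queries:
        for customer_id, amount in conn.execute(sql):
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(warehouse_totals(etl_into_warehouse()))
    print(federated_totals())
```

Both functions give the same per-customer totals, but they differ in where the data lives. After `etl_into_warehouse` runs, queries no longer touch the sources, so results are only as fresh as the last load. `federated_totals` always reads current source data, at the cost of querying every source on each request.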