On Demand Data Integration Solutions for Remote Data Sources
2011
Jānis Kampars

Defense
14.12.2011. 14:30, DITF, Meža iela 1/3-202

Supervisor
Jānis Grabis

Reviewers
Leonīds Novickis, Peteris Rivža, Enn Õunapuu

Decision-making problems often require a large amount of initial data acquired from different data sources. Data warehousing is a traditional data integration solution used to import and unify the necessary data. If the data sources are located outside the organization, the data warehouse has to store a full copy of the external data. Local storage of data requires complex infrastructure and regular data updates. An alternative to local storage of the external data is to retrieve the necessary data from data services or web services at the moment it is required to solve a specific decision-making problem. In this case, the critical factors are the effectiveness of the data integration process, web service heterogeneity, data transformation and the integration of various data sets. The objective of the thesis is to develop on demand data integration solutions for retrieving data from remote, heterogeneous data sources and transforming the retrieved data into the necessary form. In this work, a model of an on demand data integration architecture for remote data sources is developed and the requirements for this architecture are defined; based on them, the technical architecture and methods for optimizing the data integration process are developed. The solutions included in the architecture are evaluated in practice. The main benefits are a faster data integration process and a data retrieval solution that is easier to maintain and modify. The solution provides data retrieval from heterogeneous web services (i.e., services developed using different standards, communication protocols and data models), substitution of web services in case of changes in their interfaces or errors, load balancing between functionally equivalent web services, handling of complex interdependencies between data integration tasks, and parallelization of data retrieval operations. The main scientific contributions of this study are:
1. Design of a new remote data integration method based on web service method-level abstraction and the separation of the data integration process from the data source access logic.
2. Development of an adaptive web service selection and load balancing algorithm based on functional and nonfunctional requirements.
3. Development of an algorithm that ensures correct and timely execution of individual data integration tasks.
4. Evaluation of the effectiveness of the developed solutions.

The developed system is used to gather data for facility location and passenger transportation planning decision-making problems. The doctoral thesis consists of an introduction, 5 chapters, conclusions, 5 appendices, a bibliography (174 titles), 75 figures and 26 tables, a total of 165 pages.
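The abstract names the method-level abstraction and the adaptive selection and load balancing algorithm but does not describe them. A minimal, hypothetical sketch of the general idea (not the thesis's algorithm) might hide several functionally equivalent web services behind one abstract method. Providers violating a nonfunctional requirement (here an assumed average-latency limit) are filtered out. The remaining ones are ranked by past failures and by the number of calls already served, which spreads the load, and a failing provider is substituted by the next one. The names `AbstractMethod`, `Provider`, `ServiceError` and the ranking rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ServiceError(Exception):
    """Hypothetical error raised by a provider when a remote call fails."""


@dataclass
class Provider:
    # One concrete (hypothetical) web service able to perform the abstract method.
    name: str
    call: Callable[[dict], dict]
    avg_latency_ms: float   # nonfunctional property used for filtering and ranking
    served: int = 0         # successful calls so far, used for load balancing
    failures: int = 0       # failed calls so far, demotes unreliable providers


@dataclass
class AbstractMethod:
    # A data retrieval operation decoupled from any particular web service.
    name: str
    providers: List[Provider] = field(default_factory=list)

    def ranked(self, max_latency_ms: float) -> List[Provider]:
        # Drop providers that violate the latency requirement, then prefer
        # fewer failures, then fewer served calls, then lower latency.
        eligible = [p for p in self.providers if p.avg_latency_ms <= max_latency_ms]
        return sorted(eligible, key=lambda p: (p.failures, p.served, p.avg_latency_ms))

    def invoke(self, request: dict, max_latency_ms: float = 1000.0) -> dict:
        # Try providers in ranked order; on error, substitute the next one.
        for provider in self.ranked(max_latency_ms):
            try:
                result = provider.call(request)
            except ServiceError:
                provider.failures += 1
                continue
            provider.served += 1
            return result
        raise ServiceError(f"no provider could serve {self.name}")


if __name__ == "__main__":
    def fast(req: dict) -> dict:
        return {"source": "fast", **req}

    def slow(req: dict) -> dict:
        return {"source": "slow", **req}

    geocode = AbstractMethod("geocode", [
        Provider("fast", fast, 100.0),
        Provider("slow", slow, 200.0),
        Provider("too_slow", slow, 5000.0),  # excluded by the 1000 ms limit
    ])
    # prints ['fast', 'slow', 'fast']
    print([geocode.invoke({"city": "Riga"})["source"] for _ in range(3)])
```

Ranking by served calls rather than by concurrent load keeps the sketch deterministic; a real selector would more likely use measured response times and current load.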

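The handling of interdependent data integration tasks and the parallelization of data retrieval can be illustrated, under simplifying assumptions, by a scheduler that starts each task as soon as all the tasks it depends on have finished and runs independent tasks concurrently. This minimal sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and `concurrent.futures`. The function `run_tasks` and the task names are hypothetical, every dependency is assumed to be a task itself, and the sketch ignores the timing constraints that a "timely execution" algorithm would also have to consider.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_tasks(tasks: Dict[str, Callable[[Dict[str, object]], object]],
              deps: Dict[str, Set[str]],
              max_workers: int = 4) -> Dict[str, object]:
    """Run each task once its prerequisites are done, in parallel where possible.

    tasks: task name -> function receiving the results gathered so far.
    deps:  task name -> names of tasks that must finish first.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results: Dict[str, object] = {}
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites have all finished; the
            # snapshot of results therefore contains everything it depends on.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name], dict(results))] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()  # re-raises a task's exception
                sorter.done(name)               # may unblock dependent tasks
    return results


if __name__ == "__main__":
    # Hypothetical passenger transportation planning inputs.
    demo_tasks = {
        "stations": lambda r: ["A", "B"],
        "demand": lambda r: {"A": 10, "B": 5},
        "total_demand": lambda r: sum(r["demand"][s] for s in r["stations"]),
    }
    print(run_tasks(demo_tasks, {"total_demand": {"stations", "demand"}}))
```

Here "stations" and "demand" are independent and may be retrieved concurrently, while "total_demand" waits for both.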

Keywords
data integration, web services, data as a service

Kampars, Jānis. On Demand Data Integration Solutions for Remote Data Sources. PhD Thesis. Rīga: [RTU], 2011. 165 p.

Publication language
Latvian (lv)
The Scientific Library of the Riga Technical University.
E-mail: uzzinas@rtu.lv; Phone: +371 28399196