Research Track "Data Integration"
PhD Candidate: George Siachamis
Track leader: Asterios Katsifodimos
The data integration track recognizes the importance of data for almost any application of artificial intelligence at ING. ING is a data-rich organization. Its data lake constitutes a federation of different data storage types. The relationships between the many different data sources evolve over time, and are hard to predict and manage.
The goal of this track is to use semantics-based data matching to recognize such data relationships automatically. In particular, we apply machine learning to metadata matching, automated schema discovery, schema evolution, and schema alignment. The results can support data engineers in making data integration decisions through dataset exploration, discovery, and integration recommendations.
The context is the ING cloud infra-service platform, which continuously collects operational data from a large range of private cloud services operated by ING across layers.
Selected publications
- G. Siachamis, K. Psarakis, M. Fragkoulis, Odysseas Papapetrou, A. van Deursen, A. Katsifodimos (2023), Adaptive Distributed Streaming Similarity Joins, Marcelo Pasin (Eds.), In DEBS ’23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems p.25-36 (preprint).
- George Siachamis, Job Kanis, Wybe Koper, Kyriakos Psarakis, Marios Fragkoulis, Arie Van Deursen, Asterios Katsifodimos (2023), Towards Evaluating Stream Processing Autoscalers, In Proceedings - 2023 IEEE 39th International Conference on Data Engineering Workshops, ICDEW 2023 p.95-99, Institute of Electrical and Electronics Engineers (IEEE) (preprint).
- G. Siachamis, G.J.P.M. Houben, A. van Deursen, A. Katsifodimos (2021), Integrating Massive Data Streams, Philip A. Bernstein, Tilmann Rabl (Eds.), In Proceedings of the VLDB 2021 PhD Workshop Volume 2971, CEUR-WS (preprint).
- Christos Koutras, Kyriakos Psarakis, George Siachamis, Andra Ionescu, Marios Fragkoulis, Angela Bonifati, Asterios Katsifodimos (2021), Valentine in Action: Matching Tabular Data at Scale, In Proceedings of the VLDB Endowment Volume 14 p.2871–2874 (preprint and dataset).
- Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, Asterios Katsifodimos (2021), Valentine: Evaluating Matching Techniques for Dataset Discovery, In Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021 p.468-479, IEEE (preprint and dataset).