Title
Runtime Modifications of Spark Data Processing Pipelines
Author
Lazovik, E.
Medema, M.
Albers, T.
Langius, E.A.F.
Lazovik, A.
Publication year
2017
Abstract
Distributed data processing systems are the standard means for large-scale data analysis in the Big Data field. These systems are based on processing pipelines, in which processing is done by a composition of multiple elements or steps. In current distributed data processing systems, the code and parameters that define the pipeline are set at design time, before the application starts processing any data. Any change that has to be applied to the pipeline after it has started requires the entire pipeline to be restarted. When a system needs to be operational 24/7 or has to respond in a timely fashion, restarting and the resulting downtime are not acceptable. In this case, computing should be performed autonomously by the processing system, which continuously takes in changes from the environment and adjusts its processing steps, parameters, etc. on the fly. In this paper, we address this problem by allowing changes to be made to a processing pipeline without restarting it. We focus on two aspects of the problem: switching to another data source that is used as input, and changing the functional code and variables within the elements of a pipeline. Our system is built on top of Apache Spark, a framework widely used for distributed data processing.
Subject
Apache Spark
Autonomic processing
Big data applications
Distributed data processing
Heterogeneous data sources
On-the-fly updates
Runtime pipeline adaptation
Big data
Data handling
Pipeline codes
Pipelines
On the fly
Runtimes
Pipeline processing systems
To reference this document use:
http://resolver.tudelft.nl/uuid:a35d75d0-5adf-4afd-974b-205575f6deb1
TNO identifier
782452
ISBN
9781538619391
Source
4th IEEE International Conference on Cloud and Autonomic Computing, ICCAC 2017. 18 September 2017 through 22 September 2017, 34-45
Article number
8064052
Document type
conference paper