Title: Runtime Modifications of Spark Data Processing Pipelines
Author: Lazovik, E.; Medema, M.; Albers, T.; Langius, E.A.F.; Lazovik, A.
Publication year: 2017
Abstract: Distributed data processing systems are the standard means for large-scale data analysis in the Big Data field. These systems are based on processing pipelines where the processing is done via a composition of multiple elements or steps. In current distributed data processing systems, the code and parameters that create the pipeline are set at design time, before the application starts processing any data. Any changes that have to be applied to the pipeline after it has been started require the entire pipeline to be restarted. When a system needs to be operational 24/7 or has to respond in a timely fashion, restarting and having downtime is not acceptable. In this case, computing should be performed autonomously by the processing system, which continuously takes in changes from the environment and adjusts its processing steps, parameters, etc. on-the-fly. In this paper, we try to solve this problem by allowing changes to be made to a processing pipeline without restarting. We focus on two aspects of the problem: switching to another data source that is used as input, and changing the functional code and variables within the elements of a pipeline. Our system is built on top of Apache Spark, a framework widely used for distributed data processing.
Subject: Apache Spark; Autonomic processing; Big data applications; Distributed data processing; Heterogeneous data sources; On-the-fly updates; Runtime pipeline adaptation; Big data; Data handling; Pipeline codes; Pipelines; On the flies; Runtimes; Pipeline processing systems
To reference this document use: http://resolver.tudelft.nl/uuid:a35d75d0-5adf-4afd-974b-205575f6deb1
TNO identifier: 782452
ISBN: 9781538619391
Source: 4th IEEE International Conference on Cloud and Autonomic Computing, ICCAC 2017. 18 September 2017 through 22 September 2017, 34-45
Article number: 8064052
Document type: conference paper
Files: To receive the publication files, please send an e-mail request to TNO Library.