Title Template Clustering for the Foundational Analysis of the Dark Web
Author Nair, V.V.; van Staalduinen, M.; Oosterman, D.T.
Publication year 2021
Abstract The rapid rise of the Dark Web and supportive technologies has served as the backbone facilitating online illegal activity worldwide. These illegal activities, supported by anonymisation technologies such as Tor, have become increasingly elusive to law enforcement agencies. Despite several successful law enforcement operations, illegal activity on the Dark Web is still growing. There are approaches to monitor, mine, and research the Dark Web, all with varying degrees of success. Given the complexity and dynamics of the services offered, we recognize the need for in-depth analysis of the Dark Web with regard to its infrastructures, actors, types of abuse and their relationships. This involves the challenging task of extracting information from the very heterogeneous collection of web pages that make up the Dark Web. Most providers build and deploy their services on top of standard frameworks such as WordPress, Simple Machines Forum and phpBB. As a result, these service providers publish a significant number of pages based on similar structural and stylistic templates. We propose an efficient, scalable, repeatable and accurate approach to cluster Dark Web pages based on those structural and stylistic features. Extracting relevant information from those clusters should make it feasible to conduct in-depth Dark Web analysis. This paper presents our clustering algorithm to accelerate information extraction and, as a result, improve the attribution of digital traces to infrastructures or individuals in the fight against cyber crime.
Subject Artificial Intelligence; Cybercrime; Machine Learning; Template Clustering; Web Mining; Clustering algorithms; Computer crime; Crime; Cybersecurity; Data mining; Information retrieval; Machine learning; Network security; Clusterings; Cyber-crimes; Dark web; Illegal activities; Machine-learning; Supportive technologies; Template clustering; Web technologies; Web-page; Websites
To reference this document use: http://resolver.tudelft.nl/uuid:81639d7b-7b9b-4643-9a52-c5f5b0b3a12b
TNO identifier 966033
Publisher Institute of Electrical and Electronics Engineers Inc.
ISBN 9781665439022
ISSN 0000-0000
Source Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, 15 December 2021 through 18 December 2021, pp. 2542-2549
Bibliographical note Sponsor: Ankura Collaboration Drives Results; IEEE; IEEE Computer Society; Lyve Cloud; NSF; Seagate
Document type conference paper
Files To receive the publication files, please send an e-mail request to TNO Library.