
Showing 1–50 of 118 results for author: De Cristofaro, E

  1. arXiv:2507.05524  [pdf, ps, other]

    cs.CR

    PROTEAN: Federated Intrusion Detection in Non-IID Environments through Prototype-Based Knowledge Sharing

    Authors: Sara Chennoufi, Yufei Han, Gregory Blanc, Emiliano De Cristofaro, Christophe Kiennert

    Abstract: In distributed networks, participants often face diverse and fast-evolving cyberattacks. This makes techniques based on Federated Learning (FL) a promising mitigation strategy. By only exchanging model updates, FL participants can collaboratively build detection models without revealing sensitive information, e.g., network structures or security postures. However, the effectiveness of FL solutions…

    Submitted 7 July, 2025; originally announced July 2025.

    Journal ref: Published in the Proceedings of the 30th European Symposium on Research in Computer Security (ESORICS 2025)

  2. arXiv:2506.16666  [pdf, ps, other]

    cs.CR cs.LG

    The Hitchhiker's Guide to Efficient, End-to-End, and Tight DP Auditing

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, Emiliano De Cristofaro

    Abstract: This paper systematizes research on auditing Differential Privacy (DP) techniques, aiming to identify key insights into the current state of the art and open challenges. First, we introduce a comprehensive framework for reviewing work in the field and establish three cross-contextual desiderata that DP audits should target--namely, efficiency, end-to-end-ness, and tightness. Then, we systematize t…

    Submitted 30 June, 2025; v1 submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.14191  [pdf]

    cs.CY

    The Ethics of Generative AI in Anonymous Spaces: A Case Study of 4chan's /pol/ Board

    Authors: Parth Gaba, Emiliano De Cristofaro

    Abstract: This paper presents a characterization of AI-generated images shared on 4chan, examining how this anonymous online community is (mis-)using generative image technologies. Through a methodical data collection process, we gathered 900 images from 4chan's /pol/ (Politically Incorrect) board, which included the label "/mwg/" (memetic warfare general), between April and July 2024, identifying 66 unique…

    Submitted 17 June, 2025; originally announced June 2025.

  4. arXiv:2504.08254  [pdf, other]

    cs.CR cs.LG

    Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy

    Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro

    Abstract: Privacy attacks, particularly membership inference attacks (MIAs), are widely used to assess the privacy of generative models for tabular synthetic data, including those with Differential Privacy (DP) guarantees. These attacks often exploit outliers, which are especially vulnerable due to their position at the boundaries of the data domain (e.g., at the minimum and maximum values). However, the ro…

    Submitted 13 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted to the Synthetic Data x Data Access Problem workshop (SynthData), part of ICLR 2025

  5. arXiv:2504.06923  [pdf, other]

    cs.CR cs.LG

    The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data

    Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro

    Abstract: Differentially Private (DP) generative marginal models are often used in the wild to release synthetic tabular datasets in lieu of sensitive data while providing formal privacy guarantees. These models approximate low-dimensional marginals or query workloads; crucially, they require the training data to be pre-discretized, i.e., continuous values need to first be partitioned into bins. However, as…

    Submitted 13 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  6. arXiv:2503.14772  [pdf, other]

    cs.SI

    VIKI: Systematic Cross-Platform Profile Inference of Online Users

    Authors: Ben Treves, Emiliano De Cristofaro, Yue Dong, Michalis Faloutsos

    Abstract: What can we learn about online users by comparing their profiles across different platforms? We use the term profile to represent displayed personality traits, interests, and behavioral patterns (e.g., offensiveness). We also use the term displayed personas to refer to the personas that users manifest on a platform. Though individuals have a single real persona, it is not difficult to imagin…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Published in the Proceedings of the 17th ACM Web Science Conference (WebSci 2025). Please cite the WebSci version

  7. arXiv:2502.01608  [pdf, other]

    cs.CR cs.HC

    Beyond the Crawl: Unmasking Browser Fingerprinting in Real User Interactions

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Igor Bilogrevic, Emiliano De Cristofaro

    Abstract: Browser fingerprinting is a pervasive online tracking technique used increasingly often for profiling and targeted advertising. Prior research on the prevalence of fingerprinting heavily relied on automated web crawls, which inherently struggle to replicate the nuances of human-computer interactions. This raises concerns about the accuracy of current understandings of real-world fingerprinting dep…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: A slightly shorter version of this paper appears in the Proceedings of the 34th "The Web Conference" (WWW 2025). Please cite the WWW version

  8. arXiv:2411.10614  [pdf, other]

    cs.CR cs.LG

    To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Emiliano De Cristofaro

    Abstract: The Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm allows the training of machine learning (ML) models with formal Differential Privacy (DP) guarantees. Since DP-SGD processes training data in batches, it employs Poisson sub-sampling to select each batch at every step. However, it has become common practice to replace sub-sampling with shuffling owing to better compatibility…

    Submitted 12 April, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

  9. arXiv:2410.17390  [pdf, other]

    cs.SI

    Revealing The Secret Power: How Algorithms Can Influence Content Visibility on Social Media

    Authors: Alessandro Galeazzi, Pujan Paudel, Mauro Conti, Emiliano De Cristofaro, Gianluca Stringhini

    Abstract: In recent years, the opaque design and the limited public understanding of social networks' recommendation algorithms have raised concerns about potential manipulation of information exposure. While reducing content visibility, aka shadow banning, may help limit harmful content, it can also be used to suppress dissenting voices. This prompts the need for greater transparency and a better understan…

    Submitted 24 April, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

  10. arXiv:2406.13985  [pdf, other]

    cs.LG cs.CR

    The Elusive Pursuit of Reproducing PATE-GAN: Benchmarking, Auditing, Debugging

    Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

    Abstract: Synthetic data created by differentially private (DP) generative models is increasingly used in real-world settings. In this context, PATE-GAN has emerged as one of the most popular algorithms, combining Generative Adversarial Networks (GANs) with the private training approach of PATE (Private Aggregation of Teacher Ensembles). In this paper, we set out to reproduce the utility evaluation from t…

    Submitted 10 February, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Published in Transactions on Machine Learning Research (TMLR 2025). Please cite the TMLR version

  11. arXiv:2405.16682  [pdf, other]

    cs.LG cs.CL cs.CR

    A Systematic Review of Federated Generative Models

    Authors: Ashkan Vedadi Gargary, Emiliano De Cristofaro

    Abstract: Federated Learning (FL) has emerged as a solution for distributed systems that allow clients to train models on their data and only share models instead of local data. Generative Models are designed to learn the distribution of a dataset and generate new data samples that are similar to the original data. Many prior works have tried proposing Federated Generative Models. Using Federated Learning a…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 24 Pages, 3 Figures, 5 Tables

  12. arXiv:2405.14106  [pdf, other]

    cs.CR cs.LG

    Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

    Abstract: This paper presents an auditing procedure for the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box threat model that is substantially tighter than prior work. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained on MNIST and CIFAR-10 at the…

    Submitted 1 November, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024). Please cite accordingly

  13. arXiv:2405.10994  [pdf, other]

    cs.CR

    "What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Georgi Ganev, Emiliano De Cristofaro

    Abstract: Differentially private synthetic data generation (DP-SDG) algorithms are used to release datasets that are structurally and statistically similar to sensitive data while providing formal bounds on the information they leak. However, bugs in algorithms and implementations may cause the actual information leakage to be higher. This prompts the need to verify whether the theoretical guarantees of sta…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: To appear at Usenix Security 2024

  14. arXiv:2405.10233  [pdf, other]

    cs.SI cs.CY cs.IR

    iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023

    Authors: Jay Patel, Pujan Paudel, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn

    Abstract: Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate…

    Submitted 16 May, 2024; originally announced May 2024.

  15. arXiv:2401.13248  [pdf, other]

    cs.CY cs.SI

    "Here's Your Evidence": False Consensus in Public Twitter Discussions of COVID-19 Science

    Authors: Alexandros Efstratiou, Marina Efstratiou, Satrio Yudhoatmojo, Jeremy Blackburn, Emiliano De Cristofaro

    Abstract: The COVID-19 pandemic brought about an extraordinary rate of scientific papers on the topic that were discussed among the general public, although often in biased or misinformed ways. In this paper, we present a mixed-methods analysis aimed at examining whether public discussions were commensurate with the scientific consensus on several COVID-19 issues. We estimate scientific consensus based on s…

    Submitted 7 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at 27th ACM Conference on Computer Supported Cooperative Work and Social Computing (ACM CSCW 2024). Please cite accordingly

  16. arXiv:2312.08394  [pdf, other]

    cs.CR cs.CY cs.SI

    From HODL to MOON: Understanding Community Evolution, Emotional Dynamics, and Price Interplay in the Cryptocurrency Ecosystem

    Authors: Kostantinos Papadamou, Jay Patel, Jeremy Blackburn, Philipp Jovanovic, Emiliano De Cristofaro

    Abstract: This paper presents a large-scale analysis of the cryptocurrency community on Reddit, shedding light on the intricate relationship between the evolution of their activity, emotional dynamics, and price movements. We analyze over 130M posts on 122 cryptocurrency-related subreddits using temporal analysis, statistical modeling, and emotion detection. While /r/CryptoCurrency and /r/dogecoin are the m…

    Submitted 12 December, 2023; originally announced December 2023.

  17. arXiv:2312.05114  [pdf, other]

    cs.CR cs.AI cs.LG

    The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets

    Authors: Georgi Ganev, Emiliano De Cristofaro

    Abstract: Generative models producing synthetic data are meant to provide a privacy-friendly approach to releasing data. However, their privacy guarantees are only considered robust when models satisfy Differential Privacy (DP). Alas, this is not a ubiquitous standard, as many leading companies (and, in fact, research papers) use ad-hoc privacy metrics based on testing the statistical similarity between syn…

    Submitted 7 May, 2025; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Published in the Proceedings of the 46th IEEE Symposium on Security & Privacy (IEEE S&P 2025). Please cite the S&P version

  18. arXiv:2311.16940  [pdf, other]

    cs.CR cs.CY

    FP-Fed: Privacy-Preserving Federated Detection of Browser Fingerprinting

    Authors: Meenatchi Sundaram Muthu Selva Annamalai, Igor Bilogrevic, Emiliano De Cristofaro

    Abstract: Browser fingerprinting often provides an attractive alternative to third-party cookies for tracking users across the web. In fact, the increasing restrictions on third-party cookies placed by common web browsers and recent regulations like the GDPR may accelerate the transition. To counter browser fingerprinting, previous work proposed several techniques to detect its prevalence and severity. Howe…

    Submitted 28 November, 2023; originally announced November 2023.

    Journal ref: Published in the Proceedings of the 31st Network and Distributed System Security Symposium (NDSS 2024), please cite accordingly

  19. arXiv:2308.05247  [pdf, other]

    cs.SI cs.CR

    TUBERAIDER: Attributing Coordinated Hate Attacks on YouTube Videos to their Source Communities

    Authors: Mohammad Hammas Saeed, Kostantinos Papadamou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini

    Abstract: Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on a platform (e.g., 4chan) to target victims on another community (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the conte…

    Submitted 22 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024). Please cite accordingly

  20. arXiv:2305.10994  [pdf, other]

    cs.LG cs.CR

    Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility

    Authors: Georgi Ganev, Kai Xu, Emiliano De Cristofaro

    Abstract: Generative models trained with Differential Privacy (DP) can produce synthetic data while reducing privacy risks. However, navigating their privacy-utility tradeoffs makes finding the best models for specific settings/tasks challenging. This paper bridges this gap by profiling how DP generative models for tabular data distribute privacy budgets across rows and columns, which is one of the primary…

    Submitted 28 August, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: A shorter version of this paper appears in the Proceedings of the 31st ACM Conference on Computer and Communications Security (ACM CCS 2024). This is the full version

  21. arXiv:2304.08847  [pdf, other]

    cs.LG cs.CR

    BadVFL: Backdoor Attacks in Vertical Federated Learning

    Authors: Mohammad Naseri, Yufei Han, Emiliano De Cristofaro

    Abstract: Federated learning (FL) enables multiple parties to collaboratively train a machine learning model without sharing their data; rather, they train their own model locally and send updates to a central server for aggregation. Depending on how the data is distributed among the participants, FL can be classified into Horizontal (HFL) and Vertical (VFL). In VFL, the participants share the same set of t…

    Submitted 23 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted for publication at the 45th IEEE Symposium on Security & Privacy (S&P 2024). Please cite accordingly

  22. arXiv:2303.07099  [pdf, other]

    cs.CY cs.SI

    Beyond Fish and Bicycles: Exploring the Varieties of Online Women's Ideological Spaces

    Authors: Utkucan Balci, Chen Ling, Emiliano De Cristofaro, Megan Squire, Gianluca Stringhini, Jeremy Blackburn

    Abstract: The Internet has been instrumental in connecting under-represented and vulnerable groups of people. Platforms built to foster social interaction and engagement have enabled historically disenfranchised groups to have a voice. One such vulnerable group is women. In this paper, we explore the diversity in online women's ideological spaces using a multi-dimensional approach. We perform a large-scale,…

    Submitted 13 March, 2023; originally announced March 2023.

    Journal ref: Published in the Proceedings of the 15th ACM Web Science Conference 2023 (ACM WebSci 2023). Please cite the WebSci version

  23. arXiv:2303.01230  [pdf, other]

    cs.CR cs.AI cs.CY

    Synthetic Data: Methods, Use Cases, and Risks

    Authors: Emiliano De Cristofaro

    Abstract: Sharing data can often enable compelling applications and analytics. However, more often than not, valuable datasets contain information of a sensitive nature, and thus, sharing them can endanger the privacy of users and organizations. A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. The idea is to release artificially generate…

    Submitted 27 February, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: To Appear in IEEE Security and Privacy Magazine

  24. arXiv:2212.05926  [pdf, other]

    cs.CR cs.CY cs.SI

    LAMBRETTA: Learning to Rank for Twitter Soft Moderation

    Authors: Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

    Abstract: To curb the problem of false information, social media platforms like Twitter started adding warning labels to content discussing debunked narratives, with the goal of providing more context to their audiences. Unfortunately, these labels are not applied uniformly and leave large amounts of false content unmoderated. This paper presents LAMBRETTA, a system that automatically identifies tweets that…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 44th IEEE Symposium on Security & Privacy (S&P 2023)

  25. arXiv:2211.14388  [pdf, other]

    cs.CY cs.SI

    Non-Polar Opposites: Analyzing the Relationship Between Echo Chambers and Hostile Intergroup Interactions on Reddit

    Authors: Alexandros Efstratiou, Jeremy Blackburn, Tristan Caulfield, Gianluca Stringhini, Savvas Zannettou, Emiliano De Cristofaro

    Abstract: Previous research has documented the existence of both online echo chambers and hostile intergroup interactions. In this paper, we explore the relationship between these two phenomena by studying the activity of 5.97M Reddit users and 421M comments posted over 13 years. We examine whether users who are more engaged in echo chambers are more hostile when they comment on other communities. We then c…

    Submitted 25 November, 2022; originally announced November 2022.

    Journal ref: 17th International AAAI Conference on Web and Social Media (ICWSM 2023). Please cite accordingly

  26. arXiv:2209.03463  [pdf, other]

    cs.CY cs.AI cs.CR cs.SI

    Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

    Authors: Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang

    Abstract: Chatbots are used in many applications, e.g., automated agents, smart home assistants, interactive characters in online games, etc. Therefore, it is crucial to ensure they do not behave in undesired manners, providing offensive or toxic responses to users. This is not a trivial task as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This pa…

    Submitted 9 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Journal ref: Published in ACM CCS 2022. Please cite the CCS version

  27. arXiv:2209.03050  [pdf, other]

    cs.CR cs.AI

    Cerberus: Exploring Federated Prediction of Security Events

    Authors: Mohammad Naseri, Yufei Han, Enrico Mariconti, Yun Shen, Gianluca Stringhini, Emiliano De Cristofaro

    Abstract: Modern defenses against cyberattacks increasingly rely on proactive approaches, e.g., to predict the adversary's next actions based on past events. Building accurate prediction models requires knowledge from many organizations; alas, this entails disclosing sensitive information, such as network structures, security postures, and policies, which might often be undesirable or outright impossible. I…

    Submitted 7 September, 2022; originally announced September 2022.

    Journal ref: Proceedings of the 29th ACM Conference on Computer and Communications Security (ACM CCS 2022)

  28. arXiv:2206.15237  [pdf, other]

    cs.CY cs.SI

    Adherence to Misinformation on Social Media Through Socio-Cognitive and Group-Based Processes

    Authors: Alexandros Efstratiou, Emiliano De Cristofaro

    Abstract: Previous work suggests that people's preference for different kinds of information depends on more than just accuracy. This could happen because the messages contained within different pieces of information may either be well-liked or repulsive. Whereas factual information must often convey uncomfortable truths, misinformation can have little regard for veracity and leverage psychological processe…

    Submitted 30 June, 2022; originally announced June 2022.

    Journal ref: 25th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2022)

  29. arXiv:2204.12709  [pdf, other]

    cs.CY cs.NI

    Toxicity in the Decentralized Web and the Potential for Model Sharing

    Authors: Haris Bin Zia, Aravindh Raman, Ignacio Castro, Ishaku Hassan Anaobi, Emiliano De Cristofaro, Nishanth Sastry, Gareth Tyson

    Abstract: The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is cha…

    Submitted 27 April, 2022; originally announced April 2022.

    Journal ref: Published in the Proceedings of the 2022 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'22). Please cite accordingly

  30. arXiv:2202.08492  [pdf, other]

    cs.CY cs.CV

    Feels Bad Man: Dissecting Automated Hateful Meme Detection Through the Lens of Facebook's Challenge

    Authors: Catherine Jennifer, Fatemeh Tahmasbi, Jeremy Blackburn, Gianluca Stringhini, Savvas Zannettou, Emiliano De Cristofaro

    Abstract: Internet memes have become a dominant method of communication; at the same time, however, they are also increasingly being used to advocate extremism and foster derogatory beliefs. Nonetheless, we do not have a firm understanding as to which perceptual aspects of memes cause this phenomenon. In this work, we assess the efficacy of current state-of-the-art multimodal machine learning models toward…

    Submitted 17 February, 2022; originally announced February 2022.

  31. arXiv:2112.00443  [pdf, other]

    cs.CR cs.CY cs.SI

    TROLLMAGNIFIER: Detecting State-Sponsored Troll Accounts on Reddit

    Authors: Mohammad Hammas Saeed, Shiza Ali, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

    Abstract: Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIF…

    Submitted 1 December, 2021; originally announced December 2021.

  32. arXiv:2111.02455  [pdf, other]

    cs.DL cs.SI

    Understanding the Use of e-Prints on Reddit and 4chan's Politically Incorrect Board

    Authors: Satrio Baskoro Yudhoatmojo, Emiliano De Cristofaro, Jeremy Blackburn

    Abstract: The dissemination and reach of scientific knowledge have increased at a blistering pace. In this context, e-Print servers have played a central role by providing scientists with a rapid and open mechanism for disseminating research without waiting for the (lengthy) peer review process. While helping the scientific community in several ways, e-Print servers also provide scientific communicators and…

    Submitted 8 March, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Journal ref: Published in the Proceedings of the 15th ACM Web Science Conference 2023 (ACM WebSci 2023). Please cite the WebSci version

  33. arXiv:2111.02452  [pdf, other]

    cs.CY cs.CV

    Slapping Cats, Bopping Heads, and Oreo Shakes: Understanding Indicators of Virality in TikTok Short Videos

    Authors: Chen Ling, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini

    Abstract: Short videos have become one of the leading media used by younger generations to express themselves online and thus a driving force in shaping online culture. In this context, TikTok has emerged as a platform where viral videos are often posted first. In this paper, we study what elements of short videos posted on TikTok contribute to their virality. We apply a mixed-method approach to develop a c…

    Submitted 3 November, 2021; originally announced November 2021.

  34. arXiv:2111.02187  [pdf, other]

    cs.SI cs.CY

    Soros, Child Sacrifices, and 5G: Understanding the Spread of Conspiracy Theories on Web Communities

    Authors: Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

    Abstract: This paper presents a multi-platform computational pipeline geared to identify social media posts discussing (known) conspiracy theories. We use 189 conspiracy claims collected by Snopes, and find 66k posts and 277k comments on Reddit, and 379k tweets discussing them. Then, we study how conspiracies are discussed on different Web communities and which ones are particularly influential in driving t…

    Submitted 3 November, 2021; originally announced November 2021.

  35. arXiv:2110.13500  [pdf, other]

    cs.CY

    Exploring Content Moderation in the Decentralised Web: The Pleroma Case

    Authors: Anaobi Ishaku Hassan, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Emiliano De Cristofaro, Nishanth Sastry, Gareth Tyson

    Abstract: Decentralising the Web is a desirable but challenging goal. One particular challenge is achieving decentralised content moderation in the face of various adversaries (e.g. trolls). To overcome this challenge, many Decentralised Web (DW) implementations rely on federation policies. Administrators use these policies to create rules that ban or modify content that matches specific rules. This, howeve…

    Submitted 30 October, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies (ACM CoNext 2021)

  36. arXiv:2109.11429  [pdf, other]

    cs.LG cs.AI cs.CR cs.CY

    Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data

    Authors: Georgi Ganev, Bristena Oprisanu, Emiliano De Cristofaro

    Abstract: Generative models trained with Differential Privacy (DP) can be used to generate synthetic data while minimizing privacy risks. We analyze the impact of DP on these models vis-a-vis underrepresented classes/subgroups of data, specifically, studying: 1) the size of classes/subgroups in the synthetic data, and 2) the accuracy of classification tasks run on them. We also evaluate the effect of variou…

    Submitted 26 June, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML 2022)

  37. arXiv:2108.05876  [pdf, other]

    cs.CY cs.SI

    An Early Look at the Gettr Social Network

    Authors: Pujan Paudel, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

    Abstract: This paper presents the first data-driven analysis of Gettr, a new social network platform launched by former US President Donald Trump's team. Among other things, we find that users on the platform heavily discuss politics, with a focus on the Trump campaign in the US and Bolsonaro's in Brazil. Activity on the platform has steadily been decreasing since its launch, although a core of verified use…

    Submitted 12 August, 2021; originally announced August 2021.

  38. arXiv:2104.11145  [pdf, other]

    cs.CY

    "I'm a Professor, which isn't usually a dangerous job": Internet-Facilitated Harassment and its Impact on Researchers

    Authors: Periwinkle Doerfler, Andrea Forte, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn, Damon McCoy

    Abstract: While the Internet has dramatically increased the exposure that research can receive, it has also facilitated harassment against scholars. To understand the impact that these attacks can have on the work of researchers, we perform a series of systematic interviews with researchers including academics, journalists, and activists, who have experienced targeted, Internet-facilitated harassment. We pr…

    Submitted 22 April, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

  39. arXiv:2103.03631  [pdf, other]

    cs.CY cs.SI

    A Multi-Platform Analysis of Political News Discussion and Sharing on Web Communities

    Authors: Yuping Wang, Savvas Zannettou, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, Gianluca Stringhini

    Abstract: The news ecosystem has become increasingly complex, encompassing a wide range of sources with varying levels of trustworthiness, and with public commentary giving different spins to the same stories. In this paper, we present a multi-platform measurement of this ecosystem. We compile a list of 1,073 news websites and extract posts from four Web communities (Twitter, Reddit, 4chan, and Gab) that co…

    Submitted 5 March, 2021; originally announced March 2021.

  40. arXiv:2102.03314  [pdf, other]

    q-bio.GN cs.AI cs.CR

    On Utility and Privacy in Synthetic Genomic Data

    Authors: Bristena Oprisanu, Georgi Ganev, Emiliano De Cristofaro

    Abstract: The availability of genomic data is essential to progress in biomedical research, personalized medicine, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. As a result, several initiatives have been launched to experiment with synthetic genomic data, e.g., using generative models to learn the underlying distribution of the real data and…

    Submitted 18 January, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: Published in the Proceedings of the 29th Network and Distributed System Security Symposium (NDSS 2022)

  41. arXiv:2102.02551  [pdf, other]

    cs.CR cs.AI cs.LG stat.ML

    ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models

    Authors: Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang

    Abstract: Inference attacks against Machine Learning (ML) models allow adversaries to learn sensitive information about training data, model parameters, etc. While researchers have studied, in depth, several kinds of attacks, they have done so in isolation. As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factor…

    Submitted 6 October, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

  42. arXiv:2101.08750  [pdf, other]

    cs.CY cs.SI

    The Gospel According to Q: Understanding the QAnon Conspiracy from the Perspective of Canonical Information

    Authors: Antonis Papasavva, Max Aliapoulios, Cameron Ballard, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Jeremy Blackburn

    Abstract: The QAnon conspiracy theory claims that a cabal of (literally) blood-thirsty politicians and media personalities are engaged in a war to destroy society. By interpreting cryptic "drops" of information from an anonymous insider calling themself Q, adherents of the conspiracy theory believe that Donald Trump is leading them in an active fight against this cabal. QAnon has been covered extensively by…

    Submitted 29 April, 2022; v1 submitted 21 January, 2021; originally announced January 2021.

    Journal ref: Published in the Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM 2022). Please cite accordingly

  43. arXiv:2101.06535  [pdf, other]

    cs.HC cs.CY cs.SI

    Dissecting the Meme Magic: Understanding Indicators of Virality in Image Memes

    Authors: Chen Ling, Ihab AbuHilal, Jeremy Blackburn, Emiliano De Cristofaro, Savvas Zannettou, Gianluca Stringhini

    Abstract: Despite the increasingly important role played by image memes, we do not yet have a solid understanding of the elements that might make a meme go viral on social media. In this paper, we investigate what visual elements distinguish image memes that are highly viral on social media from those that do not get re-shared, across three dimensions: composition, subjects, and target audience. Drawing fro…

    Submitted 16 January, 2021; originally announced January 2021.

    Comments: To appear at the 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2021)

  44. arXiv:2101.03820  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    An Early Look at the Parler Online Social Network

    Authors: Max Aliapoulios, Emmi Bevensee, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou

    Abstract: Parler is an "alternative" social network promoting itself as a service that allows users to "speak freely and express yourself openly, without fear of being deplatformed for your views." Because of this promise, the platform became popular among users who were suspended on mainstream social networks for violating their terms of service, as well as those fearing censorship. In particular, the service…

    Submitted 18 February, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Journal ref: Proceedings of the International AAAI Conference on Web and Social Media, 15(1), 943--951 (2021)

  45. arXiv:2010.11638  [pdf, other]

    cs.CY cs.SI

    "It is just a flu": Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations

    Authors: Kostantinos Papadamou, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Michael Sirivianos

    Abstract: The role played by YouTube's recommendation algorithm in unwittingly promoting misinformation and conspiracy theories is not entirely understood. Yet, this can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, such as the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTu…

    Submitted 12 October, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: To appear at the 16th International Conference on Web and Social Media (ICWSM 2022). Please cite the ICWSM version

  46. Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels

    Authors: Manoel Horta Ribeiro, Shagun Jhaver, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Robert West

    Abstract: When toxic online communities on mainstream platforms face moderation measures, such as bans, they may migrate to other platforms with laxer policies or set up their own dedicated websites. Previous work suggests that within mainstream platforms, community-level moderation is effective in mitigating the harm caused by the moderated communities. It is, however, unclear whether these results also ho…

    Submitted 20 August, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: This paper has been accepted at CSCW 2021, please cite accordingly

  47. arXiv:2009.11792  [pdf, other]

    cs.CY

    Understanding the Use of Fauxtography on Social Media

    Authors: Yuping Wang, Fatemeh Tahmasbi, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, David Magerman, Savvas Zannettou, Gianluca Stringhini

    Abstract: Despite the influence that image-based communication has on online discourse, the role played by images in disinformation is still not well understood. In this paper, we present the first large-scale study of fauxtography, analyzing the use of manipulated or misleading images in news discussion on online communities. First, we develop a computational pipeline geared to detect fauxtography, and ide…

    Submitted 25 September, 2020; v1 submitted 24 September, 2020; originally announced September 2020.

  48. arXiv:2009.04885  [pdf, other]

    cs.CY

    "Is it a Qoincidence?": An Exploratory Study of QAnon on Voat

    Authors: Antonis Papasavva, Jeremy Blackburn, Gianluca Stringhini, Savvas Zannettou, Emiliano De Cristofaro

    Abstract: Online fringe communities offer fertile grounds for users seeking and sharing ideas fueling suspicion of mainstream news and conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. Simultaneously, governments are thought to be controlle…

    Submitted 14 February, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

    Journal ref: Published in the Proceedings of the 30th Web Conference (WWW 2021). Please cite the WWW version

  49. arXiv:2009.03561  [pdf, other]

    cs.CR cs.AI

    Local and Central Differential Privacy for Robustness and Privacy in Federated Learning

    Authors: Mohammad Naseri, Jamie Hayes, Emiliano De Cristofaro

    Abstract: Federated Learning (FL) allows multiple participants to train machine learning models collaboratively by keeping their datasets local while only exchanging model updates. Alas, this is not necessarily free from privacy and robustness vulnerabilities, e.g., via membership, property, and backdoor attacks. This paper investigates whether and to what extent one can use Differential Privacy (DP) to pro…

    Submitted 27 May, 2022; v1 submitted 8 September, 2020; originally announced September 2020.

    Journal ref: Published in the Proceedings of the 29th Network and Distributed System Security Symposium (NDSS 2022)

  50. arXiv:2005.08679  [pdf, other]

    cs.LG cs.AI cs.CR cs.CY stat.ML

    An Overview of Privacy in Machine Learning

    Authors: Emiliano De Cristofaro

    Abstract: Over the past few years, providers such as Google, Microsoft, and Amazon have started to provide customers with access to software interfaces allowing them to easily embed machine learning tasks into their applications. Overall, organizations can now use Machine Learning as a Service (MLaaS) engines to outsource complex tasks, e.g., training classifiers, performing predictions, clustering, etc. Th…

    Submitted 18 May, 2020; originally announced May 2020.