This thesis was conducted during my studies at HEC Montreal. Here I share it as a PDF that you can download.
I decided to pursue a master's degree with a thesis because I was very curious about machine learning research and social issues, and I believed that the only way to satisfy that curiosity was to write a thesis. I think I learned more technical skills doing my thesis than I would have by taking classes and doing a short project. I worked with big data, Python, graphs, Mila servers and unsupervised learning techniques. I also had the opportunity to work on a paper (you can read the publication here) alongside a PhD student who specializes in graphs, and we had the honor of presenting our work at WebSci 2023.
In this thesis by articles, we present a research paper that we submitted to the WebSci’23 conference, which was under review when the thesis was written. Beyond the article itself, the thesis provides further detail on the motivation, background, literature review and research. The aim of this thesis is to provide a method that can facilitate the work of individuals combating online human trafficking. The majority of trafficking victims report having been advertised online, consistent with the rise of online sex trafficking in the past few years. At the same time, the use of OnlyFans as a platform for adult content has grown exponentially over the past three years, and Twitter has been its main advertising tool. Since traffickers usually work within a network and control multiple victims, we suspected that there may be networks of traffickers promoting multiple OnlyFans accounts belonging to their victims. Based on these observations, we conducted the first study of organized activity on Twitter through OnlyFans advertisements.

Preliminary analysis of this space shows that most tweets related to OnlyFans contain generic text, which makes text-based methods less reliable. Instead, focusing on what ties the authors of these tweets together, we propose a novel method for uncovering coordinated networks of users based on their behaviour. Our method, called Multi-Level Clustering (MLC), combines two levels of clustering. In the first level, we detect communities based on username mentions and shared URLs. The second level uses one of two approaches: (i) Partial Intersections (PI) of the URL and Mention communities, or (ii) Joint Clustering (JT), which applies a dense subgraph detection algorithm. We also show that our JT approach, applied to synthetically generated data with injected ground truth, outperforms competitive baselines.
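The two levels of MLC can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the code from the paper: each user is mapped to the set of usernames they mention and the set of URLs they share. For the first level, connected components of a "shares at least one item" graph stand in for the community detection used in the thesis. For PI, every overlap between a Mention community and a URL community with at least `min_size` users is kept (`min_size` is a hypothetical threshold). For JT, Charikar's greedy peeling for the densest subgraph is swapped in for the dense subgraph algorithm, run on a hypothetical joint graph whose edge weights count shared mentions plus shared URLs. All function names, users, handles and links are made up.

```python
from collections import defaultdict
from itertools import combinations


def link_communities(user_items):
    """First level (simplified stand-in for community detection): users who
    share at least one item (a mentioned username or a URL) are linked, and
    each connected component with 2+ users is returned as a community."""
    parent = {u: u for u in user_items}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    users_by_item = defaultdict(list)
    for user, items in user_items.items():
        for item in items:
            users_by_item[item].append(user)
    for users in users_by_item.values():
        for other in users[1:]:
            parent[find(other)] = find(users[0])  # union

    groups = defaultdict(set)
    for u in user_items:
        groups[find(u)].add(u)
    return [g for g in groups.values() if len(g) > 1]


def partial_intersections(mention_comms, url_comms, min_size=2):
    """Second level, approach (i): keep each overlap between a Mention
    community and a URL community with at least min_size users
    (min_size is a hypothetical threshold)."""
    return [m & u for m in mention_comms for u in url_comms
            if len(m & u) >= min_size]


def joint_graph(*feature_maps):
    """Hypothetical joint user graph: edge weight = number of mentions and
    URLs that two users have in common."""
    weights = defaultdict(int)
    for feature_map in feature_maps:
        users_by_item = defaultdict(set)
        for user, items in feature_map.items():
            for item in items:
                users_by_item[item].add(user)
        for users in users_by_item.values():
            for a, b in combinations(sorted(users), 2):
                weights[(a, b)] += 1
    return weights


def densest_subgraph(weights):
    """Second level, approach (ii) stand-in: Charikar's greedy peeling.
    Repeatedly remove the node with the smallest weighted degree and keep
    the node set with the highest density (total edge weight / nodes)."""
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    nodes = set(adj)
    total = sum(weights.values())
    best, best_density = set(nodes), total / max(len(nodes), 1)
    while len(nodes) > 1:
        # sorted() makes tie-breaking deterministic
        v = min(sorted(nodes), key=lambda n: sum(adj[n].values()))
        total -= sum(adj[v].values())
        for nb in adj.pop(v):
            del adj[nb][v]
        nodes.remove(v)
        if total / len(nodes) > best_density:
            best, best_density = set(nodes), total / len(nodes)
    return best, best_density


# Made-up example: user -> mentioned usernames, user -> shared URLs.
mentions = {"u1": {"@a", "@b"}, "u2": {"@a", "@b"}, "u3": {"@b"},
            "u4": {"@c"}, "u5": {"@c"}}
urls = {"u1": {"link1"}, "u2": {"link1"}, "u3": {"link2"},
        "u4": {"link2"}, "u5": {"link3"}}

mention_comms = link_communities(mentions)
url_comms = link_communities(urls)
print([sorted(c) for c in partial_intersections(mention_comms, url_comms)])
# [['u1', 'u2']]
members, density = densest_subgraph(joint_graph(mentions, urls))
print(sorted(members), round(density, 2))  # ['u1', 'u2', 'u3'] 1.67
```

On this toy data the two second-level approaches differ: PI only keeps users grouped together by both signals (u1 and u2), while the peeling step also pulls in u3, which shares a mention with both of them. The pairwise expansion is quadratic in the number of users per item, so real Twitter data would call for a scalable graph library; pure Python is used here only to keep the example self-contained.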
Furthermore, we apply MLC to real-world tweets about OnlyFans, analyse the detected groups, and show that our Partial Intersections approach yields good-quality clusters (high entropy of OnlyFans accounts). Both the paper and the thesis end with a discussion of carefully chosen examples of organized clusters, along with several observations that support our method.
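The cluster-quality criterion above (entropy of OnlyFans accounts) can be illustrated with Shannon entropy. This is a minimal sketch under an assumption the text does not spell out: each cluster is represented as the list of OnlyFans handles promoted in its tweets, and entropy is measured in bits. Under that reading, a cluster promoting a single account scores 0, while a cluster spreading its promotion evenly across many accounts scores higher, which matches the concern about networks promoting several victims' accounts. The function name and handles are hypothetical.

```python
from collections import Counter
from math import log2


def account_entropy(promoted_accounts):
    """Shannon entropy (in bits) of the OnlyFans accounts promoted by one
    cluster. 0 means every tweet promotes the same account; higher values
    mean promotion is spread over more accounts. (Hypothetical helper.)"""
    counts = Counter(promoted_accounts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed accounts
    return sum((c / total) * log2(total / c) for c in counts.values())


# Made-up clusters: handles found in each cluster's tweets.
single = ["of_anna"] * 4
spread = ["of_anna", "of_bea", "of_cleo", "of_dana"]
print(account_entropy(single))  # 0.0
print(account_entropy(spread))  # 2.0
```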
Below is my full thesis.