Clustering the Ethereum Address Space

Users may be anonymous on the Ethereum blockchain, but their addresses are unique identifiers that leave a publicly available trail of transaction data. After building a dataset from this public data, I attempt to create meaningful categories of users (miners, exchanges, etc.) by dividing the Ethereum address space into clusters. Because a small percentage of the addresses in the dataset are labeled, I then re-cluster the data with a constrained version of the K-Means algorithm that takes these labels into account.



Generating EU legislation with GPT-2

Tutorial for generating EU legislative acts with OpenAI's GPT-2. With just a few lines of code, I prepare the data, fine-tune a GPT-2 model, and generate brand new content.



Understanding K-Means Clustering

Clustering algorithms are a wide range of techniques that aim to find subgroups in a dataset. Clustering is an unsupervised method: the model learns to assign a label to each instance without being given labeled examples, with the goal of grouping together the instances that are most similar. Probably the simplest clustering algorithm to understand is k-means, which partitions the data into k clusters.
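
To make the idea concrete, here is a minimal pure-Python sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is an illustration, not the code from the post: the function names, the toy points, and the choice to use the first k points as initial centroids are all assumptions made to keep the example short and deterministic. Practical implementations usually initialize randomly or with k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Cluster `points` into `k` groups; returns (labels, centroids).

    Assumption for this sketch: the first k points serve as the initial
    centroids, which keeps the run deterministic.
    """
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two visibly separated groups in 2D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

On this toy data, the three points near (1, 1) should end up sharing one label and the three points near (8, 8) the other. The number of clusters k has to be chosen in advance, which is one of the main practical decisions when using the algorithm.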



Pulsars Detection with HTRU2 Dataset

The HTRU2 dataset contains data about pulsars. Treating pulsar detection as a classification task, I implement a few quick-and-dirty ML models before moving on to ensemble models. I then apply clustering with PCA and K-Means.