A repository of the projects I have worked on or am currently working on. Click on a project's title to see the full analysis and code.
You can also check out the code on Github.
- OpenAI's GPT-2 is a transformer-based language model that writes impressively coherent text. Using gpt-2-simple, I fine-tuned the model on all of the European Union's Directives, Regulations and Decisions to generate new EU legislative acts.
- I then put the model in a Docker container and deployed it with Google Cloud Run.
- The generated texts are surprisingly coherent and feature some quirky uses of legalese.
- You can read more about the process and the results in my article on my blog.
- You can generate some text here.
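The deployment step above can be sketched as a minimal Dockerfile for Cloud Run. File names, base image, and entrypoint here are assumptions for illustration, not the actual setup:

```dockerfile
# Sketch: containerize a GPT-2 text-generation app for Cloud Run.
# app.py and requirements.txt are hypothetical file names.
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Cloud Run sends requests to the port given in the PORT env variable.
CMD exec python app.py --port ${PORT:-8080}
```

The image would then be built, pushed to a registry, and deployed with `gcloud run deploy`.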
- I scraped all 181 of Baudelaire's poems from poesie-francaise.fr using BeautifulSoup.
- Inspired by Karpathy's famous article, I used a charRNN model with TensorFlow 2 and Keras to generate Baudelaire-like poems.
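A char-level model starts from a character vocabulary. A minimal sketch of the encoding step (the corpus string below is a toy stand-in, not the actual scraped poems):

```python
# Build a character-level vocabulary and encode text as integer ids,
# the first step of a charRNN pipeline (toy corpus, not the real one).
corpus = "L'Albatros\nSouvent, pour s'amuser, les hommes d'equipage"

chars = sorted(set(corpus))                 # vocabulary
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for c, i in char2idx.items()}

encoded = [char2idx[c] for c in corpus]     # text -> ids

# Slice the id sequence into (input, target) windows: the target is the
# input shifted by one character, so the model learns next-char prediction.
seq_len = 16
examples = [
    (encoded[i:i + seq_len], encoded[i + 1:i + seq_len + 1])
    for i in range(len(encoded) - seq_len)
]

decoded = "".join(idx2char[i] for i in encoded)
```

The same id sequences would then feed an embedding + recurrent layer stack in Keras.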
- On the Ethereum blockchain, addresses are unique identifiers that leave traces as publicly available transactional data.
- I built a dataset of Ethereum addresses and 28 relevant features from multiple sources: the Google BigQuery public Ethereum dataset, the etherscan.io public API, labels from a Kaggle dataset, as well as manually added labels.
- I attempt to create meaningful categories of users (miners, exchanges, ...) using the K-Means clustering algorithm.
- A small percentage of the addresses in the dataset are labeled, which lets me re-cluster the data and leverage this information using a constrained version of the K-Means algorithm.
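The clustering step can be sketched with a tiny plain-Python K-Means. The 2-D toy points below stand in for the real 28-feature address vectors, and the constrained, label-aware variant is not shown:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[j])))
            clusters[j].append(p)
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy blobs stand in for address feature vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(points, k=2)
```

In practice scikit-learn's `KMeans` does this at scale; the sketch just shows the assign/update loop.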
- I scraped 45,000 tweets with Tweepy and preprocessed them.
- I built a word cloud, ran sentiment analysis with NLTK, and performed exploratory analysis of the data.
- A visual introduction to the K-Means algorithm.
- For a more visually pleasing experience, you can find my article here.
- The HTRU2 dataset contains pulsar candidates. I first treat it as a binary classification problem and an opportunity to try different classification algorithms and compare their performance.
- Then, I use the dataset for unsupervised learning, applying K-Means clustering with PCA as a preprocessing step.
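The PCA-before-clustering idea can be sketched in plain Python: center the data, find the top principal component by power iteration on the covariance matrix, and project onto it before clustering. The data below is a 2-D toy set, not HTRU2:

```python
import math

def top_component(data, iters=100):
    """First principal component of row-vectors via power iteration
    on the covariance matrix (plain Python, toy-scale only)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component -> 1-D coordinates.
    return v, [sum(r[j] * v[j] for j in range(d)) for r in centered]

# Points stretched along the diagonal: the top component is ~(1, 1)/sqrt(2).
data = [[0, 0], [1, 1.1], [2, 1.9], [3, 3.2], [4, 4.0]]
v, coords = top_component(data)
```

The 1-D `coords` (or the first few components, in the real case) would then be fed to K-Means instead of the raw features.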
- The QuickDraw dataset contains 50 million drawings collected by Google.
- I select 12 categories from the dataset (only animals) and train a CNN on them.
- The data comes from a Kaggle competition.
- I train a CNN to recognize 15 keypoints on faces.
- This Kaggle competition is a regression problem: we predict house prices from more than 80 features (with many missing values), which offers interesting possibilities for feature transformation and data visualization.
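Two typical feature-transformation steps for this kind of tabular problem can be sketched in plain Python: median imputation for missing values and a log transform for the right-skewed sale price. The column name and values below are toy stand-ins, not the competition data:

```python
import math

# Toy column with missing values, standing in for one of the 80+ features.
lot_frontage = [65.0, None, 68.0, 60.0, None, 84.0]

# Median imputation: replace missing entries with the column median.
observed = sorted(x for x in lot_frontage if x is not None)
mid = len(observed) // 2
median = (observed[mid] if len(observed) % 2 else
          (observed[mid - 1] + observed[mid]) / 2)
imputed = [x if x is not None else median for x in lot_frontage]

# Sale prices are right-skewed; log1p compresses the long tail so a
# model trained on the transformed target treats errors multiplicatively.
prices = [130000, 145000, 200000, 550000]
log_prices = [math.log1p(p) for p in prices]
```

Predictions made on the log scale are mapped back with `math.expm1` before scoring.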