Adrien SIEG

Hi, I'm Adrien, a Cloud-oriented Data Scientist with an interest in AI (or BI)-powered applications and Data Science.

I focus on business-oriented applications of data science and want to bring data intelligence into day-to-day business routines.

I'm majoring in Statistics and Economics.

From Paris, France

Cheese Enthusiast, Rock-climber.

Lived in London and Singapore

Feel free to contact me for any suggestions, ideas, or if you have some datasets to share.

See my work

General / Statistics / Visualization / AWS-Cloud

Estimate the degree of similarity between two texts

Compare the most popular ways of computing sentence similarity and investigate how they perform.
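As a point of reference, one of the simplest baselines for sentence similarity is the cosine between bag-of-words count vectors. The sketch below is an illustration only, not necessarily one of the methods compared in the post; the `tokenize` and `cosine_similarity` helpers are hypothetical names introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # The dot product only involves words that appear in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty text is similar to nothing
    return dot / (norm_a * norm_b)


score = cosine_similarity("the cat sat on the mat", "the dog sat on the log")
print(round(score, 2))  # 0.75
```

This baseline ignores word order and meaning, which is the gap that embedding-based approaches try to close.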

FROM Static Word Embedding TO Dynamic (Contextualized) Word Embedding

Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language.
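To make that definition concrete, here is a minimal sketch of a bigram language model estimated by maximum likelihood, assuming whitespace tokenization and a toy two-sentence corpus. The function names and the `<s>` / `</s>` boundary markers are illustrative choices, not taken from the post.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts


def sentence_probability(sentence, context_counts, bigram_counts):
    """Chain rule under a bigram assumption: product of P(word | previous word)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no smoothing in this sketch
        probability *= bigram_counts[(prev, curr)] / context_counts[prev]
    return probability


corpus = ["the cat sleeps", "the dog sleeps"]  # toy corpus for illustration
context_counts, bigram_counts = train_bigram_model(corpus)
print(sentence_probability("the cat sleeps", context_counts, bigram_counts))   # 0.5
print(sentence_probability("the bird sleeps", context_counts, bigram_counts))  # 0.0
```

Without smoothing, any sentence containing an unseen bigram gets probability zero, which is one reason practical models move to smoothed or neural estimators.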

Disrupting Customer-Centric Journey via a data-driven strategy

Improving the customer journey and providing a positive customer experience (CX) was ranked as the number one trend, as well as the top strategic priority, in the survey of global banking leaders for the 2017 Retail Banking Trends and Predictions report.

Network Graph of Word Embeddings [Part 1]

How network embedding graphs can help to solve major open problems in natural language understanding - introduction

Network Graph of Word Embeddings - Node2Vec and implementation on Neo4j via Cypher [Part 2]

Node2Vec creates vector representations for nodes in a network, whereas Word2Vec and Doc2Vec create vector representations for words and documents in a corpus of text. How to implement two different Neo4j graph databases.

Neo4j Graph Database and Python

Load a CSV file into Neo4j and query it

Distributed Computing & (Py)Spark

MapReduce / Hadoop / tf-idf and PageRank / Spark / Distributed Datasets / SparkSQL / AWS / Data Storage on S3 / Manage a cluster on cloud computing / Cloud Computing / PySpark

A Full Free-Text Search Engine built with ElasticSearch and Flask

Implementing a search feature so that users can find interesting posts using natural language / NLP / [Bonus]: Kibana - load data (e.g. CSV, TXT, ...) via Python, R or Logstash - query data directly via R or Python [Coming soon]

Access real-time data via Kafka and Storm

Case study based on the JC-Decaux API to manage a pool of bicycles in near real time [Coming soon]

Canonical Correlation Analysis [French]

Statistical way of inferring information from cross-covariance matrices

Spectral Clustering [French]

Spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
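A minimal sketch of the idea, assuming a small unweighted graph given as an adjacency matrix: the eigenvector for the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector) gives a one-dimensional embedding, and the sign of each entry splits the nodes into two clusters. To stay dependency-free, this sketch swaps in shifted power iteration for a full eigen-solver and a sign threshold for k-means; the example graph and function name are made up for illustration.

```python
def fiedler_vector(adjacency, iterations=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Assumes a connected, undirected graph with at least two nodes.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    # Graph Laplacian L = D - A.
    laplacian = [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]
    shift = 2 * max(degrees)  # upper bound on the largest eigenvalue of L
    vector = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Project out the constant vector (the eigenvalue-0 direction of L).
        mean = sum(vector) / n
        vector = [v - mean for v in vector]
        # One power-iteration step with the matrix (shift * I - L).
        vector = [
            shift * vector[i] - sum(laplacian[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in vector) ** 0.5
        vector = [v / norm for v in vector]
    return vector


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1

embedding = fiedler_vector(adjacency)
labels = [0 if value < 0 else 1 for value in embedding]
print(labels)
```

For more than two clusters, the usual approach keeps several of the smallest eigenvectors and runs k-means on the resulting low-dimensional points.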

Risk Visualization in d3js [RECOVERY]

How the risk landscape changed from 2015 to 2017

Parisian trees

Parisian trees in relation to house prices: the fewer the trees, the higher the house prices

Networking Basics for the Cloud [French]

Networks, IP addresses, IPv4 vs. IPv6, subnet masks, CIDR blocks, supernetting, subnetting, VLSM, ...
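Several of these notions can be explored with Python's standard `ipaddress` module. The sketch below is an illustration only, assuming the private range 192.168.0.0/24 as an example; it reads the subnet mask of a CIDR block, splits the block into smaller subnets, and merges adjacent blocks back into a supernet.

```python
import ipaddress

# Example private range, chosen purely for illustration.
network = ipaddress.ip_network("192.168.0.0/24")

netmask = str(network.netmask)          # subnet mask implied by the /24 prefix
address_count = network.num_addresses   # total addresses in the block

# Subnetting: split the /24 into four /26 blocks.
subnets = [str(subnet) for subnet in network.subnets(new_prefix=26)]

# Supernetting: merge two adjacent /25 blocks back into one /24.
halves = list(network.subnets(prefixlen_diff=1))
merged = [str(block) for block in ipaddress.collapse_addresses(halves)]

print(netmask)        # 255.255.255.0
print(address_count)  # 256
print(subnets)        # ['192.168.0.0/26', '192.168.0.64/26', '192.168.0.128/26', '192.168.0.192/26']
print(merged)         # ['192.168.0.0/24']
```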

Create AWS baby data lakes to handle ongoing daily data batches

Build a Data Lake Foundation mainly with AWS Glue and Amazon S3

PySpark Tutorial for Beginners

Significant features of PySpark

adriensieg@hotmail.fr

Ring the bell to say hello or to contact me!