Company: Textkernel, Machine intelligence for people and jobs
Location: Amsterdam, Netherlands
Period: Jan 2019 - Present
Mission
held project leader, software engineer and machine learning engineer roles
main developer and owner of the skills extraction and normalization service that powers the company’s Skills API
involved in training and evaluating a skills-validation machine learning model
planned the release of new features and managed cross-team interactions while being fully involved in the development
profiled each step of the pipeline to detect performance bottlenecks and improve throughput
designed and implemented various supporting processes around the service
added an internal debug endpoint explaining under which conditions a given skill can be extracted, and which team to contact for each type of error
automated a skills feedback pipeline: logged all unknown skills passing through the service, gathered them into monthly reports, and exported these to Jira
added logs to track usage and errors, and visualized them in Kibana dashboards for easier querying and debugging
Result: a microservice for extracting and validating skills in context, offered as a standalone product to customers, as well as being fully integrated in the company’s CV parsing and vacancy parsing products.
examples of other projects and tasks
implemented a parser for PDF LinkedIn profiles
helped improve the rendering and parsing of multi-column CVs (annotated column-split decisions with the Prodigy tool, designed a simple yet informative UI to review rendering differences between two preprocessor versions, profiled the new preprocessor, added heuristics to render contact information at the top of the document)
brought all the microservices up to the company’s standard tech stack
created a standard client for the company’s upstream and internal services (enforces slowdown and timeout limits on requests, handles the retry policy, and returns standard, user-friendly error messages)
led a task force to improve the performance and memory usage of microservices, in order to optimize resource consumption on k8s clusters
improved and standardized CI/CD pipelines of the R&D department by implementing generic templates (automated release and deployment pipelines, tracking microservice performance on changes, tracking parsing quality on resource updates, etc).
actively involved in maintaining and improving the company’s code base (refactoring, creation of common libraries, separation of concerns, documentation, etc.)
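As a minimal illustration of the standard client’s retry behavior described above (names such as `call_with_retries` and `UpstreamError` are hypothetical, not the company’s actual API):

```python
import time


class UpstreamError(RuntimeError):
    """Standardized, user-friendly error raised once retries are exhausted."""


def call_with_retries(fn, *, attempts=3, base_delay=0.01):
    """Call fn(), retrying transient failures with exponential backoff
    and surfacing one consistent error message on final failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise UpstreamError(
                    f"upstream call failed after {attempts} attempts: {exc}"
                ) from exc
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

A production client would additionally cap the per-request timeout and only retry idempotent calls or explicitly retryable status codes.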
team player
part of the DevOps rotation schedule, including monitoring & firefighting activities
part of the support rotation schedule, answering questions, debugging and fixing systematic errors on the company’s CV parsing and vacancy parsing tools
adept at leading by example and sharing knowledge (Python profiling, GitLab templates, Kubernetes deployments, internal services, etc.)
Keywords: microservice, corpus building, data processing, word embeddings, classification model, CI/CD, automation, standard tech stack, breaking the monolith
Mission: This work focused on automatic query correction. I defined a simple baseline to fix isolated non-word errors, using a spell-checker to generate low-edit-distance candidates and a language model to re-rank the spelling corrections.
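A minimal sketch of such a baseline, with a toy corpus standing in for the real spell-checker lexicon and language-model training data:

```python
from collections import Counter

# Toy corpus (assumption): its words form the lexicon, and its unigram
# counts act as a crude language model.
CORPUS = "the quick brown fox jumps over the lazy dog the fox sleeps".split()
UNIGRAMS = Counter(CORPUS)
TOTAL = sum(UNIGRAMS.values())


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[-1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def correct(word, max_dist=2):
    """Fix an isolated non-word error: generate low-distance in-vocabulary
    candidates, then re-rank (smaller distance first, unigram probability next)."""
    if word in UNIGRAMS:
        return word  # known word: nothing to correct
    candidates = [w for w in UNIGRAMS if edit_distance(word, w) <= max_dist]
    if not candidates:
        return word
    return max(candidates,
               key=lambda w: (-edit_distance(word, w), UNIGRAMS[w] / TOTAL))
```

A real system would use a proper spell-checker for candidate generation and an n-gram model over the surrounding query context for re-ranking.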
Mission: This work focused on web page classification and image color classification. The web page content was mapped to a fixed-dimensional vector using TF-IDF and LSA. Both classification tasks were performed with fully connected neural networks. I was in charge of the entire machine learning workflow, from data acquisition and data processing up to the development of libraries and the deployment of code and models in production.
Keywords: corpus building, data processing, text classification, color classification, TF-IDF, LSA, neural networks
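The TF-IDF + LSA mapping described above can be sketched as follows; the toy documents are assumptions, and power iteration stands in for the truncated-SVD routine a real pipeline would call:

```python
import math
from collections import Counter

# Toy corpus (assumption); the real data were crawled web pages.
DOCS = [
    "red color palette red hue",
    "blue color palette blue hue",
    "web page html content page",
    "web content html page layout",
]

vocab = sorted({w for d in DOCS for w in d.split()})
df = Counter(w for d in DOCS for w in set(d.split()))
N = len(DOCS)


def tfidf_vector(doc):
    """Map a document to a fixed-dimensional TF-IDF vector over `vocab`."""
    tf = Counter(doc.split())
    return [tf.get(w, 0) * math.log((1 + N) / (1 + df[w])) for w in vocab]


M = [tfidf_vector(d) for d in DOCS]


def top_right_singular_dirs(M, k, iters=300):
    """Top-k right singular directions of M via power iteration on the
    Gram matrix M^T M (what a truncated-SVD library call would compute)."""
    V = len(M[0])
    G = [[sum(row[i] * row[j] for row in M) for j in range(V)] for i in range(V)]
    dirs = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(V)]  # slightly asymmetric start
        for _ in range(iters):
            # Remove components along previously found directions (deflation).
            for u in dirs:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            v = [sum(g * x for g, x in zip(grow, v)) for grow in G]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        dirs.append(v)
    return dirs


K = 2  # LSA dimensionality (assumption; real systems use far more)
dirs = top_right_singular_dirs(M, K)
# Fixed-dimensional LSA vectors, ready to feed a fully connected classifier.
lsa = [[sum(m * d for m, d in zip(row, v)) for v in dirs] for row in M]
```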
Company: Inria (French national research institute), Université de Lorraine
Location: Nancy, France
Period: Dec 2012 - Feb 2016
Project: RAPSODIE, Speech recognition as a communication aid for deaf and hearing impaired people
Mission
This work focused on the optimization of lexical models for a speech recognition system and on the extraction of para-lexical information from speech. The project’s initial objective was to build an embedded speech recognition system, which implied limited memory and computational power.
I studied the choice of the lexical units defining the vocabulary and the associated n-gram language model, such as phonemes, words or syllables. I ultimately proposed a new approach based on the combination of words and syllables in a hybrid language model. This kind of model aimed to ensure proper recognition of the most frequent words while offering sequences of syllables for speech segments corresponding to out-of-vocabulary words.
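A hedged sketch of how such a hybrid word/syllable token stream could be built (the syllabifier below is a toy stand-in for phonetic syllabification):

```python
from collections import Counter


def build_hybrid_tokens(words, syllabify, top_k):
    """Keep the top_k most frequent words as-is; rewrite every other word
    as its syllable sequence. An n-gram LM trained on the resulting stream
    mixes word and syllable units, as in the hybrid model."""
    counts = Counter(words)
    kept = {w for w, _ in counts.most_common(top_k)}
    tokens = []
    for w in words:
        if w in kept:
            tokens.append(w)
        else:
            tokens.extend(syllabify(w))
    return kept, tokens


def toy_syllabify(w):
    # Toy stand-in (assumption): split every two characters.
    return [w[i:i + 2] for i in range(0, len(w), 2)]
```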
I also briefly worked on word similarity (defined by similar neighbor distributions) as a way to add new words to a language model.
I studied the detection of questions and statements in order to inform deaf and hearing-impaired users when a question was addressed to them. I defined features related to the presence of interrogative words, to the likelihood ratio between two n-gram language models (trained on statements and on questions respectively), and to the pronunciation at the end of the sentence. Several classifiers were evaluated: logistic regression, decision trees and shallow neural networks.
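A minimal reconstruction of the logistic-regression variant of this classifier (the interrogative-word list, toy sentences and likelihood ratios are illustrative assumptions; the real features came from the recognizer and the two n-gram models):

```python
import math

# Illustrative English interrogatives (assumption; the project was French).
INTERROGATIVES = {"who", "what", "where", "when", "why", "how", "which"}


def features(sentence, lm_llr):
    """lm_llr: log-likelihood ratio question-LM vs statement-LM, assumed
    to be computed upstream by the two n-gram models."""
    words = sentence.lower().rstrip("?!.").split()
    starts_interrogative = 1.0 if words and words[0] in INTERROGATIVES else 0.0
    return [1.0, starts_interrogative, lm_llr]  # bias + two criteria


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, lr=0.5, epochs=300):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


DATA = [  # (sentence, toy LM likelihood ratio, 1 = question)
    ("what is your name?", 1.0, 1),
    ("where are you going?", 0.8, 1),
    ("how does it work?", 1.2, 1),
    ("i am going home.", -0.9, 0),
    ("the meeting is over.", -1.1, 0),
    ("she answered the question.", -0.5, 0),
]
w = train_logreg([features(s, r) for s, r, _ in DATA],
                 [y for _, _, y in DATA])


def p_question(sentence, lm_llr):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(sentence, lm_llr))))
```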
Keywords: speech recognition, hybrid language model, similar words, question detection
Supporting the integration of foreigners working at Inria (LORIA)
Company: Inria, French national research institute
Location: Nancy, France
Period: Feb 2014 – Aug 2015
Mission: I organized meetings (guided visits of the old town) and activities (monthly board games sessions) in order to facilitate the integration of foreigners in the center. I answered practical and cultural questions. I assisted with administrative procedures (welcoming of new arrivals, presentations of the French tax system).
Junior Software Engineer
Company: Inria, French national research institute
Location: Nancy, France
Period: Oct 2011 - Dec 2012
Project: ALLEGRO, Speech recognition for second language learning
Mission: This work focused on the detection of incorrect entries (i.e. those for which the text does not correspond to the associated speech signal) in non-native speech, in the context of foreign language learning. I exploited the comparison between two text-to-speech alignments: one constrained by the text being checked (forced alignment), and another, unconstrained one corresponding to a phonetic decoding (using a phoneme loop or a word loop). I combined several comparison criteria via a logistic regression function: likelihood ratios, the use of phonemes and their durations, of phonetic classes, and of non-speech units. This position revolved mainly around feature engineering (using domain knowledge) and performance analysis.
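The core comparison criterion can be sketched as follows (the scores and the decision threshold are illustrative; real values would come from the recognizer’s alignments, and several such criteria were combined via logistic regression):

```python
def alignment_llr(forced_loglik, free_loglik, n_frames):
    """Per-frame log-likelihood ratio between an unconstrained phonetic
    decoding and the forced alignment of the prompted text."""
    return (free_loglik - forced_loglik) / n_frames


def is_incorrect_entry(llr, threshold=0.5):
    """Flag the entry when the free decoding fits the signal much better
    than the prompted text does (threshold is illustrative)."""
    return llr > threshold
```

For example, a forced-alignment score far below the free decoding’s score suggests the speaker did not say the expected text.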
Technical environment: Perl, Shell Script, Gnuplot, LaTeX, Linux
Intern
University: Université de Lorraine
Location: Nancy, France
Period: Feb 2011 - Jun 2011
Title: Speech recognition with remote sound for a home automation system
Abstract: This work focused on the performance evaluation of a speech recognition system with remote sound. I tested different configurations, different decoding settings and different models (acoustic models and language models) in order to determine the setup leading to optimal performance.
Title: Acquisition and recognition of head movements for the gesture control in video games
Abstract: This work was based on the idea that users get emotionally involved while controlling the actions of video games through unconscious body movements, and that these unconscious movements could be detected and used as the actual (natural) controls of the game. I therefore developed a new way of tracking head movements, able to learn the gestures useful for controlling a video game. The demonstration used a Wii remote that tracked the head position through glasses equipped with IR LEDs.
Keywords: head movements, Wii remote, infrared sensors
Technical environment: C#, Microsoft Visual Studio, Windows