Machine Learning Engineer

  • Company: Textkernel, Machine intelligence for people and jobs
  • Location: Amsterdam, Netherlands
  • Period: Jan 2019 - Present
  • Mission
    • took on project leader, software engineer and machine learning engineer roles
    • main developer and owner of the skills extraction and normalization service providing the company’s Skills API
      • involved in training and evaluating a skills-validation machine learning model
      • planned the release of new features and managed cross-team interactions while being fully involved in the development
      • profiled each step of the process to detect performance bottlenecks and improve throughput
      • designed and implemented various processes around it
        • added an internal debug endpoint explaining under which conditions a given skill can be extracted and which team to contact for each type of error
        • automated a skills feedback pipeline: log all unknown skills passing through the service, gather them into monthly reports and export them to Jira
        • added logs to track usage and errors, visualized in Kibana dashboards for easier querying and debugging
      • Result: a microservice for extracting and validating skills in context, offered as a standalone product to customers and fully integrated into the company’s CV parsing and vacancy parsing products.
    • examples of other projects and tasks
      • implemented a parser for PDF LinkedIn profiles
      • helped improve the rendering and parsing of multi-column CVs (annotated split decisions with the Prodigy tool, designed a simple yet informative UI to review rendering differences between two preprocessor versions, profiled the new preprocessor, added heuristics to render contact information at the top of the document)
      • brought all the microservices up to the company’s standard tech stack
      • created a standard client for the company’s upstream and internal services (enforce slowdown and timeout limits on requests, handle the retry policy, return standard and user-friendly error messages); a minimal illustrative sketch follows this entry
      • led a task force to improve performance and memory usage across microservices and optimize resource consumption on Kubernetes clusters
      • improved and standardized the CI/CD pipelines of the R&D department by implementing generic templates (automated release and deployment pipelines, tracking microservice performance on changes, tracking parsing quality on resource updates, etc.)
      • actively involved in maintaining and improving the company’s code base (refactoring, creation of common libraries, separation of concerns, documentation, etc.)
    • team player
      • part of the DevOps rotation schedule, including monitoring & firefighting activities
      • part of the support rotation schedule, answering questions, debugging and fixing systematic errors on the company’s CV parsing and vacancy parsing tools
      • advocate of leading by example and sharing knowledge (Python profiling, GitLab templates, Kubernetes deployments, internal services, etc.)
  • Keywords: microservice, corpus building, data processing, word embeddings, classification model, CI/CD, automation, standard tech stack, breaking the monolith
  • Technical environment: Python, Perl, Shell Script, Elasticsearch, Kibana, Git, GitLab, Jenkins, Docker, Kubernetes, GitOps
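  • Illustrative sketch (Python, not the actual company code): a minimal version of a shared client enforcing timeouts, a retry policy and user-friendly error messages, in the spirit of the standard client described above; it relies on the requests and urllib3 libraries, and all names, limits and messages are assumptions.

      # Minimal sketch of a shared HTTP client: same timeout, retry policy and
      # error wording for every internal call. Names and limits are assumptions.
      import requests
      from requests.adapters import HTTPAdapter
      from urllib3.util.retry import Retry

      class ServiceClient:
          def __init__(self, base_url, timeout=5.0, retries=3):
              self.base_url = base_url.rstrip("/")
              self.timeout = timeout
              retry = Retry(total=retries, backoff_factor=0.5,
                            status_forcelist=(502, 503, 504))
              self.session = requests.Session()
              self.session.mount("http://", HTTPAdapter(max_retries=retry))
              self.session.mount("https://", HTTPAdapter(max_retries=retry))

          def post_json(self, path, payload):
              # Single place where timeouts are applied and errors are translated
              # into standard, user-friendly messages.
              try:
                  response = self.session.post(f"{self.base_url}/{path.lstrip('/')}",
                                               json=payload, timeout=self.timeout)
                  response.raise_for_status()
                  return response.json()
              except requests.Timeout:
                  raise RuntimeError(f"Upstream service timed out after {self.timeout}s")
              except requests.HTTPError as err:
                  raise RuntimeError(f"Upstream service returned an error: {err}") from err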

Machine Learning Engineer

  • Company: Qwant, European search engine
  • Location: Epinal, France
  • Period: Nov 2017 - Aug 2018
  • Mission: This work focused on automatic query correction. I defined a simple baseline meant to fix isolated non-word errors, using a spell checker to generate low-distance candidates and a language model to re-rank the corrections; a minimal illustrative sketch follows this entry.
  • Keywords: minimum edit distance, language model
  • Technical environment: Python (spaCy, fasttext, hunspell, symspell, PyNLPl), SRILM, Shell Script, Git, GitLab, Docker, Linux
  • Source code: ccquery
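  • Illustrative sketch (Python, not from the ccquery code base): a self-contained toy version of the baseline described above, generating edit-distance-1 candidates for a non-word and re-ranking them with a toy bigram language model; the vocabulary, counts and scoring are assumptions, while the real pipeline relied on hunspell/symspell and an n-gram model.

      # Toy spelling correction: candidates at edit distance 1, re-ranked by a
      # tiny "language model". All counts below are made-up illustrative values.
      from itertools import chain

      ALPHABET = "abcdefghijklmnopqrstuvwxyz"
      VOCAB = {"recipe": 120, "receipt": 80, "recite": 15}            # toy unigram counts
      BIGRAMS = {("pancake", "recipe"): 30, ("store", "receipt"): 25} # toy bigram counts

      def edits1(word):
          # All strings at edit distance 1: deletions, swaps, replacements, insertions.
          splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
          deletes = (l + r[1:] for l, r in splits if r)
          swaps = (l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1)
          replaces = (l + c + r[1:] for l, r in splits if r for c in ALPHABET)
          inserts = (l + c + r for l, r in splits for c in ALPHABET)
          return set(chain(deletes, swaps, replaces, inserts))

      def score(prev_word, candidate):
          # Toy LM score: favour candidates seen after the previous word.
          return BIGRAMS.get((prev_word, candidate), 0) * 10 + VOCAB.get(candidate, 0)

      def correct(prev_word, word):
          if word in VOCAB:
              return word
          candidates = [c for c in edits1(word) if c in VOCAB] or [word]
          return max(candidates, key=lambda c: score(prev_word, c))

      print(correct("pancake", "recepe"))  # -> "recipe"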

Machine Learning Engineer

  • Company: Xilopix, French search engine
  • Location: Epinal, France
  • Period: Oct 2016 - Nov 2017
  • Mission: This work focused on web page classification and on image color classification. The web page content was mapped to a fixed-dimensional vector using TF-IDF and LSA, and both classification tasks were performed with fully connected neural networks; a minimal illustrative sketch follows this entry. I was in charge of the entire machine learning workflow, from data acquisition and data processing to the development of libraries and the deployment of code and models in production.
  • Keywords: corpus building, data processing, text classification, color classification, TF-IDF, LSA, neural networks
  • Technical environment: Ruby (rmagick), Python (gensim, sklearn, matplotlib), Shell Script, Git, Gerrit, Docker, Elasticsearch, Linux
  • Source code: xi-ml-topicdiscovery, xi-dip
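  • Illustrative sketch (Python, not from the xi-ml-topicdiscovery code base): a minimal TF-IDF + LSA + fully connected network pipeline for text classification, in the spirit of the web page classifier described above; the corpus, labels and hyperparameters are toy assumptions, while the real models were trained on a large crawled corpus.

      # TF-IDF -> LSA (truncated SVD) -> small fully connected network.
      # Training data below is a toy stand-in for crawled web pages.
      from sklearn.pipeline import Pipeline
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.neural_network import MLPClassifier

      pages = [
          "live football scores and match reports",
          "parliament votes on the new budget law",
          "best chocolate cake recipe with step by step photos",
          "transfer rumours and league standings",
      ]
      topics = ["sports", "politics", "cooking", "sports"]

      pipeline = Pipeline([
          ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
          ("lsa", TruncatedSVD(n_components=2, random_state=0)),   # fixed-dimensional vector
          ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
      ])
      pipeline.fit(pages, topics)
      print(pipeline.predict(["latest goals and league results"]))  # e.g. ['sports']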

PhD student

  • Company: Inria (French national research institute), Université de Lorraine
  • Location: Nancy, France
  • Period: Dec 2012 - Feb 2016
  • Project: RAPSODIE, Speech recognition as a communication aid for deaf and hearing impaired people
  • Mission
    • This work focused on the optimization of lexical models for a speech recognition system and on the extraction of para-lexical information from speech. The project’s initial objective was to build an embedded speech recognition system, which implied limited memory and computational power.
    • I studied the choice of lexical units defining the vocabulary and the associated n-gram language model, such as phonemes, words or syllables. I then proposed a new approach based on the combination of words and syllables in a hybrid language model, which aimed to ensure proper recognition of the most frequent words while offering sequences of syllables for speech segments corresponding to out-of-vocabulary words.
    • I also briefly worked on the similarity between words (defined by similar neighbor distributions) in order to add new words into a language model.
    • I studied the detection of questions and statements in order to inform deaf and hearing-impaired users when a question was addressed to them. I defined features related to the presence of interrogative words, to the likelihood ratio between two n-gram language models (trained on statements and on questions) and to the pronunciation at the end of the sentence; a minimal illustrative sketch follows this entry. Several classifiers were evaluated: logistic regression, decision trees and shallow neural networks.
  • Keywords: speech recognition, hybrid language model, similar words, question detection
  • Technical environment: Perl, Java (Weka), SRILM, CMU Sphinx, Shell Script, Gnuplot, LaTeX, Git, distributed computing platform, Linux
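  • Illustrative sketch (Python, not the thesis code, which relied on SRILM and Weka): classify an utterance as question vs. statement from two of the features described above, a binary interrogative-word cue and a log-likelihood ratio between toy unigram models of questions and statements; all data and probabilities are assumptions.

      # Two features (interrogative-word cue, LM log-likelihood ratio) fed to a
      # logistic regression. Probabilities below are made-up illustrative values.
      import math
      from sklearn.linear_model import LogisticRegression

      INTERROGATIVES = {"who", "what", "where", "when", "why", "how", "do", "is", "are"}
      QUESTION_LM = {"what": 0.08, "is": 0.06, "the": 0.05, "you": 0.05, "how": 0.04}
      STATEMENT_LM = {"the": 0.07, "is": 0.05, "meeting": 0.02, "tomorrow": 0.02, "i": 0.05}
      FLOOR = 1e-4  # probability assigned to unseen words

      def features(utterance):
          words = utterance.lower().split()
          has_interrogative = float(any(w in INTERROGATIVES for w in words))
          llr = sum(math.log(QUESTION_LM.get(w, FLOOR) / STATEMENT_LM.get(w, FLOOR))
                    for w in words)
          return [has_interrogative, llr]

      texts = ["what is the agenda", "how are you", "the meeting is tomorrow", "i am here"]
      labels = [1, 1, 0, 0]  # 1 = question, 0 = statement
      clf = LogisticRegression().fit([features(t) for t in texts], labels)
      print(clf.predict([features("what time is the meeting")]))  # likely [1] (question)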

Support for the integration of foreigners working at Inria (LORIA)

  • Company: Inria, French national research institute
  • Location: Nancy, France
  • Period: Feb 2014 – Aug 2015
  • Mission: I organized social events (guided tours of the old town) and activities (monthly board game sessions) to facilitate the integration of foreigners at the center. I answered practical and cultural questions and assisted with administrative procedures (welcoming new arrivals, presentations of the French tax system).

Junior Software Engineer

  • Company: Inria, French national research institute
  • Location: Nancy, France
  • Period: Oct 2011 - Dec 2012
  • Project: ALLEGRO, Speech recognition for second language learning
  • Mission: This work focused on the detection of incorrect entries (i.e. those for which the text does not correspond to the associated speech signal) of non-native speech in the context of foreign language learning. I exploited the comparison between two text-to-speech alignments: one constrained by the text being checked (forced alignment) and one unconstrained, corresponding to a phonetic decoding (using a phoneme loop or a word loop). I combined several comparison criteria via a logistic regression function: likelihood ratios, the use and duration of phonemes, phonetic classes and non-speech units; a minimal illustrative sketch follows this entry. This position revolved mainly around feature engineering (using domain knowledge) and performance analysis.
  • Keywords: speech recognition, incorrect entries, non-native speech, constrained and unconstrained alignments, logistic regression
  • Technical environment: Perl, Shell Script, Gnuplot, LaTeX, Linux
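  • Illustrative sketch (Python rather than the original Perl tooling): combine a few alignment-comparison features with logistic regression to flag entries whose text does not match the speech signal; the alignment statistics and training examples are toy assumptions standing in for real decoder output.

      # Features comparing a forced alignment with an unconstrained phonetic
      # decoding, combined by logistic regression. All numbers are illustrative.
      from sklearn.linear_model import LogisticRegression

      def features(forced, free):
          llr = forced["loglik"] - free["loglik"]                  # likelihood ratio (log domain)
          silence_ratio = forced["silence_frames"] / forced["total_frames"]
          phoneme_rate = forced["num_phonemes"] / (forced["total_frames"] / 100.0)
          return [llr, silence_ratio, phoneme_rate]

      # (forced alignment stats, free decoding stats, label); 1 = incorrect entry.
      data = [
          ({"loglik": -1200, "silence_frames": 20, "total_frames": 300, "num_phonemes": 25},
           {"loglik": -1180}, 0),
          ({"loglik": -1500, "silence_frames": 150, "total_frames": 300, "num_phonemes": 25},
           {"loglik": -1150}, 1),
          ({"loglik": -1100, "silence_frames": 10, "total_frames": 250, "num_phonemes": 22},
           {"loglik": -1095}, 0),
          ({"loglik": -1700, "silence_frames": 120, "total_frames": 280, "num_phonemes": 30},
           {"loglik": -1200}, 1),
      ]
      X = [features(forced, free) for forced, free, _ in data]
      y = [label for _, _, label in data]
      clf = LogisticRegression().fit(X, y)

      new_entry = features({"loglik": -1600, "silence_frames": 100, "total_frames": 300,
                            "num_phonemes": 26}, {"loglik": -1190})
      print(clf.predict([new_entry]))  # likely [1]: text probably does not match the audio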

Intern

  • University: Université de Lorraine
  • Location: Nancy, France
  • Period: Feb 2011 - Jun 2011
  • Title: Speech recognition with remote sound for a home automation system
  • Abstract: This work focused on the performance evaluation of a speech recognition system with remote sound. I tested different configurations, decoding settings and models (acoustic and language models) in order to determine the setup leading to optimal performance.
  • Keywords: speech recognition, remote sound, adaptation, optimal settings
  • Technical environment: Java, Perl, SRILM, CMU Sphinx, HTK, Shell Script, Linux

Intern

  • University: Universitatea ‘Stefan cel Mare’
  • Location: Suceava, Romania
  • Period: Feb 2010 - Jun 2010
  • Title: Acquisition and recognition of head movements for gesture control in video games
  • Abstract: This work was based on the idea that users get emotionally involved while controlling the actions of video games through unconscious body movements, and that these unconscious movements could be detected and used as the actual (natural) control of the game. I therefore developed a new way of tracking head movements that was able to learn the gestures useful for controlling a video game. The demonstration was performed using a Wii remote that tracked the head position through glasses equipped with IR LEDs.
  • Keywords: head movements, Wii remote, infrared sensors
  • Technical environment: C#, Microsoft Visual Studio, Windows