INTERNSHIP: Junior Data Scientist
Offer last updated
VitaDX International:
With 400,000 new cases per year and a high recurrence rate, bladder cancer is one of the
most costly cancers for society.
VitaDX is a startup that develops a software solution for early and non-invasive diagnosis
of bladder cancer (one urine sample is all it takes). This diagnostic test is implemented
using machine learning algorithms that analyze urinary cytology whole slide images
(digitized microscope slides). These algorithms are currently being developed as part of
a clinical trial conducted by VitaDX with ten hospitals in France.
Mission description
Machine learning algorithms deployed in real-world situations often perform worse than
they did during R&D. This can be due to a combination of factors: the training dataset
is not representative of all possible cases, the model has been overfitted to the
evaluation dataset, the model is under-specified by the available data, or the
real-world data follows a slightly different distribution. In medical applications, a
drop in performance can have dire consequences and should be mitigated as much as
possible. The goal of this internship is to tackle part of this problem: how can we
detect whether a new example is an anomaly, and whether a prediction should be made
on it?
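One simple way to decide whether a prediction should be made is a reject option: measure how far a new sample lies from the training distribution and abstain when it is too far. The sketch below is a hypothetical baseline, not VitaDX's method. It assumes each slide has already been reduced to a fixed-length feature vector (for example, embeddings from the diagnosis model) and uses the Mahalanobis distance to the training features, with a threshold set at a training-set quantile. The helper names `fit_gate` and `predict_or_abstain`, the 99% quantile and the toy classifier are all illustrative assumptions.

```python
import numpy as np


def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance of each row of `x` to `mean`."""
    diff = x - mean
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative values from rounding


def fit_gate(train_features, quantile=0.99):
    """Fit a distance-based gate on in-distribution (training) features.

    The threshold is chosen so that roughly `quantile` of the training
    samples fall inside it, i.e. are accepted.
    """
    mean = train_features.mean(axis=0)
    cov = np.cov(train_features, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    threshold = np.quantile(mahalanobis(train_features, mean, inv_cov), quantile)
    return mean, inv_cov, threshold


def predict_or_abstain(predict_fn, x, gate):
    """Run `predict_fn` only on samples the gate accepts; -1 marks an abstention."""
    mean, inv_cov, threshold = gate
    accepted = mahalanobis(x, mean, inv_cov) <= threshold
    preds = np.full(len(x), -1)
    if accepted.any():
        preds[accepted] = predict_fn(x[accepted])
    return preds, accepted


# Toy usage: Gaussian "training" features and one obviously shifted sample.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
gate = fit_gate(train)

new = np.vstack([rng.normal(size=(5, 4)), np.full((1, 4), 8.0)])
toy_classifier = lambda x: (x[:, 0] > 0).astype(int)  # hypothetical stand-in for the diagnosis model
preds, accepted = predict_or_abstain(toy_classifier, new, gate)
print(preds, accepted)
```

The Mahalanobis distance implicitly assumes roughly Gaussian features, so this is best treated as a baseline to compare more elaborate methods against.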
Anomaly (or outlier) detection is the identification of examples that do not fit the
distribution of the majority of the data. In our case, this would allow us to identify
samples that were prepared using a different protocol and on which the output of the
diagnosis algorithm would not be meaningful. There is a large literature on the subject,
ranging from unsupervised to fully supervised methods.
Your tasks will be to:
- Compile a complete bibliography of the subject
- Choose and implement state-of-the-art methods on toy data
- Test the chosen methods on VitaDX's real-world problem
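As a starting point for the toy-data step, a classical unsupervised baseline such as Isolation Forest can be tried before moving to more recent methods. The sketch below uses scikit-learn's `IsolationForest`; the two synthetic clusters standing in for "usual protocol" and "different protocol" slides are purely illustrative assumptions, and real slides would first need to be turned into feature vectors.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Toy reference data: one Gaussian cluster, standing in for slides produced
# with the usual protocol (an assumption for illustration only).
reference = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

# Toy anomalies: a shifted cluster, standing in for slides produced with a
# different protocol or scanner.
shifted = rng.normal(loc=6.0, scale=0.5, size=(20, 2))

# Unsupervised detector: fitted on reference data only, no anomaly labels needed.
detector = IsolationForest(n_estimators=200, random_state=0)
detector.fit(reference)

# predict() returns +1 for inliers and -1 for outliers;
# decision_function() is negative for outliers (lower = more anomalous).
ref_labels = detector.predict(reference)
shifted_labels = detector.predict(shifted)
shifted_scores = detector.decision_function(shifted)

print("reference flagged as outliers:", np.mean(ref_labels == -1))
print("shifted flagged as outliers:", np.mean(shifted_labels == -1))
```

Because it needs no anomaly labels, this kind of detector sits at the unsupervised end of the spectrum; supervised variants would additionally require slides known to come from other protocols.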
Desired profile
Ideal candidate description:
- Good Python coding skills.
- Familiarity with standard Python tools and libraries (conda, numpy, scikit-learn, …).
- Good knowledge of classical Machine Learning and Deep Learning algorithms.
- Highly motivated and autonomous.
- Interested in the medical field.
- Knowledge of Deep Learning frameworks such as TensorFlow or PyTorch is a plus.
Resources at your disposal
You will have access to all the slides generated by the protocol team, covering several
biological protocols and scanners.
You will be able to run your computations either on your own machine with a GPU
or on our local computing cluster, which includes several compute nodes equipped with
recent Nvidia GPUs.
Required qualification level
Bac +4/5 or higher (four to five years of higher education after the French baccalauréat)
Internship and job offers are written by the recruiters themselves.
As host under the « 100 000 stages » ("100,000 internships") scheme, the Île-de-France Region is subject to the limited-liability regime provided for in Articles 6.I.2 et seq. of Law No. 2004-575 of 21 June 2004 on confidence in the digital economy.
The Île-de-France Region cannot be held responsible for the content of the offers.
However, if you notice a fraudulent, abusive or discriminatory offer, you can report it by clicking this link.
- Employer: VitaDX International
- Organization's sector: Health - Social - Citizenship - Security
- Organization size: 21 to 50 employees
- Website: https://www.vitadx.com
- Type of internship or contract: Internship for high-school and university students in initial education
- Expected start date:
- Duration of the internship or contract: More than 4 months and up to 6 months
- Is the internship paid? Yes
- Internship location: 1st floor, 28 Rue de Chambéry, 75015 PARIS 15E ARRONDISSEMENT
- Access and transport: Public transport