Scientific Contribution for Data Quality Assessment Metrics for Machine Learning Process-v2
Owner
Contributors
Contact person:
- Faouzi ADJED
Description
Machine learning depends entirely on data, so data quality matters throughout the life cycle of models based on machine learning. This work reviews the dominant approaches in the literature for evaluating data quality in machine learning. It covers four approaches to dataset evaluation: diversity, representativeness, completeness and coverage. It also presents the approaches used for qualitative evaluation and describes the importance of the data-centric research direction, which helps in mastering machine learning decisions. We then discuss the impact of data on machine learning performance and the limitations of the approaches presented in this work. This deliverable contributes to improving the machine learning life cycle and clarifies how needs, data collection and models can be connected, with the ultimate objective of mastering machine learning decisions.