Published March 5, 2024 | Version V2
Scientific contribution Restricted

Scientific Contribution For Data quality assessment metrics for Machine Learning process-v2

Contributors

Contact person:

  • Faouzi ADJED

Description

Machine learning depends entirely on data; consequently, data quality is critical throughout the life cycle of machine-learning-based models. This work reviews the dominant approaches in the literature for evaluating data quality in machine learning. It covers four approaches to evaluating a dataset: diversity, representativeness, completeness and coverage. It also presents the approaches used for qualitative evaluation. In addition, it describes the importance of the data-centric research direction, which helps in mastering machine learning decisions. We then discuss the impact of data on machine learning performance and the limitations of the approaches presented in this work. This deliverable contributes to improving the machine learning life cycle and clarifies how needs, data collection and models can be connected, with the ultimate objective of mastering machine learning decisions.
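The four dataset-level criteria are named above but not defined on this page. The following is a purely illustrative sketch. It assumes common textbook definitions, which are not necessarily the ones adopted in the restricted deliverable: completeness as the share of non-missing attribute values, coverage as the fraction of expected categories that are observed, diversity as normalized Shannon entropy, and representativeness as one minus the total variation distance to a reference distribution. The function names, the sample image-metadata fields (`weather`, `time`) and the reference proportions are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical metric definitions; the deliverable's own definitions may differ.

def completeness(rows, fields):
    """Share of (row, field) cells that are present and not None."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 0.0

def coverage(values, expected):
    """Fraction of the expected categories that occur at least once."""
    expected = set(expected)
    return len(expected & set(values)) / len(expected) if expected else 0.0

def diversity(values):
    """Shannon entropy of observed categories, normalized to [0, 1]."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    if len(counts) < 2:
        return 0.0  # a single (or no) category carries no diversity
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))

def representativeness(values, reference):
    """1 minus the total variation distance to a reference distribution."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    cats = set(counts) | set(reference)
    tvd = 0.5 * sum(abs(counts.get(c, 0) / n - reference.get(c, 0.0)) for c in cats)
    return 1.0 - tvd

# Hypothetical metadata for four images in a vision dataset.
rows = [
    {"weather": "sun", "time": "day"},
    {"weather": "rain", "time": "night"},
    {"weather": "sun", "time": None},
    {"weather": "fog", "time": "day"},
]
weather = [r["weather"] for r in rows]
expected_weather = ["sun", "rain", "fog", "snow"]
reference_weather = {"sun": 0.25, "rain": 0.25, "fog": 0.25, "snow": 0.25}
```

In this toy example, one missing `time` value lowers completeness, while the absence of `snow` images lowers both coverage and representativeness. The four metrics capture distinct failure modes and are not interchangeable.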

Files


Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Trustworthy Attributes: Reliability, Integrity
Use cases: Vision
Functional Set: Data Life Cycle, Model Component Life Cycle