Journal of Information and Data Management, Vol 2, No 1 (2011)

Summary-based Comparison of Data Quality across Public MAGE-ML Genomic Datasets

Lorena Etcheverry, Mariano P. Consens

Abstract


Extensive microarray experimental data is available online, facilitating independent evaluation of experiment
conclusions and enabling reuse. Numerous microarray experiment datasets are published using the MAGE-ML
XML schema, but assessing the quality of published experiments remains a challenging task, since there is no
consensus among microarray users on a framework for measuring dataset quality.
In this paper, we apply techniques based on DescribeX to analyze public MAGE-ML collections quantitatively and
qualitatively, gaining insights into schema evolution. Our case study shows that DescribeX is a useful tool for
evaluating the quality of microarray experiment data, one that enhances the understanding of the instance-level
structure of MAGE-ML datasets and of how that structure evolves.

An official publication of the Brazilian Computer Society Special Interest Group on Databases.