Are your data gathered? The Folding Test of Unimodality

Alban Siffer (1, 2), Pierre-Alain Fouque (1), Alexandre Termier (3), Christine Largouët (4, 3)
1 EMSEC - EMbedded SEcurity and Cryptography, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
3 LACODAM - Large Scale Collaborative Data Mining, Inria Rennes – Bretagne Atlantique, IRISA_D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: Understanding data distributions is one of the most fundamental research topics in data analysis. The literature provides many powerful statistical learning algorithms for gaining knowledge about the underlying distribution from multivariate observations. Such algorithms may reveal a dependence between features, the appearance of clusters, or the presence of outliers. Before such deep investigations, we propose the folding test of unimodality. As a simple statistical description, it detects whether data are gathered or not (unimodal or multimodal). To the best of our knowledge, this is the first multivariate and purely statistical unimodality test. It makes no distributional assumption and relies only on a straightforward p-value. Through real-world data experiments, we show its relevance and how it could be useful for clustering.

https://hal.archives-ouvertes.fr/hal-01951676
Contributor: Alexandre Termier
Submitted on: Tuesday, December 11, 2018 - 4:22:21 PM
Last modification on: Thursday, February 7, 2019 - 3:37:46 PM
Long-term archiving on: Tuesday, March 12, 2019 - 3:36:07 PM

File

siffer_kdd18.pdf
Files produced by the author(s)

Citation

Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, Christine Largouët. Are your data gathered? The Folding Test of Unimodality. KDD 2018 - 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Aug 2018, London, United Kingdom. pp.2210-2218, ⟨10.1145/3219819.3219994⟩. ⟨hal-01951676⟩
