2009 UC San Diego Data Mining Contest

The winners have been identified; check the results tab for information. Also, we have reopened the site for continued participation.

(Note that the data sets are different for the two tasks.)

E-commerce Transaction Anomaly Classification (Easy)

The following data sets are available for the E-commerce Transaction Anomaly Classification (Easy) task. They are .zip files, so use WinZip on Windows or unzip (not gunzip) on Linux.

Data Set (.zip files) | File Size | Number of Examples | Summary
Training Data | 1 MB | 94,682 | One line per example, feature values are comma delimited. First line has feature names.
Training Labels | 3 KB | 94,682 | One line per example, parallel to the training data file. 1 means the corresponding training example is positive, 0 means the corresponding training example is negative.
Test Data | 1 MB | 36,019 | Same format as training data.
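
The file layout described above (a comma-delimited data file with a header line, plus a parallel label file) can be read roughly as in the following Python sketch. It is only an illustration: the function name load_examples, the feature names f1 and f2, and the sample values are made up, and it assumes the label file has no header line, which the page does not state.

```python
import csv
import io


def load_examples(data_file, labels_file):
    """Pair each feature row with its 0/1 label.

    data_file: text stream, comma-delimited, first line holds feature names.
    labels_file: text stream, one label per line, parallel to the data rows
                 (assumed here to have no header line).
    """
    reader = csv.reader(data_file)
    feature_names = next(reader)  # first line: feature names
    rows = [row for row in reader if row]  # skip any blank trailing lines
    labels = [int(line) for line in labels_file if line.strip()]
    if len(rows) != len(labels):
        raise ValueError(
            f"{len(rows)} examples but {len(labels)} labels; files are not parallel"
        )
    return feature_names, rows, labels


# Tiny in-memory stand-in for the real files (hypothetical names and values).
sample_data = io.StringIO("f1,f2\n0.5,3\n1.2,7\n")
sample_labels = io.StringIO("1\n0\n")
names, rows, labels = load_examples(sample_data, sample_labels)
```

With the real files, the two streams would come from open() on the unzipped training data and training label files. The length check is a cheap guard against pairing examples with the wrong labels.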

E-commerce Transaction Anomaly Classification (Hard)

The following data sets are available for the E-commerce Transaction Anomaly Classification (Hard) task. They are .zip files, so use WinZip on Windows or unzip (not gunzip) on Linux.

Data Set (.zip files) | File Size | Number of Examples | Summary
Training Data | 3 MB | 100,000 | One line per example, feature values are comma delimited. First line has feature names.
Training Labels | 5 KB | 100,000 | One line per example, parallel to the training data file. 1 means the corresponding training example is positive, 0 means the corresponding training example is negative.
Test Data | 1 MB | 50,000 | Same format as training data.
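
Since the downloads are .zip archives (not gzip files), they can also be unpacked programmatically instead of with WinZip or unzip. The sketch below uses Python's standard zipfile module. The helper name extract_archive and the file names example.zip and example.csv are hypothetical stand-ins, because the page does not give the actual archive or member names.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of a .zip archive and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Demo on a throwaway archive with a made-up member name and contents.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "example.zip"
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("example.csv", "f1,f2\n0.5,3\n")
    members = extract_archive(archive_path, Path(tmp) / "out")
    first_line = (Path(tmp) / "out" / "example.csv").read_text().splitlines()[0]
```

For a real download, extract_archive would be pointed at the saved .zip file, and the returned member names show which extracted file to open next.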