2009 UC San Diego Data Mining Contest

The winners have been identified; check the results tab for information. Also, we have reopened the site for continued participation.

Task 1: E-commerce Transaction Anomaly Classification (Easy)

The first task is a binary classification task that involves 19 features from web transaction anomaly data. The training data consist of 94,682 examples, and there are roughly fifty times as many negative examples as positive ones. The test set consists of 36,019 examples and is drawn from the same distribution as the training set.

You can submit answers on the submission page and see your test set scores immediately on the leaderboard page.

Submitting Answers

Your job is to create a text file containing one line per example in the test set. On each line, give your predicted probability that the label is 1 (positive). The probability should be a decimal number between 0 and 1 (inclusive) with up to 6 decimal places of precision. So if you use all 6 decimal places, the format should be x.xxxxxx, where each x is a digit between 0 and 9.
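As an illustration of the format described above, here is a minimal Python sketch that turns a list of predicted probabilities into submission lines with exactly 6 decimal places. The function names and the output file name are hypothetical and not part of the contest; the sketch assumes your model's outputs are already in the same order as the test set.

```python
from pathlib import Path


def format_predictions(probabilities):
    """Return one 'x.xxxxxx' string per prediction, in the given order."""
    lines = []
    for index, p in enumerate(probabilities, start=1):
        # Reject anything outside [0, 1]; NaN also fails this comparison.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"prediction {index} is not a probability: {p}")
        lines.append(f"{p:.6f}")
    return lines


def write_submission(probabilities, path):
    """Write one prediction per line and return the number of lines written."""
    lines = format_predictions(probabilities)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, write_submission(predictions, "submission.txt") writes one line per prediction, where predictions is a hypothetical list of your model's outputs for the test set.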

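Be careful when submitting: predictions are matched to test examples by line position, so accidentally deleting a single line misaligns every prediction after it and may have large repercussions.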
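One way to guard against a missing or malformed line is to check the file before uploading it. The following is a sketch only: the helper name is hypothetical, and the expected count of 36,019 is the test set size stated in the task description. It does not reproduce whatever validation the submission page itself performs.

```python
EXPECTED_LINES = 36019  # test set size stated in the task description


def check_submission(lines, expected_count=EXPECTED_LINES):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(lines) != expected_count:
        problems.append(f"expected {expected_count} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        try:
            value = float(line)
        except ValueError:
            problems.append(f"line {number}: not a number: {line!r}")
            continue
        # NaN fails this comparison too, so it is reported as out of range.
        if not 0.0 <= value <= 1.0:
            problems.append(f"line {number}: {value} is outside [0, 1]")
    return problems
```

To check a file on disk, pass its lines to the helper, for example check_submission(open("submission.txt").read().splitlines()), where the file name is again hypothetical.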

Scoring Predictions

Evaluation Metric: The evaluation metric for the E-commerce Transaction Anomaly Classification (Easy) task is lift at 20%. If S is the 20% of test examples to which you assign the highest predicted probabilities, that is, the examples you think are most likely to have a positive label, then the lift at 20% is proportional to the number of positive examples in S.
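To estimate this metric on a held-out part of the training data, a sketch along the following lines may help. The description above only says the score is proportional to the number of positives in S, so the normalization used here (the share of all positives captured in S, divided by 0.20) is an assumption based on the common definition of lift. The rounding of the size of S and the handling of tied scores are also assumptions, and the leaderboard may differ on both.

```python
def lift_at_fraction(labels, scores, fraction=0.20):
    """Lift of the top `fraction` of examples when ranked by score.

    labels: 0/1 true labels; scores: predicted probabilities, same order.
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Size of S, rounded down (an assumption; the contest may round differently).
    k = int(n * fraction)
    total_positives = sum(labels)
    if k == 0 or total_positives == 0:
        raise ValueError("need at least one selected example and one positive")

    # Rank by descending score; ties keep their original order (an assumption).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    positives_in_s = sum(labels[i] for i in order[:k])

    # Share of all positives captured in S, relative to the share of examples in S.
    return (positives_in_s / total_positives) / (k / n)
```

Under this normalization a random ranking scores about 1, and the best possible value at 20% is 5, reached when S contains every positive example.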

Download

The data are available on the download page.