eHarmony matchings

This data set was provided by eHarmony, Inc. It consists of pairs of individuals who either matched (positive example) or did not match (negative example). The data is partitioned into two subsets, corresponding to two equal-length segments of time, and is stored in CSV files organized as follows.

EH-*-data.csv.gz
Each row describes an individual. The first column is an identification number for that individual, and all subsequent columns contain the (numeric) feature values.
EH-*-labels.csv.gz
Each row describes a pairwise interaction. The first column indicates whether the interaction is positive (1) or negative (0). The second and third columns contain identification numbers for the corresponding individuals.
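As a rough illustration of how the two files fit together, the following Python sketch loads one data/labels pair with pandas and looks up the feature vectors for both individuals in each interaction. It is not part of the original distribution. It assumes that neither file has a header row and that identification numbers are unique within a data file. The function names (load_eharmony, pair_features) and the column names (label, id_a, id_b) are hypothetical, chosen only for this example.

```python
import pandas as pd


def load_eharmony(data_path, labels_path):
    """Load one data/labels file pair.

    Assumption: neither CSV file has a header row.
    pandas infers gzip compression from the .gz extension.
    """
    data = pd.read_csv(data_path, header=None)
    # First column is the individual's identification number;
    # all remaining columns are numeric feature values.
    features = data.set_index(0)

    # Column names below are hypothetical labels for this sketch:
    # column 1 = interaction label (1 positive, 0 negative),
    # columns 2 and 3 = identification numbers of the two individuals.
    labels = pd.read_csv(labels_path, header=None,
                         names=["label", "id_a", "id_b"])
    return features, labels


def pair_features(features, labels):
    """Return feature arrays for both sides of each interaction, plus labels.

    Assumption: every ID in the labels file appears exactly once
    in the data file; a missing ID raises KeyError.
    """
    a = features.loc[labels["id_a"]].to_numpy()
    b = features.loc[labels["id_b"]].to_numpy()
    y = labels["label"].to_numpy()
    return a, b, y
```

Row i of each returned array then describes the i-th interaction: a[i] and b[i] are the feature vectors of the two individuals, and y[i] is 1 for a match and 0 otherwise.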

Please refer to the paper below for more details about this data set.

Note

To protect the privacy of users, all features have been obfuscated and normalized. I cannot provide names for the features.

Download

References

If you use this data, please cite the following paper:
Metric learning to rank. Twenty-seventh International Conference on Machine Learning (ICML), 2010.

Source code

The source code for MLR is now hosted on GitHub.