We used an unbiased bioinformatic approach to integrate data from 726 individual experiments in the ArrayExpress database, yielding a combined dataset of 24,029 microarrays. The data were processed with Robust Multi-array Average (RMA) so that the separate experiments can be compared directly with one another. This dataset is being used to inform and direct biological experimentation into prostate cancer in the York Cancer Research Unit.
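
To illustrate why RMA makes arrays from separate experiments comparable, the following is a minimal sketch of quantile normalization, one of the steps RMA includes. It is not the pipeline used for this dataset, which is not described here: the function name `quantile_normalize` and the toy matrix are hypothetical, and RMA's background correction and median-polish summarization are omitted.

```python
import numpy as np


def quantile_normalize(intensities):
    """Force every array (column) to share the same intensity distribution.

    `intensities` is a probes x arrays matrix. Hypothetical helper for
    illustration only; ties are broken by sort order rather than averaged.
    """
    x = np.asarray(intensities, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean across arrays at each rank.
    mean_quantiles = sorted_x.mean(axis=1)
    # Write the reference value for each rank back to the probe's original row.
    out = np.empty_like(x)
    np.put_along_axis(out, order, mean_quantiles[:, None], axis=0)
    return out


# Toy example (assumed values): 4 probes measured on 3 arrays.
raw = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(raw)
```

After this step every column contains the same set of values, so differences between arrays reflect the rank of each probe within its array rather than array-wide shifts in scale.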