A HADOOP-Based Framework for Parallel and Distributed Feature Selection

Research output: Other contribution



Publication details

Date: Published - 1 Sep 2013
Type: Technical Report
Place of Publication: Department of Computer Science, University of York, UK
Volume: Technical Report YCS-2013-485
Original language: English


In this paper, we introduce a theoretical basis for a Hadoop-based framework for parallel and distributed feature selection. The framework is underpinned by a binary associative-memory neural network, which is highly amenable to parallel and distributed processing and fits the Hadoop paradigm. Many feature selectors are described in the literature, each with its own strengths and weaknesses. We present the implementation details of four feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop MapReduce. Because Hadoop supports parallel and distributed processing, each feature selector can be processed in parallel, and multiple feature selectors can be run together in parallel so that they can be compared. We identify commonalities among the four feature selectors. All four can be processed in the framework using a single representation, and the overall processing can be greatly reduced by processing the common aspects of the feature selectors only once and propagating the results to each feature selector as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high-dimensional data sets by exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.
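The idea of computing the shared work once and reusing it across several feature selectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: a plain dictionary of co-occurrence counts stands in for the binary associative-memory neural network, the map and reduce steps run in ordinary Python rather than on Hadoop, and the two scores shown (mutual information and chi-squared) are common textbook selectors, not necessarily any of the report's four. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def map_record(record):
    """Map step (hypothetical): emit ((feature_index, value, label), 1) per feature."""
    *features, label = record
    return [((j, v, label), 1) for j, v in enumerate(features)]


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted counts for each key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def contingency(counts, j):
    """Pull the value-by-label table for feature j out of the shared counts."""
    return {(v, c): n for (f, v, c), n in counts.items() if f == j}


def marginals(table):
    """Row totals, column totals, and grand total of a contingency table."""
    rows, cols = Counter(), Counter()
    for (v, c), n in table.items():
        rows[v] += n
        cols[c] += n
    return rows, cols, sum(table.values())


def mutual_information(table):
    """Mutual information (in bits) between a feature and the class label."""
    rows, cols, total = marginals(table)
    return sum(
        (n / total) * math.log2(n * total / (rows[v] * cols[c]))
        for (v, c), n in table.items()
        if n > 0
    )


def chi_squared(table):
    """Pearson chi-squared statistic for a feature against the class label."""
    rows, cols, total = marginals(table)
    score = 0.0
    for v in rows:
        for c in cols:
            expected = rows[v] * cols[c] / total
            observed = table.get((v, c), 0)
            score += (observed - expected) ** 2 / expected
    return score


def score_features(records, n_features):
    """Count once, then derive every selector's score from the same counts."""
    counts = reduce_counts(chain.from_iterable(map(map_record, records)))
    scores = {}
    for j in range(n_features):
        table = contingency(counts, j)
        scores[j] = {
            "mutual_information": mutual_information(table),
            "chi_squared": chi_squared(table),
        }
    return scores
```

In this sketch the data are scanned a single time to build the counts, and both scores are then read off the same table, which mirrors the abstract's point about processing common aspects once and sharing them across selectors.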

    Research areas

  • Hadoop, Distributed, Binary Neural Network, Parallel, Data Fusion, Feature Selection
