In this paper, a novel methodology for reducing the generalization errors that arise from domain shift in big data classification is presented. The reduction is achieved by introducing a suitably selected domain shift into the training data via what is referred to as a "distortion model". These distortions are applied through an affine transformation, yielding additional data samples. Next, a deep neural network (NN), referred to as the "classifier", is used to classify both the original and the additional data samples. By learning from both the original and additional data samples, the classifier compensates for the domain shift while maintaining its performance on the original data. However, the exact magnitude of the shift encountered in real applications is unknown a priori and difficult to predict. The objective is therefore to compensate for the largest shift the distortion model can introduce without significantly degrading the performance of the model. A two-player zero-sum game is thus designed: the first player is the distortion model, whose aim is to increase the domain shift, and the second player is the classifier, whose aim is to minimize the impact of that shift. Finally, a direct error-driven learning scheme is utilized to minimize the impact of the domain shift on the classifier while maximizing the shift introduced by the distortion model. A comprehensive simulation study is presented in which a 12% improvement in the presence of domain shift is demonstrated. The proposed approach is also shown to improve generalization by 6%.
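The two-player scheme described above can be illustrated with a minimal numerical sketch. This is not the authors' formulation — the paper uses a deep NN trained with a direct error-driven learning scheme, whereas the sketch below substitutes a logistic-regression classifier and plain gradient steps on toy 2-D data. The affine "distortion model" (A, b) takes gradient-ascent steps on the training loss (player 1), the classifier takes gradient-descent steps on the loss over both original and distorted samples (player 2), and the distortion magnitude is clipped so the shift does not degrade the data beyond recognition. All names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs in 2-D (illustrative, not the paper's data).
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

A = np.eye(2)          # distortion model: affine map x -> A @ x + b (player 1)
b = np.zeros(2)
w = np.zeros(2)        # classifier: logistic regression w, c (player 2)
c = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(Xb, yb, w, c):
    """Cross-entropy loss and its gradients for the logistic classifier."""
    p = sigmoid(Xb @ w + c)
    loss = -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
    g = p - yb                         # per-sample gradient w.r.t. the logit
    return loss, Xb.T @ g / len(yb), g.mean(), g

eta_min, eta_max = 0.5, 0.05           # step sizes for the two players
for step in range(300):
    Xd = X @ A.T + b                   # distorted copies of the training data
    Xall = np.vstack([X, Xd])          # classifier learns from original + distorted
    yall = np.concatenate([y, y])
    loss, gw, gc, g = loss_and_grads(Xall, yall, w, c)
    # Player 2: classifier descends the loss.
    w -= eta_min * gw
    c -= eta_min * gc
    # Player 1: distortion model ascends the loss via the distorted half of
    # the batch. For logit = w @ (A @ x + b) + c:  dlogit/dA = outer(w, x),
    # dlogit/db = w.
    gd = g[len(y):]
    A += eta_max * np.outer(w, gd @ X) / len(y)
    b += eta_max * w * gd.mean()
    # Bound the shift so the distorted samples stay plausible.
    A = np.eye(2) + np.clip(A - np.eye(2), -0.3, 0.3)
    b = np.clip(b, -0.5, 0.5)

# Accuracy on the original data and under a modest unseen affine shift.
acc_orig = np.mean((sigmoid(X @ w + c) > 0.5) == y)
acc_shift = np.mean((sigmoid((X + np.array([0.3, 0.3])) @ w + c) > 0.5) == y)
```

Because the classifier is trained against the worst bounded distortion the adversary can find, its decision boundary remains accurate on the original blobs while tolerating a shift it never saw during training.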
- Big data,
- Deep neural networks,
- Metadata,
- Affine transformations,
- Data classification,
- Error-driven learning,
- Generalization error,
- Minimax approach,
- Novel methodology,
- Real applications,
- Simulation studies,
- Classification (of information)
Available at: http://works.bepress.com/jagannathan-sarangapani/232/
This research was supported in part by NSF I/UCRC award IIP 1134721 and by the Intelligent Systems Center.