Machine-learned ranking functions have proven successful in Web search engines. As demand grows for effective ranking functions in different search domains, a major bottleneck has emerged: insufficient labeled training data, which has significantly slowed the development and deployment of machine-learned ranking functions for different domains. There are two possible approaches to this problem: (1) combining labeled training data from similar domains with the small amount of labeled target-domain data, or (2) training on pairwise preference data extracted from user clickthrough logs for the target domain. In this article, we propose a new approach called tree-based ranking function adaptation (Trada) to effectively utilize these data sources for training cross-domain ranking functions. Tree adaptation assumes that ranking functions are trained with the Stochastic Gradient Boosting Trees method, a gradient boosting method on regression trees. It takes such a ranking function from one domain and tunes its tree-based structure with a small amount of training data from the target domain. Its unique features include (1) automatic identification of the part of the model that needs adjustment for the new domain and (2) appropriate weighting of training examples considering both local and global distributions. Building on a novel pairwise loss function that we developed for pairwise learning, we also extend the basic tree adaptation algorithm (Pairwise Trada) to utilize pairwise preference data from the target domain and further improve the effectiveness of adaptation. Experiments on real datasets show that tree adaptation can provide better-quality ranking functions for a new domain than other methods.
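To illustrate what tuning a tree with a small amount of target-domain data might look like, the following is a minimal sketch of one ingredient only: blending each leaf's source-domain response with the mean label of the target examples that fall into that leaf. The names `Node`, `find_leaf`, `adapt_leaf_values`, `n_source`, and `beta` are hypothetical, and the count-based blending rule is an assumption chosen for illustration; it is not taken from the abstract and does not cover restructuring of splits or the pairwise extension.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    """One node of a regression tree (hypothetical minimal structure)."""
    feature: Optional[int] = None   # index of the split feature (internal nodes)
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0              # leaf response learned on the source domain
    n_source: int = 0               # number of source examples that reached the leaf

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def find_leaf(root: Node, x: Sequence[float]) -> Node:
    """Route one feature vector down the tree to its leaf."""
    node = root
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def adapt_leaf_values(root: Node,
                      X_target: Sequence[Sequence[float]],
                      y_target: Sequence[float],
                      beta: float = 1.0) -> Node:
    """Blend source leaf responses with target-domain label means.

    Assumed rule (illustrative only): each target example counts as `beta`
    source examples, so leaves with little target data stay close to the
    source response, and leaves with no target data are left unchanged.
    """
    buckets: Dict[int, Tuple[Node, List[float]]] = {}
    for x, y in zip(X_target, y_target):
        leaf = find_leaf(root, x)
        buckets.setdefault(id(leaf), (leaf, []))[1].append(y)

    for leaf, ys in buckets.values():
        n_target = len(ys)
        target_mean = sum(ys) / n_target
        w_target = beta * n_target
        total = leaf.n_source + w_target
        if total > 0:  # guard against beta == 0 with an empty source leaf
            leaf.value = (leaf.n_source * leaf.value + w_target * target_mean) / total
    return root


# Usage: a one-split tree trained on the source domain, adapted with three target examples.
root = Node(feature=0, threshold=0.5,
            left=Node(value=1.0, n_source=8),
            right=Node(value=3.0, n_source=2))
adapt_leaf_values(root, [[0.2], [0.9], [0.8]], [2.0, 4.0, 5.0], beta=2.0)
print(root.left.value, root.right.value)
```

In this sketch the left leaf, which is backed by many source examples and only one target example, moves only slightly toward the target label, while the sparsely supported right leaf moves much further. This mirrors the idea of weighting by local evidence, under the stated assumptions.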
Available at: http://works.bepress.com/keke_chen/21/