\name{intCNGEan.match}
\alias{intCNGEan.match}
\title{ Genomic location matching of CN and GE data}
\description{
Integrative CN-GE analysis requires the copy number data of
all genes on the expression array to be available.
\code{intCNGEan.match} matches the features of the copy number platform
to the genes of the expression array.
This is done using their genomic locations
on the basis of either proximity or overlap.
}
\usage{
intCNGEan.match(CNdata, GEdata, CNbpend = "no", GEbpend = "no",
                method = "distance")
}
\arguments{
  \item{CNdata}{ Object of class \code{cghCall}, containing (among others) annotation and call probabilities. }
  \item{GEdata}{ Object of class \code{ExpressionSet}. }
  \item{CNbpend}{ Indicator of the availability of end base pair information for features of the copy number platform, either \code{"no"} or \code{"yes"}. }
  \item{GEbpend}{ Indicator of the availability of end base pair information for features of the gene expression platform, either \code{"no"} or \code{"yes"}. }
  \item{method}{ Matching method to be applied, either \code{"distance"}, \code{"overlap"} or \code{"overlapplus"}. See below for details. }
  \item{mergeGainAmp}{ Boolean indicating whether gain and amplification probabilities should be merged, either \code{TRUE} or \code{FALSE}. }
}
\details{
The annotation information of both the copy number and the gene expression data,
as provided by \code{fData(CNdata)} and \code{fData(GEdata)}, is assumed to contain the chromosome number (1st
column, named \code{Chr}), the start base pair (2nd column, named \code{Start}) and the end base pair
(3rd column, named \code{End}).
Base pair information of the copy number and expression data should be on the same scale. Additional columns are ignored in the matching.

Matching occurs on the basis of genomic locations. If \code{method="distance"}, the midpoints of the CN and GE features are calculated, and for each gene on the
expression array the closest feature of the copy number platform is selected. If \code{method="overlap"}, each gene in the \code{ExpressionSet}-object
is matched to the feature from the copy number platform with the maximum percentage of overlap. If the maximum percentage of overlap equals zero, the gene is not included in the
matched objects. If \code{method="overlapplus"}, the features are first matched by their percentage of overlap (as with the \code{method="overlap"}-option). For each non-matched GE feature,
its two closest CN features (one downstream and one upstream) are then determined. If the copy number signatures of these two CN features are identical, interpolation seems reasonable,
and the GE feature is matched to the closest of these two CN features. Hence, \code{method="overlapplus"} makes use of the copy number data; consequently, the matching may differ between data sets.
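
As an illustration only, the idea behind \code{method="distance"} can be sketched in a few lines of base R. This is a minimal sketch and not the code used by \code{intCNGEan.match}: the toy annotation data frames \code{cnAnn} and \code{geAnn}, and all values in them, are hypothetical, and the handling of features without end base pair information is left out.
\preformatted{
# hypothetical toy annotation (Chr, Start, End), laid out as assumed above
cnAnn <- data.frame(Chr = c(1, 1, 2), Start = c(100, 500, 100),
                    End = c(200, 600, 300))
geAnn <- data.frame(Chr = c(1, 2), Start = c(450, 150),
                    End = c(480, 250))

# midpoints of the CN and GE features
cnMid <- (cnAnn$Start + cnAnn$End) / 2
geMid <- (geAnn$Start + geAnn$End) / 2

# for each gene: index of the closest CN feature on the same chromosome
closestCN <- sapply(seq_len(nrow(geAnn)), function(i) {
    dists <- abs(cnMid - geMid[i])
    dists[cnAnn$Chr != geAnn$Chr[i]] <- Inf
    which.min(dists)
})
# here the first gene is matched to CN feature 2, the second to CN feature 3
}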
}
\value{
  A \code{list}-object with the following entries:
  \item{CNdata.matched }{Object of class \code{cghCall} with matched features.}
  \item{GEdata.matched }{Object of class \code{ExpressionSet} with matched features.}
}
\references{ 
Van Wieringen, W.N., Van de Wiel, M.A. (2009), "Non-parametric testing for DNA copy number 
induced differential mRNA gene expression", \emph{Biometrics}, 65(1), 19-29. 

Van Wieringen, W.N., Belien, J.A.M., Vosse, S.J., Achame, E.M., Ylstra, B. (2006), "ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data", \emph{Bioinformatics}, 22(15), 1919-1920.
}
\author{ Wessel N. van Wieringen: \email{wvanwie@few.vu.nl} }
\note{ 
The matching process implemented here is different from the one
implemented in the \code{ACEit}-package (Van Wieringen et al., 2006).
}
\section{Warning}{
Features with incomplete annotation information are removed before matching. Consequently, they are not included in the objects with matched features.
}

\seealso{ \code{cghCall}, \code{ExpressionSet}, \code{intCNGEan.tune}, \code{intCNGEan.test}, \code{intCNGEan.plot} }
\examples{
# load data
data(pollackCN)
data(pollackGE)

# match features from both platforms
CNGEdataMatched <- intCNGEan.match(pollackCN, pollackGE)
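
# the returned list holds the matched objects (see the Value section)
CNdataMatched <- CNGEdataMatched$CNdata.matched
GEdataMatched <- CNGEdataMatched$GEdata.matched

# after matching, both objects are expected to contain the same number
# of features; this check is a sketch, not part of the package
nrow(fData(CNdataMatched)) == nrow(fData(GEdataMatched))

\dontrun{
# matching by overlap instead of distance; this assumes that end base
# pair information is available for the features of both platforms
CNGEdataOverlap <- intCNGEan.match(pollackCN, pollackGE, CNbpend = "yes",
    GEbpend = "yes", method = "overlap")
}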
}
