CNA-seq / Correct for GC content

Description

Takes the counts per bin for a CNA-seq data set and corrects them for GC content and mappability.

Parameters

Details

Read counts need to be corrected for GC content because GC content affects enzyme chemistry and therefore also the depth of sequencing coverage. Mappability, which represents the uniqueness of sequences in the genome, also biases the counts and needs to be corrected for.
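As an illustration of the idea only, the sketch below divides each bin's count by the median count of bins with similar GC content and mappability. This stratum-median approach is a simplified stand-in and is not necessarily the method this tool uses. The function name correct_counts, the step parameter, the percentage scale for GC content and mappability, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def correct_counts(counts, gc, mappability, step=5.0):
    """Scale each bin by the median count of bins with similar GC content
    and mappability (both assumed to be percentages, 0-100).

    Hypothetical helper for illustration; not the tool's actual algorithm.
    """
    # Group bins into strata by rounding GC and mappability to multiples of `step`.
    strata = defaultdict(list)
    keys = []
    for count, g, m in zip(counts, gc, mappability):
        key = (round(g / step), round(m / step))
        keys.append(key)
        strata[key].append(count)

    # The expected count of a stratum is the median of the bins it contains.
    expected = {key: median(values) for key, values in strata.items()}

    # Rescale by the overall median so corrected values stay on the count scale.
    overall = median(counts)
    corrected = []
    for count, key in zip(counts, keys):
        e = expected[key]
        corrected.append(count / e * overall if e > 0 else float("nan"))
    return corrected


# Toy data: low-GC bins are under-covered and high-GC bins are over-covered.
counts = [80, 82, 78, 100, 102, 98, 120, 118, 122]
gc = [30, 30, 30, 40, 40, 40, 50, 50, 50]
mappability = [100] * 9
corrected = correct_counts(counts, gc, mappability)
```

After the correction, the three GC groups in the toy data are centered on the same value, so the remaining differences between bins no longer follow GC content.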

Output

The input data set, with the raw counts replaced by corrected counts. The data is also log2-transformed.

References

Scheinin et al. (2014) DNA copy number analysis of fresh and formalin-fixed specimens by whole-genome sequencing: improved correction of systematic biases and exclusion of problematic regions. Manuscript submitted.

GC content: Benjamini and Speed (2012) Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40: e72

Mappability: Koehler et al. (2011) The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics 27: 272-274