测序数据在经过序列比对、排序等操作后,想要计算一下本次测序的深度和覆盖度是怎样的,这里有很多方法可以做到。

1、

samtools depth bamfile | awk {sum+=$3} END { print "Average = "sum/NR}

2、

samtools depth *bamfile* | awk {sum+=$3; sumsq+=$3*$3} END { print "Average = ",sum/NR; print "Stdev = ",sqrt(sumsq/NR - (sum/NR)**2)}

3、使用R语言

library(Rsamtools)
bamcoverage <- function (bamfile) {
# read in the bam file
bam <- scanBam(bamfile)[[1]] # the result comes in nested lists
# filter reads without match position
ind <- ! is.na(bam$pos)
## remove non-matches, they are not relevant to us
bam <- lapply(bam, function(x) x[ind])
ranges <- IRanges(start=bam$pos, width=bam$qwidth, names=make.names(bam$qname, unique=TRUE))
## names of the bam data frame:
## "qname" "flag" "rname" "strand" "pos" "qwidth"
## "mapq" "cigar" "mrnm" "mpos" "isize" "seq" "qual"
## construc: genomic ranges object containing all reads
ranges <- GRanges(seqnames=Rle(bam$rname), ranges=ranges, strand=Rle(bam$strand), flag=bam$flag, readid=bam$rname )
## returns a coverage for each reference sequence (aka. chromosome) in the bam file
return (mean(coverage(ranges)))
}

4、利用samtools depth

samtools depth deduped_MA605.bam > deduped_MA605.coverage
Chr position depth (this header will be absent though)
1 3980 66
1 3981 68
1 3982 67
1 3983 67
1 3984 68

awk $1 == 1 {print $0} deduped_MA605.coverage > chr1_MA605.coverage

5、如果想要计算Coverage 至少在 10x、20x、30x 的覆盖度分别为多少

bedtools genomecov -d -ibam your_bam.bam -g your_genome.fa > genome_cov.txt

或者

qualimap bamqc -bam you_bam_file.bam -outfile report.pdf -outformat PDF

推荐阅读:

相关文章