https://bioconductor.github.io/BiocWorkshops/r-and-bioconductor-for-everyone-an-introduction.html#introduction-to-bioconductor
使用valid()查看包安装的版本情况
> BiocManager::valid() * sessionInfo() R version 3.5.3 (2019-03-11) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) Matrix products: default locale: [1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936 [3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] BiocManager_1.30.4 compiler_3.5.3 tools_3.5.3 yaml_2.2.0 Bioconductor version '3.8' * 45 packages out-of-date * 0 packages too new create a valid installation with BiocManager::install(c( "AnnotationFilter", "backports", "biomaRt", "Biostrings", "broom", "checkmate", "devtools", "digest", "dplyr", "DT", "ensembldb", "fs", "GenomicAlignments", "GenomicFeatures", "ggplot2", "ggrepel", "gridGraphics", "httpuv", "igraph", "knitr", "pillar", "processx", "progress", "ProtGenerics", "rcmdcheck", "RcppArmadillo", "remotes", "reprex", "rGREAT", "rlang", "robustbase", "rtracklayer", "rvest", "segmented", "shiny", "shinyFiles", "sys", "testthat", "tinytex", "usethis", "vegan", "WGCNA", "xfun", "xtable", "zip" ), update = TRUE, ask = FALSE) more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date Warning message: 45 packages out-of-date; 0 packages too newavaliable() 用于搜索相关的包
## 例如这里输入TxDb.Hsapiens,它会自动匹配相关的包 BiocManager::available("TxDb.Hsapiens") #> [1] "TxDb.Hsapiens.BioMart.igis" #> [2] "TxDb.Hsapiens.UCSC.hg18.knownGene" #> [3] "TxDb.Hsapiens.UCSC.hg19.knownGene" #> [4] "TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts" #> [5] "TxDb.Hsapiens.UCSC.hg38.knownGene"通过网站https://support.bioconductor.org/ 来查询解决和包相关的问题
rtracklayer包提供函数import() 帮助用户读取基因组格式的文件(例如:BED,GTF,VCF,FASTA) 并封装成Bioconductor下的对象。GenomicRanges包提供了多种函数,来在基因组坐标系下操纵各种数据。
注意 BED格式的文件,它们的坐标体系是0-based的,并且intervals是半开区间(start在范围内,end在范围后一个坐标里) 而Bioconductor得坐标是1-based,并且是闭区间(start和end坐标都包含在范围内),因此使用import()函数导入数据得时候,它会自动将bed文件坐标转换为Bioconductor文件对象的坐标。
GenomicRanges对象包含两个部分
必需的:seqnames(染色体号),start(起始位点),end(终止位点),strand (正链或负链)非必需的:另外的元素,例如本例中的name。必需的元素内容,可以使用函数start(), end(), width()获取。非必需的元素,使用$符号获取内容。
head( start(cpg) ) #> [1] 155188537 2226774 36306230 47708823 53737730 144179072 head( cpg$name ) ## 使用subset()函数获取染色体1号和2号上的cpg岛 subset(cpg, seqnames %in% c("chr1", "chr2"))可以使用findOverlaps(),nearest(),precede()和follow()函数,来完成overlaps
## count the number of CpG islands that overlap each transcript olaps <- countOverlaps(tx, cpg) length(olaps) # 1 count per transcript #> [1] 182435 table(olaps) ## 将计算的overlaps加入GR对象 tx$cpgOverlaps <- countOverlaps(tx, cpg) tx ## subsetting the GRanges objects for transcripts satisfying particular conditions, in a coordinated fashion ## where the software does all the book-keeping to makes sure the correct ranges are selected. subset(tx, cpgOverlaps > 10)