[Bioinformatics Advanced Practice, 1000 Days] Day 2: Working with summarized experimental data and down-stream analysis

xiaoxiao2022-07-03 187

Source chapter

https://bioconductor.github.io/BiocWorkshops/r-and-bioconductor-for-everyone-an-introduction.html#working-with-summarized-experimental-data

Contents

Source chapter
1. Working with summarized experimental data
1.1 Introduction
1.2 Constructing a SummarizedExperiment object
1.3 Assay data
1.4 Creating a SummarizedExperiment object
2. Down-stream analysis

1. Working with summarized experimental data

1.1 Introduction

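This chapter covers the SummarizedExperiment package and the SummarizedExperiment object.

A SummarizedExperiment object behaves like a matrix: it can be subset by rows and by columns. The experimental data stored in the object are accessed with assay(). Rows represent the features of interest (for example, genes) and columns represent samples, so each value in the matrix might be the expression level of one gene in one sample.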
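To make the matrix-like behavior concrete, here is a minimal base-R sketch of the bookkeeping that a SummarizedExperiment is meant to take over. The matrix values, the gene names, and the sample names s1 to s4 are invented for illustration and are not the airway data.

```r
## Toy data (invented for illustration): 3 genes x 4 samples.
## matrix() fills column by column, so each triple below is one sample.
toy_counts <- matrix(
    c(10, 0, 5,
      12, 1, 7,
      30, 2, 9,
      28, 0, 11),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3", "s4"))
)

## One row of sample annotation per column of the matrix
toy_samples <- data.frame(
    dex = c("untrt", "trt", "untrt", "trt"),
    row.names = colnames(toy_counts)
)

## Without a container, the matrix and the annotation have to be subset
## separately, and nothing stops them from drifting out of sync
keep <- toy_samples$dex == "trt"
trt_counts <- toy_counts[, keep, drop = FALSE]
trt_samples <- toy_samples[keep, , drop = FALSE]

## Check that both pieces still describe the same samples
stopifnot(identical(colnames(trt_counts), rownames(trt_samples)))
dim(trt_counts)
```

Keeping the counts and the sample annotation in one object means a single subsetting call updates both, which removes this manual check.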

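1.2 Constructing a SummarizedExperiment object

About the data: the data set contains 8 samples from an RNA-seq experiment that examines how four human smooth muscle cell lines respond to dexamethasone treatment. Run browseVignettes("airway") for a detailed description of the data set and the experiment.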

## input data
fname <- file.choose()   # airway_colData.csv
fname

## set the first column of the data to be treated as row names
colData <- read.csv(fname, row.names = 1)
colData

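The columns SampleName, Run, Experiment, Sample, and BioSample come from the Short Read Archive. The remaining columns describe the experiment:

cell: the cell line used; this data set uses four cell lines
dex: whether the sample was treated with dexamethasone
albut: a second treatment, which we can ignore
avgLength: the average length of the RNA-seq reads in each sample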

1.3 Assay data

Now import the assay data.

## importing the assay data from the file "airway_counts.csv"
fname <- file.choose()   # airway_counts.csv
fname
counts <- read.csv(fname, row.names = 1)

## coerce data.frame() to matrix using as.matrix()
counts <- as.matrix(counts)

## We see the dimensions and first few rows of the counts matrix
dim(counts)
#> [1] 33469     8
head(counts)

Interpreting the data

Taking gene ENSG00000000003 as an example: in sample SRR1039508, 679 reads overlap it; in sample SRR1039509, 448 reads overlap it.
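A single count can be looked up by gene name and sample name. The sketch below rebuilds only the two values quoted above as a 1 x 2 excerpt (the object name excerpt is hypothetical); the same indexing should work on the full counts matrix, assuming its row and column names are the gene and run identifiers.

```r
## A 1 x 2 excerpt holding only the two values quoted in the text
excerpt <- matrix(
    c(679, 448),
    nrow = 1,
    dimnames = list("ENSG00000000003", c("SRR1039508", "SRR1039509"))
)

## Look up one count by gene (row) name and sample (column) name
excerpt["ENSG00000000003", "SRR1039508"]

## All samples for one gene, returned as a named vector
excerpt["ENSG00000000003", ]
```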

1.4 Creating a SummarizedExperiment object

## Attach the SummarizedExperiment library to our R session
library("SummarizedExperiment")

## Use the SummarizedExperiment() function to coordinate the assay and column data.
## The documented argument name is `assays`; naming the matrix "counts" labels the assay
se <- SummarizedExperiment(assays = list(counts = counts), colData = colData)
se

## Use subset() on the SummarizedExperiment to create subsets of the data in a coordinated way.
## The object is two-dimensional, so subsetting takes a row argument and a column argument:
## the empty first argument keeps all genes, and dex == "trt" keeps the treated samples
subset(se, , dex == "trt")

## Use assay() to extract the count matrix and colSums() to calculate the library size
## (total number of reads overlapping genes in each sample)
colSums(assay(se))

## Store the library size as a new column of the column data, then inspect it
se$lib.size <- colSums(assay(se))
colData(se)
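The same steps can be tried without the airway files. The following is a small sketch on invented data (the toy_ objects and their values are assumptions for illustration only); it also shows bracket subsetting, which selects the same samples as the subset() call.

```r
library(SummarizedExperiment)

## Toy inputs (invented for illustration): 3 genes x 4 samples
toy_counts <- matrix(
    c(10, 0, 5,  12, 1, 7,  30, 2, 9,  28, 0, 11),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3", "s4"))
)
toy_samples <- data.frame(
    dex = c("untrt", "trt", "untrt", "trt"),
    row.names = colnames(toy_counts)
)

## Column names of the matrix match the row names of the sample table
toy_se <- SummarizedExperiment(
    assays = list(counts = toy_counts),
    colData = toy_samples
)

## Bracket subsetting: rows first, columns second, as with a matrix.
## The counts and the sample annotation are subset together.
trt_se <- toy_se[, toy_se$dex == "trt"]
dim(trt_se)

## Library size per sample, stored alongside the other sample annotation
toy_se$lib.size <- colSums(assay(toy_se))
colData(toy_se)
```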

2. Down-stream analysis

We use the R package DESeq2 for the down-stream analysis.

## Down-stream analysis
library("DESeq2")

## Construct the dds data set, including cell line as a covariate
## and dexamethasone treatment as the main factor that we are interested in
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds

## Perform the statistical analysis on the data in the dds object
dds <- DESeq(dds)

## Extract a table summarizing measures of differential expression
results(dds)
```
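To see what the design formula ~ cell + dex asks the model to estimate, it can help to expand it with base R's model.matrix(). This is a sketch on an invented sample sheet: the cell-line labels c1 to c4 and the assumption that every cell line has one untreated and one treated sample are for illustration only. R orders factor levels alphabetically by default, which would make "trt" the reference level, so the sketch sets the levels explicitly to use "untrt" as the baseline.

```r
## Toy sample sheet (labels invented): 4 cell lines, each untreated and treated
design_df <- data.frame(
    cell = factor(rep(c("c1", "c2", "c3", "c4"), each = 2)),
    dex  = factor(rep(c("untrt", "trt"), times = 4), levels = c("untrt", "trt"))
)

## One column per estimated coefficient: an intercept, three cell-line
## offsets relative to c1, and the treatment effect relative to untrt
mm <- model.matrix(~ cell + dex, data = design_df)
colnames(mm)
dim(mm)
```

Including cell in the formula lets differences between cell lines be absorbed by their own coefficients, so the dex coefficient reflects the treatment effect within cell lines.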
