1 Overview
2 Preparations
3 Quick start
- 3.1 One task at a call
- 3.2 Tell SURF about your data
4 The complete SURF workflow
5 Visualization
- 5.1 Visualize DrSeq results
- 5.2 Visualize FASeq results
6 Appendix
- 6.1 Overlap operations
7 Version information

1 Overview

This document provides an example of using SURF (Version 1.0.0) to integrate CLIP-seq and RNA-seq data for predicting RNA-binding protein (RBP) functions in alternative transcriptional regulation (ATR). SURF is versatile in and can handle four different tasks centering ATR events:

	Data	Format	Task
1	genome annotation	any (gtf, gff, …)	parse ATR events
2	+ RNA-seq	alignment (bam)	detect differential ATR events
3	+ CLIP-seq	alignment (bam)	detect functional association
4	+ external RNA-seq	summarized table	differential transcriptional activity

To get supports, please submit issues to our GitHub page.

Schematic overview of SURF pipeline. (a) In the analysis module 1, the first step parses alternative transcriptional regulation (ATR) events from genome annotation. (b) SURF quantifies relative event/exon usage (REU) using RNA-seq data following RBP knockdown (with wild-type control) and performs differential REU analysis (DrSeq). As a result, it infers regulation effects of RBPs as phenotypical labels for each differential ATR event.(c) In the analysis module 2, SURF extracts location features for each ATR event and generates the feature signals using the complementary eCLIP-seq data. (d) Then, SURF models differential ATR labels by associating them with the feature signals and infers global positional preferences of RBPs. (e) In the discovery module, the inferred location features play a key role in downstream RBP-related discovery. The rank-based analysis of RBP transcript targets using external transcriptome datasets (e.g., TCGA and GTEx) discovers differential transcriptional activity through specific RBP and ATR event types.

Figure 1: Schematic overview of SURF pipeline
(a) In the analysis module 1, the first step parses alternative transcriptional regulation (ATR) events from genome annotation. (b) SURF quantifies relative event/exon usage (REU) using RNA-seq data following RBP knockdown (with wild-type control) and performs differential REU analysis (DrSeq). As a result, it infers regulation effects of RBPs as phenotypical labels for each differential ATR event.(c) In the analysis module 2, SURF extracts location features for each ATR event and generates the feature signals using the complementary eCLIP-seq data. (d) Then, SURF models differential ATR labels by associating them with the feature signals and infers global positional preferences of RBPs. (e) In the discovery module, the inferred location features play a key role in downstream RBP-related discovery. The rank-based analysis of RBP transcript targets using external transcriptome datasets (e.g., TCGA and GTEx) discovers differential transcriptional activity through specific RBP and ATR event types.

2 Preparations

To reproduce the results in this vignette, download the demo data from Zenodo and put them into a “data” folder under the working directory.

3 Quick start

3.1 One task at a call

The four tasks in the SURF pipeline should be streamlined. Once you have the data in hand (see the next subsection for how to tell SURF about your data), each step can be performed with a single function call:

library(surf)

event <- parseEvent(anno_file)                                  # step 1
event <- drseq(event, rna_seq_sample)                           # step 2
event <- faseq(event, clip_seq_sample)                          # step 3
event <- daseq(event, getRankings(exprMat), ext_sample)         # step 4

Here, anno_file (a charactor(1)), rna_seq_sample (a data.frame), clip_seq_sample (a data.frame), and ext_sample (a data.frame) describe your data files, and exprMat (a data.frame) is a summary table of some external transcriptome quantification (e.g., TCGA, GTEx, etc).

3.2 Tell SURF about your data

Describing your data is straightforward.

For task 1, a character(1) containing the file path will do.

anno_file <- "data/gencode.v32.primary.example.gtf"

For task 2, SURF needs to know where the aligned RNA-seq reads (bam files) are and the experimental condition for differential analysis (e.g., “knock-down” or “wild-type”).

rna_seq_sample <- data.frame(
  row.names = c('sample1', 'sample2', 'sample3', 'sample4'),
  bam = paste0("data/",c("KD1", "KD2", "WT1", "WT2"),".bam"),
  condition = c('knockdown', 'knockdown', 'wildtype', 'wildtype'),
  stringsAsFactors = FALSE
)

Similarly for task 3, SURF requires where the alignment files (bam) are and the experimental condition (e.g., “IP” and the size-match input “SMI”).

clip_seq_sample <- data.frame(
  row.names = c('sample5', 'sample6', 'sample7'),
  bam = paste0('data/', c("IP1", "IP2", "SMI"), '.bam'),
  condition = c('IP', 'IP', 'SMI'),
  stringsAsFactors = F
)

Finally, for task 4, SURF assumes that you have transcriptome quantification summarized in a table (exprMat). Each row of the table corresponds to a genomic feature (e.g., genes, transcripts, etc) and each column corresponds to a sample. You can use any your favorite measure (e.g. TPM, RPKM, etc). In addition, it requires a table describing the sample group (condition).

exprMat <- readRDS('data/TcgaTargetGtex_rsem_isoform_tpm_laml_blood_10each.rds')
ext_sample <- data.frame(
  condition = rep(c('TCGA', 'GTEx'), each = 10),
  row.names = colnames(exprMat)
)

4 The complete SURF workflow

4.1 Step 1: Parse ATR events from genome annotation.

The first step is to parse genome annotation for alternative transcriptional regulation (ATR) events. An ATR event is the exonic region that excluded consecutively in a specific transcript (and is included in some other transcript isoforms from the same gene). There are eight different ATR event types (Figure 2). To parse ATR events, SURF provides the parseEvent() function. The function compares each transcript to the gene model and extracts the absent exonic regions. These regions are mutually disjoint within the transcript thus can be annotated into specific ATR event types, according to their relative positions within the transcript.

Illustration of eight ATR event types. The eight event types include exon skipping (SE), alternative 3' (A3SS) and 5' (A5SS) splicing, and intron retention (RI) within the AS class; alternative first exon (AFE) and alternative 5'UTR (A5U) within the ATI class; intronic (IAP) and tandem (TAP) alternative polyadenylation within the APA class. In each panel, the upper track depicts part of a gene model, and the lower track demarcates a specific ATR event in a transcript (isoform) with a white dashed box.

Figure 2: Illustration of eight ATR event types
The eight event types include exon skipping (SE), alternative 3’ (A3SS) and 5’ (A5SS) splicing, and intron retention (RI) within the AS class; alternative first exon (AFE) and alternative 5’UTR (A5U) within the ATI class; intronic (IAP) and tandem (TAP) alternative polyadenylation within the APA class. In each panel, the upper track depicts part of a gene model, and the lower track demarcates a specific ATR event in a transcript (isoform) with a white dashed box.

The simplest use of parseEvent() is to provide the file path of genome annotation. Various types of genome annotation formats are acceptable, such as GTF or GFF.

event <- parseEvent("data/gencode.v32.primary.example.gtf")

The function allows several options. For example,

cores specifies the number of computing processors. Since the parsing step is done for individual genes separately, we recommend parallel computing whenever possible. By default, the function will detect the available cores in the current environment (parallel::detectCores()), say \(n\), and uses \(\max (n-2, 1)\) cores. The entire SURF workflow uses the same default whenever parallel computing is applicable.
min.event.length allows you to filter the widths (in bp) of ATR events with a lower limit. The default is 6.
location.feature determines whether location features should also be extracted. Location features are the potential locations of protein-RNA interactions within/nearby the specific ATR event. Specifically, they are defined within and flanking the event body and the constitutive (upstream/downstream) exons (Figure 3). The location features is used in Step 3 (to quantify CLIP-seq signals for functional inference) and will not be used by the second step, DrSeq. This may take a considerable amount of computing time. Hence, you could set this option to FALSE if you are only interested in performing differential regulation of ATR events using RNA-seq (DrSeq). The default is TRUE.
depth.exon and depth.intron are relevant only if location.feature=TRUE. These options configure the sizes of location features. Specifically, SURF uses a fixed length for exonic location features111 In some special cases of short ATR events, the exonic location feature will be shorten so that they do not overlap. and a fixed length for all intronic location features respectively (Figure 3). The default is depth.exon=100 and depth.intron=300.
remove.duplicate defines whether identical events should be removed. If remove.duplica=TRUE, the one event with the highest RNA-seq read coverage will be kept, and the other duplicated events will be removed.
verbose controls whether or not the progress should be printed.

Illustration of location features for eight ATR event types. White boxes depict the ATR events with the event type labeled inside. Short and long curly brackets correspond to genomic regions of length 100bp and 300bp respectively.

Figure 3: Illustration of location features for eight ATR event types
White boxes depict the ATR events with the event type labeled inside. Short and long curly brackets correspond to genomic regions of length 100bp and 300bp respectively.

This step outputs a universal surf object, which is an extension of DataFrame (S4 object). Each ATR event is assigned a unique event identifier (event_id) in the form of <transcript_id>@<event_number>. The first component, transcript_id, is because ATR events are specific to individual transcripts and are mutually disjointed within each transcript. <event_number> is numbered from 5’ to 3’ end of the transcript.

event

## surf with 882 rows and 6 columns
##                                    event_id event_name            gene_id
##                                 <character>   <factor>        <character>
## ENST00000163678.11@1   ENST00000163678.11@1        A5U ENSG00000067365.14
## ENST00000163678.11@12 ENST00000163678.11@12        TAP ENSG00000067365.14
## ENST00000163678.11@2   ENST00000163678.11@2        SE  ENSG00000067365.14
## ENST00000163678.11@3   ENST00000163678.11@3        SE  ENSG00000067365.14
## ENST00000381920.7@2     ENST00000381920.7@2        SE  ENSG00000067365.14
## ...                                     ...        ...                ...
## ENST00000576625.5@8     ENST00000576625.5@8       SE   ENSG00000205629.12
## ENST00000576625.5@9     ENST00000576625.5@9       A3SS ENSG00000205629.12
## ENST00000577157.1@1     ENST00000577157.1@1       A5U  ENSG00000205629.12
## ENST00000577157.1@3     ENST00000577157.1@3       SE   ENSG00000205629.12
## ENST00000577157.1@8     ENST00000577157.1@8       IAP  ENSG00000205629.12
##                            transcript_id
##                              <character>
## ENST00000163678.11@1  ENST00000163678.11
## ENST00000163678.11@12 ENST00000163678.11
## ENST00000163678.11@2  ENST00000163678.11
## ENST00000163678.11@3  ENST00000163678.11
## ENST00000381920.7@2    ENST00000381920.7
## ...                                  ...
## ENST00000576625.5@8    ENST00000576625.5
## ENST00000576625.5@9    ENST00000576625.5
## ENST00000577157.1@1    ENST00000577157.1
## ENST00000577157.1@3    ENST00000577157.1
## ENST00000577157.1@8    ENST00000577157.1
##                                                                                             genomicData
##                                                                                           <GRangesList>
## ENST00000163678.11@1                                                            chr16:8621683-8621690:+
## ENST00000163678.11@12                                                           chr16:8646225-8649654:+
## ENST00000163678.11@2                                                            chr16:8623602-8624118:+
## ENST00000163678.11@3                                    chr16:8631247-8632138:+,chr16:8635039-8635079:+
## ENST00000381920.7@2                                                             chr16:8623602-8624118:+
## ...                                                                                                 ...
## ENST00000576625.5@8                                                           chr16:25170714-25170805:+
## ENST00000576625.5@9                                                           chr16:25174746-25174936:+
## ENST00000577157.1@1                                                           chr16:25111731-25111744:+
## ENST00000577157.1@3                                                           chr16:25125964-25126126:+
## ENST00000577157.1@8   chr16:25161102-25161206:+,chr16:25164598-25164718:+,chr16:25169112-25169545:+,...
##                                                                                                 feature
##                                                                                           <GRangesList>
## ENST00000163678.11@1        chr16:8621383-8621682:+,chr16:8621683-8621690:+,chr16:8621683-8621690:+,...
## ENST00000163678.11@12       chr16:8646125-8646224:+,chr16:8646225-8646324:+,chr16:8649555-8649654:+,...
## ENST00000163678.11@2        chr16:8621691-8621775:+,chr16:8621776-8622075:+,chr16:8623302-8623601:+,...
## ENST00000163678.11@3        chr16:8629011-8629110:+,chr16:8629111-8629410:+,chr16:8630947-8631246:+,...
## ENST00000381920.7@2         chr16:8621688-8621775:+,chr16:8621776-8622075:+,chr16:8623302-8623601:+,...
## ...                                                                                                 ...
## ENST00000576625.5@8   chr16:25169114-25169213:+,chr16:25169214-25169513:+,chr16:25170414-25170713:+,...
## ENST00000576625.5@9   chr16:25169114-25169213:+,chr16:25169214-25169513:+,chr16:25174446-25174745:+,...
## ENST00000577157.1@1   chr16:25111431-25111730:+,chr16:25111731-25111744:+,chr16:25111731-25111744:+,...
## ENST00000577157.1@3   chr16:25111745-25111768:+,chr16:25111769-25112068:+,chr16:25125664-25125963:+,...
## ENST00000577157.1@8   chr16:25158832-25158931:+,chr16:25158932-25159231:+,chr16:25160802-25161101:+,...

Using the mcols() function, we can inspect the description of each column. In particular, the 4th column contains the genomic data of the ATR event (GenomicRangesList). The reason that SURF uses GenomicRangesList instead of GenomicRanges is because some ATR events, namely AFE, IAP, and SE if consecutive, may consist of multiple exons. If location.feature=TRUE, then parseEvent() also outputs the 5th column (also a GenomicRangesList) containing the location features.

mcols(event)

## DataFrame with 6 rows and 2 columns
##                      type            description
##               <character>            <character>
## event_id       annotation       event identifier
## event_name     annotation       event type/class
## gene_id        annotation  gene/group identifier
## transcript_id  annotation harbouring transcrip..
## genomicData    annotation genomic ranges of th..
## feature        annotation genomic ranges of th..

The output surf object also comes with a genePartsList slot, for gene parts list. genePartsList is a DataFrame of 5 columns. Use genePartsList() to retrieve the object and mcols() to inspect the descriptions of its columns. In particular, the 3rd column (segment) contains the smallest pieces (exonic or intronic) in the gene. The 4th column (label) indicates whether each segment is exonic or intronic. The 5th column (layout) indicates whether each exonic segment is included by individual transcripts of the gene. SURF defines the absent exonic regions for each transcript based on these information.

pl <- genePartsList(event)
pl

## DataFrame with 22 rows and 5 columns
##                               gene_id
##                           <character>
## ENSG00000067365.14 ENSG00000067365.14
## ENSG00000075399.14 ENSG00000075399.14
## ENSG00000086504.17 ENSG00000086504.17
## ENSG00000102897.10 ENSG00000102897.10
## ENSG00000102984.15 ENSG00000102984.15
## ...                               ...
## ENSG00000171241.9   ENSG00000171241.9
## ENSG00000174990.7   ENSG00000174990.7
## ENSG00000183751.15 ENSG00000183751.15
## ENSG00000196408.11 ENSG00000196408.11
## ENSG00000205629.12 ENSG00000205629.12
##                                                                 transcript_id
##                                                               <CharacterList>
## ENSG00000067365.14 ENST00000563958.5,ENST00000381920.7,ENST00000163678.11,...
## ENSG00000075399.14  ENST00000561976.5,ENST00000389386.8,ENST00000565023.1,...
## ENSG00000086504.17 ENST00000199706.13,ENST00000648346.1,ENST00000483764.5,...
## ENSG00000102897.10  ENST00000564457.1,ENST00000568663.5,ENST00000412082.6,...
## ENSG00000102984.15 ENST00000568961.5,ENST00000313565.10,ENST00000611294.4,...
## ...                                                                       ...
## ENSG00000171241.9   ENST00000303383.8,ENST00000567698.1,ENST00000563219.1,...
## ENSG00000174990.7   ENST00000649158.1,ENST00000648177.1,ENST00000649794.2,...
## ENSG00000183751.15  ENST00000568546.6,ENST00000561907.5,ENST00000569628.5,...
## ENSG00000196408.11  ENST00000569739.1,ENST00000356120.8,ENST00000354249.8,...
## ENSG00000205629.12  ENST00000380962.9,ENST00000380966.8,ENST00000564011.5,...
##                                                                                              segment
##                                                                                        <GRangesList>
## ENSG00000067365.14       chr16:8621383-8621682:+,chr16:8621683-8621687:+,chr16:8621688-8621690:+,...
## ENSG00000075399.14 chr16:89720899-89721198:-,chr16:89720886-89720898:-,chr16:89720763-89720885:-,...
## ENSG00000086504.17             chr16:371290-371589:-,chr16:371067-371289:-,chr16:370539-371066:-,...
## ENSG00000102897.10 chr16:20899568-20899867:+,chr16:20899868-20899901:+,chr16:20899902-20900004:+,...
## ENSG00000102984.15 chr16:71895337-71895636:-,chr16:71894889-71895336:-,chr16:71884810-71894888:-,...
## ...                                                                                              ...
## ENSG00000171241.9  chr16:46621380-46621679:-,chr16:46621369-46621379:-,chr16:46621352-46621368:-,...
## ENSG00000174990.7  chr16:87936581-87936880:-,chr16:87936561-87936580:-,chr16:87936530-87936560:-,...
## ENSG00000183751.15       chr16:1971753-1972052:+,chr16:1972053-1972070:+,chr16:1972071-1972085:+,...
## ENSG00000196408.11       chr16:1984193-1984492:-,chr16:1984167-1984192:-,chr16:1982752-1984166:-,...
## ENSG00000205629.12 chr16:25111431-25111730:+,chr16:25111731-25111734:+,chr16:25111735-25111741:+,...
##                            label
##                    <NumericList>
## ENSG00000067365.14     0,1,1,...
## ENSG00000075399.14     0,1,1,...
## ENSG00000086504.17     0,1,0,...
## ENSG00000102897.10     0,1,1,...
## ENSG00000102984.15     0,1,0,...
## ...                          ...
## ENSG00000171241.9      0,1,1,...
## ENSG00000174990.7      0,1,1,...
## ENSG00000183751.15     0,1,1,...
## ENSG00000196408.11     0,1,0,...
## ENSG00000205629.12     0,1,1,...
##                                                                                   layout
##                                                                                   <List>
## ENSG00000067365.14    FALSE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,TRUE:TRUE:FALSE:...,...
## ENSG00000075399.14   FALSE:FALSE:FALSE:...,FALSE:TRUE:FALSE:...,FALSE:TRUE:FALSE:...,...
## ENSG00000086504.17  FALSE:FALSE:FALSE:...,FALSE:TRUE:FALSE:...,FALSE:FALSE:FALSE:...,...
## ENSG00000102897.10    FALSE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,TRUE:TRUE:FALSE:...,...
## ENSG00000102984.15 FALSE:FALSE:FALSE:...,FALSE:FALSE:FALSE:...,FALSE:FALSE:FALSE:...,...
## ...                                                                                  ...
## ENSG00000171241.9    FALSE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,...
## ENSG00000174990.7      FALSE:FALSE:FALSE:...,TRUE:TRUE:FALSE:...,TRUE:TRUE:FALSE:...,...
## ENSG00000183751.15    FALSE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,TRUE:TRUE:FALSE:...,...
## ENSG00000196408.11 FALSE:FALSE:FALSE:...,FALSE:FALSE:FALSE:...,FALSE:FALSE:FALSE:...,...
## ENSG00000205629.12     FALSE:FALSE:FALSE:...,TRUE:FALSE:FALSE:...,TRUE:TRUE:TRUE:...,...

mcols(pl)

## DataFrame with 5 rows and 1 column
##                          description
##                          <character>
## gene_id              gene identifier
## transcript_id transcript identifiers
## segment       genomic segments (pa..
## label         exonic label (0 as i..
## layout        transcript (isoform)..

Several overlapping operations have been implemented for surf object (see Section 6.1 for details).

4.2 Step 2: Detect differential regulation (DR) of ATR events

To detect the differential regulation of ATR events, use drseq(). The function requires a table input that specifies the sample information of RNA-seq. The sample data should at least contain a condition column indicating the experimental conditions and a bam column giving the file directory for the aligned RNA-seq reads. Currently, SURF supports “bam” format. If you have read alignments in the “sam” format, we recommend using samtools (the “view” command) for conversion.

rna_seq_sample <- data.frame(
  row.names = c('sample1', 'sample2', 'sample3', 'sample4'),
  bam = paste0("data/",c("KD1", "KD2", "WT1", "WT2"),".bam"),
  condition = c('knockdown', 'knockdown', 'wildtype', 'wildtype'),
  stringsAsFactors = FALSE
) 
event <- drseq(event, rna_seq_sample)

The output of drseq() (still a surf object) contains 6 additional columns (the 7-12th) to the existing output from Step 1. Use mcols() to check column descriptions.

mcols(event)[7:12,]

## DataFrame with 6 rows and 2 columns
##                      type            description
##               <character>            <character>
## eventBaseMean       drseq mean of the counts a..
## padj                drseq   BH adjusted p-values
## logFoldChange       drseq relative exon usage ..
## adjMean           RNA-seq adjusted base mean o..
## group               faseq direction of differe..
## included            faseq indicator of usable ..

event[,7:12]

## surf with 882 rows and 6 columns
##                       eventBaseMean      padj logFoldChange    adjMean
##                           <numeric> <numeric>     <numeric>  <numeric>
## ENST00000163678.11@1        0.00000        NA            NA 0.00000000
## ENST00000163678.11@12     312.11389  0.054970     0.0916593 0.08844259
## ENST00000163678.11@2        5.50491  0.975509     0.4898729 0.00893654
## ENST00000163678.11@3       55.72381  0.994272     0.1336382 0.05399594
## ENST00000381920.7@2         5.50491  0.975509     0.4898730 0.00893654
## ...                             ...       ...           ...        ...
## ENST00000576625.5@8        97.06122  0.970435    -0.1207714  0.5081739
## ENST00000576625.5@9         4.78001  0.833863    -1.0128452  0.0164828
## ENST00000577157.1@1         5.61396  0.984705    -0.2227436  0.0496810
## ENST00000577157.1@3         7.47223  0.975509    -0.3818396  0.0285200
## ENST00000577157.1@8       328.63720  0.901839    -0.0356011  0.2397062
##                           group  included
##                       <ordered> <logical>
## ENST00000163678.11@1  no change     FALSE
## ENST00000163678.11@12 no change     FALSE
## ENST00000163678.11@2  no change     FALSE
## ENST00000163678.11@3  no change      TRUE
## ENST00000381920.7@2   no change     FALSE
## ...                         ...       ...
## ENST00000576625.5@8   no change      TRUE
## ENST00000576625.5@9   no change     FALSE
## ENST00000577157.1@1   no change     FALSE
## ENST00000577157.1@3   no change     FALSE
## ENST00000577157.1@8   no change     FALSE

In addition to the main DataFrame, the output also contains a drseqResults slot, which can be simply accessed by the drseqResults() function. The descriptions of columns can be fetched with the mcols() function. In the DrSeq results, the two most useful columns are padj and logFoldChange. padj is the p-value of differential event/exon usage (DEU), after multiplicity adjustment with the BH procedure. logFoldChange is the fold change of “knockdown” versus “wildtype”, after the logarithmic transformation with base 2. For example, if logFoldChange>0, it means the EU under the knock-down condition is higher than that under the wild-type condition. For the visualizations of DrSeq results, see Section 5.1.

drr <- drseqResults(event)
mcols(drr)

## DataFrame with 15 rows and 2 columns
##                      type            description
##               <character>            <character>
## event_id       annotation       event identifier
## event_name     annotation       event type/class
## gene_id        annotation  gene/group identifier
## transcript_id  annotation harbouring transcrip..
## genomicData    annotation genomic ranges of th..
## ...                   ...                    ...
## padj                drseq   BH adjusted p-values
## knockdown           drseq exon usage coefficient
## wildtype            drseq exon usage coefficient
## logFoldChange       drseq relative exon usage ..
## countData         RNA-seq matrix of integer co..

drr

## drseqResults with 882 rows and 15 columns
##                                    event_id event_name            gene_id
##                                 <character>   <factor>        <character>
## ENST00000163678.11@1   ENST00000163678.11@1        A5U ENSG00000067365.14
## ENST00000163678.11@12 ENST00000163678.11@12        TAP ENSG00000067365.14
## ENST00000163678.11@2   ENST00000163678.11@2        SE  ENSG00000067365.14
## ENST00000163678.11@3   ENST00000163678.11@3        SE  ENSG00000067365.14
## ENST00000381920.7@2     ENST00000381920.7@2        SE  ENSG00000067365.14
## ...                                     ...        ...                ...
## ENST00000576625.5@8     ENST00000576625.5@8       SE   ENSG00000205629.12
## ENST00000576625.5@9     ENST00000576625.5@9       A3SS ENSG00000205629.12
## ENST00000577157.1@1     ENST00000577157.1@1       A5U  ENSG00000205629.12
## ENST00000577157.1@3     ENST00000577157.1@3       SE   ENSG00000205629.12
## ENST00000577157.1@8     ENST00000577157.1@8       IAP  ENSG00000205629.12
##                            transcript_id
##                              <character>
## ENST00000163678.11@1  ENST00000163678.11
## ENST00000163678.11@12 ENST00000163678.11
## ENST00000163678.11@2  ENST00000163678.11
## ENST00000163678.11@3  ENST00000163678.11
## ENST00000381920.7@2    ENST00000381920.7
## ...                                  ...
## ENST00000576625.5@8    ENST00000576625.5
## ENST00000576625.5@9    ENST00000576625.5
## ENST00000577157.1@1    ENST00000577157.1
## ENST00000577157.1@3    ENST00000577157.1
## ENST00000577157.1@8    ENST00000577157.1
##                                                                                             genomicData
##                                                                                           <GRangesList>
## ENST00000163678.11@1                                                            chr16:8621683-8621690:+
## ENST00000163678.11@12                                                           chr16:8646225-8649654:+
## ENST00000163678.11@2                                                            chr16:8623602-8624118:+
## ENST00000163678.11@3                                    chr16:8631247-8632138:+,chr16:8635039-8635079:+
## ENST00000381920.7@2                                                             chr16:8623602-8624118:+
## ...                                                                                                 ...
## ENST00000576625.5@8                                                           chr16:25170714-25170805:+
## ENST00000576625.5@9                                                           chr16:25174746-25174936:+
## ENST00000577157.1@1                                                           chr16:25111731-25111744:+
## ENST00000577157.1@3                                                           chr16:25125964-25126126:+
## ENST00000577157.1@8   chr16:25161102-25161206:+,chr16:25164598-25164718:+,chr16:25169112-25169545:+,...
##                                                                                                 feature
##                                                                                           <GRangesList>
## ENST00000163678.11@1        chr16:8621383-8621682:+,chr16:8621683-8621690:+,chr16:8621683-8621690:+,...
## ENST00000163678.11@12       chr16:8646125-8646224:+,chr16:8646225-8646324:+,chr16:8649555-8649654:+,...
## ENST00000163678.11@2        chr16:8621691-8621775:+,chr16:8621776-8622075:+,chr16:8623302-8623601:+,...
## ENST00000163678.11@3        chr16:8629011-8629110:+,chr16:8629111-8629410:+,chr16:8630947-8631246:+,...
## ENST00000381920.7@2         chr16:8621688-8621775:+,chr16:8621776-8622075:+,chr16:8623302-8623601:+,...
## ...                                                                                                 ...
## ENST00000576625.5@8   chr16:25169114-25169213:+,chr16:25169214-25169513:+,chr16:25170414-25170713:+,...
## ENST00000576625.5@9   chr16:25169114-25169213:+,chr16:25169214-25169513:+,chr16:25174446-25174745:+,...
## ENST00000577157.1@1   chr16:25111431-25111730:+,chr16:25111731-25111744:+,chr16:25111731-25111744:+,...
## ENST00000577157.1@3   chr16:25111745-25111768:+,chr16:25111769-25112068:+,chr16:25125664-25125963:+,...
## ENST00000577157.1@8   chr16:25158832-25158931:+,chr16:25158932-25159231:+,chr16:25160802-25161101:+,...
##                       eventBaseMean dispersion      stat     pvalue      padj
##                           <numeric>  <numeric> <numeric>  <numeric> <numeric>
## ENST00000163678.11@1        0.00000         NA        NA         NA        NA
## ENST00000163678.11@12     312.11389 0.00100486 9.1872298 0.00243709  0.054970
## ENST00000163678.11@2        5.50491 0.02949847 0.2557281 0.61307033  0.975509
## ENST00000163678.11@3       55.72381 0.01953954 0.0112228 0.91563191  0.994272
## ENST00000381920.7@2         5.50491 0.02949847 0.2557281 0.61307033  0.975509
## ...                             ...        ...       ...        ...       ...
## ENST00000576625.5@8        97.06122  0.0127761  0.375216   0.540175  0.970435
## ENST00000576625.5@9         4.78001  0.2634898  1.226814   0.268027  0.833863
## ENST00000577157.1@1         5.61396  0.0331676  0.148010   0.700445  0.984705
## ENST00000577157.1@3         7.47223  0.0788642  0.321010   0.571001  0.975509
## ENST00000577157.1@8       328.63720  0.0116675  0.623002   0.429934  0.901839
##                       knockdown  wildtype logFoldChange       countData
##                       <numeric> <numeric>     <numeric>        <matrix>
## ENST00000163678.11@1         NA        NA            NA       0:0:0:...
## ENST00000163678.11@12  23.57588  24.29237     0.0916593 314:406:242:...
## ENST00000163678.11@2    3.12186   3.69866     0.4898729       6:6:4:...
## ENST00000163678.11@3   10.56838  11.06236     0.1336382    58:82:30:...
## ENST00000381920.7@2     3.12186   3.69866     0.4898730       6:6:4:...
## ...                         ...       ...           ...             ...
## ENST00000576625.5@8    14.48334  13.90318    -0.1207714  124:138:84:...
## ENST00000576625.5@9     3.85856   2.72218    -1.0128452      10:6:4:...
## ENST00000577157.1@1     3.90010   3.61244    -0.2227436      6:10:4:...
## ENST00000577157.1@3     4.21989   3.69773    -0.3818396      8:14:2:...
## ENST00000577157.1@8    18.14958  17.95783    -0.0356011 378:490:260:...

4.3 Step 3: Test the functional association between RBP binding and DR events.

To detect the differential regulation of ATR events, use faseq(). The function requires a sampleData table that specifies the sample information of CLIP-seq data. In particular, a condition column indicates the experimental conditions is required, as well as a bam column giving the file directory of CLIP-seq alignments. Several parameters are available:

min.size specifies the minimal sample size in order for a functional association test. The default is 100. We recommend to set this parameter greater than 50 for sufficient statistical power.
fdr.cutoff is the cutoff of adjusted p-values (FDR). The default is 0.05. This parameter is used for inferring functional location features.
signal.cutoff is the cut-off of CLIP-seq signals. The CLIP-seq signal is normalized CLIP-seq read coverage on location features. The default is 20. This parameter is used for inferring location features.

clip_seq_sample = data.frame(
  row.names = c('sample5', 'sample6', 'sample7'),
  bam = paste0("data/",c("IP1", "IP2", "SMI"),".bam"),
  condition = c('IP', 'IP', 'SMI'),
  stringsAsFactors = FALSE
)
event <- faseq(event, clip_seq_sample, 
               min.size = 3, fdr.cutoff = 0.3, signal.cutoff = 2)

The output of faseq() (still a surf object) adds two additional columns (the 13th and 14th) to the existing output from Step 2.

event[,13:14]

## surf with 882 rows and 2 columns
##                                        featureSignal    inferredFeature
##                                        <NumericList>       <FactorList>
## ENST00000163678.11@1                       0,0,0,... none,none,none,...
## ENST00000163678.11@12  157.556,   0.000,-105.038,... none,none,none,...
## ENST00000163678.11@2   48.1986,-41.0542,-40.1397,... none,none,none,...
## ENST00000163678.11@3   76.5063, 91.8293,-55.5396,... none,none,none,...
## ENST00000381920.7@2    49.3822,-19.6126,-40.4355,... none,none,none,...
## ...                                              ...                ...
## ENST00000576625.5@8    4.71900, 0.00000,-9.27298,... none,none,none,...
## ENST00000576625.5@9      42.5688, 0.0000,17.5063,... none,none,none,...
## ENST00000577157.1@1      13.6563, 0.0000, 0.0000,... none,none,none,...
## ENST00000577157.1@3     0.0000,-22.6334,  0.0000,... none,none,none,...
## ENST00000577157.1@8      40.9688,28.1417,14.4854,... none,none,none,...

Again, use mcols() to inspect the description of each column.

mcols(event)[13:14,]

## DataFrame with 2 rows and 2 columns
##                        type            description
##                 <character>            <character>
## featureSignal      CLIP-seq normalized CLIP-seq ..
## inferredFeature       faseq inferred functionali..

In addition to the main output, the output also contains a faseqResults slot (a DataFrame object) which can be accessed by the faseqResults() function. For column descriptions, use the mcols() function.

far <- faseqResults(event)
mcols(far)

## DataFrame with 9 rows and 2 columns
##                   type            description
##            <character>            <character>
## event            faseq    event type/category
## feature          faseq     positional feature
## size             faseq number of events (sa..
## Estimate         faseq estimated feature ma..
## Std..Error       faseq estiamted feature st..
## z.value          faseq standardized Z value..
## p.value          faseq                p value
## functional       faseq inferred regulating ..
## padj             faseq  adjusted p value (BH)

far

## faseqResults with 4 rows and 9 columns
##                      event     feature      size     Estimate Std..Error
##                   <factor> <character> <integer>    <numeric>  <numeric>
## TAP-up1:exclusion      TAP         up1         3  0.002335845 0.00225635
## TAP-bd1:exclusion      TAP         bd1         3  0.003217598 0.00215921
## TAP-bd2:exclusion      TAP         bd2         3 -0.000156098 0.00203401
## TAP-dn1:exclusion      TAP         dn1         3 -0.020461824 0.01423604
##                     z.value   p.value  functional      padj
##                   <numeric> <numeric> <character> <numeric>
## TAP-up1:exclusion  1.035233 0.1502800   exclusion  0.300560
## TAP-bd1:exclusion  1.490176 0.0680889   exclusion  0.272356
## TAP-bd2:exclusion -0.076744 0.5305864   exclusion  0.707449
## TAP-dn1:exclusion -1.437325 0.9246872   exclusion  0.924687

Each (ATR event, location feature) combination is tested twice for its inclusion and exclusion functions if the usable sample size is greater than min.size. Each row of output corresponds to the result of one test, including test statistic, p-value, etc. In particular, the 2nd column (feature) indicates one of the eight possible the location features. The names of features each maps to one of the eight Greek letters in Figure 3 as follows.

feature name	up3	up2	up1	bd1	bd2	dn1	dn2	dn3
location feature	\(\alpha\)	\(\beta\)	\(\gamma\)	\(\delta\)	\(\varepsilon\)	\(\zeta\)	\(\eta\)	\(\theta\)

A useful piece of FASeq results is the inferred location features. The SURF-inferred location features are those genomic regions significantly associated with either inclusion or exclusion of the ATR events and abundantly bound by the RBP (in wild-type condition). inferredFeature() retrieves these genomic regions and outputs them in a GenomicRanges object. The function essentially filters by adjusted p-values (FDR) and CLIP-seq read coverage. These genomic regions are ATR event-specific (thus transcript- and gene-specific), each of which is assigned a unique name in the <event_id>-<feature_name> form.

inferredFeature(event)

## GRanges object with 3 ranges and 6 metadata columns:
##                            seqnames            ranges strand |
##                               <Rle>         <IRanges>  <Rle> |
##   ENST00000561758.5@13-bd1    chr16   8646192-8646291      + |
##    ENST00000564967.1@2-bd1    chr16 29697996-29698095      + |
##    ENST00000567698.1@2-bd1    chr16 46581813-46581912      - |
##                                        event_id event_name            gene_id
##                                     <character>   <factor>        <character>
##   ENST00000561758.5@13-bd1 ENST00000561758.5@13        TAP ENSG00000067365.14
##    ENST00000564967.1@2-bd1  ENST00000564967.1@2        TAP ENSG00000103485.19
##    ENST00000567698.1@2-bd1  ENST00000567698.1@2        TAP  ENSG00000171241.9
##                                transcript_id feature_name functional
##                                  <character>     <factor>   <factor>
##   ENST00000561758.5@13-bd1 ENST00000561758.5          bd1  exclusion
##    ENST00000564967.1@2-bd1 ENST00000564967.1          bd1  exclusion
##    ENST00000567698.1@2-bd1 ENST00000567698.1          bd1  exclusion
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

For example, the first SURF-inferred location feature in the output is for the \(\delta\) location (feature_name column) of a TAP event (event_name column). The binding of the RBP here is associated with the exclusion (functional column) of the ATR event body (in wild-type condition).

For visualizations of FASeq results, see Section 5.2.

4.4 Step 4: RBP-related discovery using external transcriptome data

daseq() performs RBP-related discovery. Specifically, it uses a re-sampling based method to detect differential activity of a set of RBP targets between two experimental conditions. Currently, daseq() supports two types of RBP targets—either genes or transcripts. From Step 3 (FASeq), daseq() extracts the RBP target sets automatically (although the function also allows users to specify their own RBP target sets of interest). Besides the output from faseq(), daseq() requires two additional inputs: (1) a summary table of external transcriptome quantification. Each row represents a genomic feature (gene or transcript), and each column corresponds to one sample. (2) A data.table that describes sample information. In particular, the table must contain a column called condition, indicating the experimental conditions. Other controlling parameters include:

target.type determines the type of RBP targets. Currently, it could be either “transcript” or “gene.” The default is “transcript.”
targetSets and controlSets are default to NULL. These two arguments allow users to specify their own sets of RBP targets of interest and the corresponding control set.
n.sample is the number of times for resampling. The default is 1,000.

exprMat <- readRDS('data/TcgaTargetGtex_rsem_isoform_tpm_laml_blood_10each.rds')
ext_sample <- data.frame(
  condition = rep(c('TCGA', 'GTEx'), each = 10),
  row.names = colnames(exprMat)
)
event <- daseq(event, getRankings(exprMat), ext_sample, 
               cores = 1, target.type = "transcript")

The output of daseq() is still a surf object. It contains a daseqResults slot, which can be accessed by the daseqResults() function. For column descriptions, use mcols().

dar <- daseqResults(event)
mcols(dar)

## DataFrame with 11 rows and 2 columns
##                   type            description
##            <character>            <character>
## id               daseq  target set identifier
## size             daseq        target set size
## set              daseq             target set
## control          daseq       full control set
## base             daseq base AUC of target set
## GTEx             daseq target set activity ..
## TCGA             daseq target set activity ..
## background       daseq background differenc..
## stat             daseq statistic (parametri..
## p.value          daseq p-value of different..
## padj             daseq      adjusted p-values

dar

## daseqResults with 1 row and 11 columns
##              id      size                                                   set
##     <character> <integer>                                       <CharacterList>
## TAP         TAP         3 ENST00000561758.5,ENST00000564967.1,ENST00000567698.1
##                                                        control      base
##                                                <CharacterList> <numeric>
## TAP ENST00000563958.5,ENST00000381920.7,ENST00000163678.11,...         0
##          GTEx      TCGA background      stat   p.value      padj
##     <numeric> <numeric>  <numeric> <numeric> <numeric> <numeric>
## TAP         0         0          0       NaN         0         0

Each row of output table summarizes the DA testing of one RBP target set. In particular, the 9th column (stat) is the test statistic, and the 11th column (padj) is the p-value after the multiplicity adjustment with the BH procedure.

5 Visualization

5.1 Visualize DrSeq results

For both surf and drseqResults objects, two visualization methods are available: (1) plotDispFunc() which plots the fitted dispersion functions for each ATR event type (Figure 4), and (2) volcano.plot() which plots the volcano plot for each ATR events stratified by different event types (Figure 5).

Estimated mean-dispersion functions from DrSeq analysis of RNA-seq datasets (ENCODE) with RBP targets CPSF6. Each line corresponds to an ATR event type from Figure 2.

Figure 4: Estimated mean-dispersion functions from DrSeq analysis of RNA-seq datasets (ENCODE) with RBP targets CPSF6
Each line corresponds to an ATR event type from Figure 2.

Volcano plot (-log10 transformed adjusted p-value versus log2 of fold change) of DrSeq results for CPSF6, stratified by ATR event types. Horizontal dashed line depicts FDR cut-off level of 0.01 and the vertical lines depict log2 fold change of 1 in absolute value.

Figure 5: Volcano plot (-log10 transformed adjusted p-value versus log2 of fold change) of DrSeq results for CPSF6, stratified by ATR event types
Horizontal dashed line depicts FDR cut-off level of 0.01 and the vertical lines depict log2 fold change of 1 in absolute value.

5.2 Visualize FASeq results

Two visualization methods are available. In particular, fa.plot() generates the functional association (FA) plots. This includes box plots (upper panels) of feature signals and the adjusted p-values resulted from the functional association test (lower panels). For example, we generate the FA plots for four ATR events: AFE, A5U, IAP, and TAP. Since we used a very small subset of genes/transcripts in the example, the statistical power is lower than what they are usually like. Despite this, it can still be seen that the binding of the RBP may result in the exclusion of TAP site by binding at either \(\gamma\) or \(\delta\) location features.

fa.plot(event, plot.event = c("AFE", "A5U", "IAP", "TAP"))

Functional association plot. The upper panels depicts the actual CLIP-seq binding signals on various location features, stratified by the differential event usage (DEU) upon the RBP knock-down (as the results of Step 2 -- DrSeq). The top strips indicate the ATR event type and the number of ATR events in each DEU group are reported in the parenthesis. The lower panels shows the p-values of the functional association test (FAT).

Figure 6: Functional association plot
The upper panels depicts the actual CLIP-seq binding signals on various location features, stratified by the differential event usage (DEU) upon the RBP knock-down (as the results of Step 2 – DrSeq). The top strips indicate the ATR event type and the number of ATR events in each DEU group are reported in the parenthesis. The lower panels shows the p-values of the functional association test (FAT).

6 Appendix

6.1 Overlap operations

The methods subsetByOverlaps() and findOverlaps() have been implemented for the SURF object.

gr0 <- GRanges(seqnames = Rle("chr16"), 
               IRanges(89710000, width = 10000))
findOverlaps(event, subject = gr0)

## Hits object with 19 hits and 0 metadata columns:
##        queryHits subjectHits
##        <integer>   <integer>
##    [1]        74           1
##    [2]        76           1
##    [3]        77           1
##    [4]        79           1
##    [5]        80           1
##    ...       ...         ...
##   [15]        92           1
##   [16]        93           1
##   [17]        94           1
##   [18]        95           1
##   [19]        96           1
##   -------
##   queryLength: 882 / subjectLength: 1

Subsetting by overlaps is also implemented (subsetByOverlaps).

subsetByOverlaps(event, ranges = gr0)

## surf with 19 rows and 14 columns
##                                event_id event_name            gene_id
##                             <character>   <factor>        <character>
## ENST00000389386.8@1 ENST00000389386.8@1       A3SS ENSG00000075399.14
## ENST00000561976.5@2 ENST00000561976.5@2       A3SS ENSG00000075399.14
## ENST00000561976.5@3 ENST00000561976.5@3       SE   ENSG00000075399.14
## ENST00000563798.1@3 ENST00000563798.1@3       A3SS ENSG00000075399.14
## ENST00000563798.1@4 ENST00000563798.1@4       SE   ENSG00000075399.14
## ...                                 ...        ...                ...
## ENST00000567379.5@4 ENST00000567379.5@4       TAP  ENSG00000075399.14
## ENST00000567379.5@5 ENST00000567379.5@5       IAP  ENSG00000075399.14
## ENST00000568691.5@1 ENST00000568691.5@1       AFE  ENSG00000075399.14
## ENST00000568691.5@2 ENST00000568691.5@2       A3SS ENSG00000075399.14
## ENST00000568691.5@4 ENST00000568691.5@4       TAP  ENSG00000075399.14
##                         transcript_id
##                           <character>
## ENST00000389386.8@1 ENST00000389386.8
## ENST00000561976.5@2 ENST00000561976.5
## ENST00000561976.5@3 ENST00000561976.5
## ENST00000563798.1@3 ENST00000563798.1
## ENST00000563798.1@4 ENST00000563798.1
## ...                               ...
## ENST00000567379.5@4 ENST00000567379.5
## ENST00000567379.5@5 ENST00000567379.5
## ENST00000568691.5@1 ENST00000568691.5
## ENST00000568691.5@2 ENST00000568691.5
## ENST00000568691.5@4 ENST00000568691.5
##                                                                                           genomicData
##                                                                                         <GRangesList>
## ENST00000389386.8@1                                                         chr16:89719103-89719369:-
## ENST00000561976.5@2                                                         chr16:89712717-89713080:-
## ENST00000561976.5@3                                                         chr16:89711599-89711644:-
## ENST00000563798.1@3                                                         chr16:89712717-89713080:-
## ENST00000563798.1@4                                                         chr16:89711599-89711644:-
## ...                                                                                               ...
## ENST00000567379.5@4                                                         chr16:89711327-89711358:-
## ENST00000567379.5@5 chr16:89710586-89711010:-,chr16:89709777-89709909:-,chr16:89709227-89709435:-,...
## ENST00000568691.5@1 chr16:89720763-89720898:-,chr16:89719027-89719369:-,chr16:89716730-89716822:-,...
## ENST00000568691.5@2                                                         chr16:89712094-89712099:-
## ENST00000568691.5@4                                                         chr16:89710586-89710932:-
##                                                                                               feature
##                                                                                         <GRangesList>
## ENST00000389386.8@1 chr16:89720763-89720862:-,chr16:89720463-89720762:-,chr16:89719370-89719669:-,...
## ENST00000561976.5@2 chr16:89716462-89716561:-,chr16:89716162-89716461:-,chr16:89713081-89713380:-,...
## ENST00000561976.5@3 chr16:89711882-89711969:-,chr16:89711582-89711881:-,chr16:89711645-89711881:-,...
## ENST00000563798.1@3 chr16:89720763-89720862:-,chr16:89720463-89720762:-,chr16:89713081-89713380:-,...
## ENST00000563798.1@4 chr16:89711882-89711969:-,chr16:89711582-89711881:-,chr16:89711645-89711881:-,...
## ...                                                                                               ...
## ENST00000567379.5@4 chr16:89711359-89711412:-,chr16:89711327-89711358:-,chr16:89711327-89711358:-,...
## ENST00000567379.5@5 chr16:89711359-89711412:-,chr16:89711059-89711358:-,chr16:89711011-89711310:-,...
## ENST00000568691.5@1 chr16:89720899-89721198:-,chr16:89720799-89720898:-,chr16:89716462-89716561:-,...
## ENST00000568691.5@2 chr16:89712460-89712522:-,chr16:89712160-89712459:-,chr16:89712100-89712399:-,...
## ENST00000568691.5@4 chr16:89710933-89711010:-,chr16:89710833-89710932:-,chr16:89710586-89710685:-,...
##                     eventBaseMean        padj logFoldChange    adjMean
##                         <numeric>   <numeric>     <numeric>  <numeric>
## ENST00000389386.8@1       60.6319   0.0965775   6.38962e-01 0.16566094
## ENST00000561976.5@2       48.8882   0.9942717  -2.70125e-01 0.10559013
## ENST00000561976.5@3        1.0855   0.9755091  -2.10888e-07 0.00748622
## ENST00000563798.1@3       48.8882   0.9942717  -2.70125e-01 0.10559013
## ENST00000563798.1@4        1.0855   0.9755091   1.05444e-07 0.00748622
## ...                           ...         ...           ...        ...
## ENST00000567379.5@4       62.6153 6.17281e-02     0.1304812  0.4779795
## ENST00000567379.5@5      175.5244 4.33047e-03     0.0321411  0.0927719
## ENST00000568691.5@1      303.5759 8.60499e-08     0.1173966  0.3639999
## ENST00000568691.5@2       19.2276 9.51831e-01    -0.6657848  0.1831201
## ENST00000568691.5@4       30.6151 8.31006e-01     0.1858942  0.0686438
##                         group  included                     featureSignal
##                     <ordered> <logical>                     <NumericList>
## ENST00000389386.8@1 no change     FALSE   6.57495, 56.85659,189.39602,...
## ENST00000561976.5@2 no change      TRUE    -49.0629, 28.9709,-80.2793,...
## ENST00000561976.5@3 no change     FALSE    -91.2929, 13.6563, 18.3360,...
## ENST00000563798.1@3 no change     FALSE   6.57495, 56.85659,-80.27932,...
## ENST00000563798.1@4 no change     FALSE    -91.2929, 13.6563, 18.3360,...
## ...                       ...       ...                               ...
## ENST00000567379.5@4 no change     FALSE    -85.0812,-64.4138, 43.8671,...
## ENST00000567379.5@5 increase      FALSE -114.0394,  19.1646,  17.5063,...
## ENST00000568691.5@1 increase      FALSE    388.6863, 86.9127,-49.0629,...
## ENST00000568691.5@2 no change      TRUE     105.239,-101.998,-124.623,...
## ENST00000568691.5@4 no change      TRUE    -26.4262,-29.4189,-86.9127,...
##                        inferredFeature
##                           <FactorList>
## ENST00000389386.8@1 none,none,none,...
## ENST00000561976.5@2 none,none,none,...
## ENST00000561976.5@3 none,none,none,...
## ENST00000563798.1@3 none,none,none,...
## ENST00000563798.1@4 none,none,none,...
## ...                                ...
## ENST00000567379.5@4 none,none,none,...
## ENST00000567379.5@5 none,none,none,...
## ENST00000568691.5@1 none,none,none,...
## ENST00000568691.5@2 none,none,none,...
## ENST00000568691.5@4 none,none,none,...

7 Version information

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] BiocStyle_2.18.1            surf_1.0.0                 
##  [3] forcats_0.5.1               stringr_1.4.0              
##  [5] dplyr_1.0.5                 purrr_0.3.4                
##  [7] readr_1.4.0                 tidyr_1.1.3                
##  [9] tibble_3.1.0                ggplot2_3.3.3              
## [11] tidyverse_1.3.0             DEXSeq_1.36.0              
## [13] RColorBrewer_1.1-2          AnnotationDbi_1.52.0       
## [15] DESeq2_1.30.1               SummarizedExperiment_1.20.0
## [17] GenomicRanges_1.42.0        GenomeInfoDb_1.26.4        
## [19] IRanges_2.24.1              S4Vectors_0.28.1           
## [21] MatrixGenerics_1.2.1        matrixStats_0.58.0         
## [23] Biobase_2.50.0              BiocGenerics_0.36.0        
## [25] BiocParallel_1.24.1         doParallel_1.0.16          
## [27] iterators_1.0.13            foreach_1.5.1              
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.1               R.utils_2.10.1           tidyselect_1.1.0        
##   [4] lme4_1.1-26              RSQLite_2.2.4            htmlwidgets_1.5.3       
##   [7] grid_4.0.3               devtools_2.3.2           munsell_0.5.0           
##  [10] codetools_0.2-18         statmod_1.4.35           withr_2.4.1             
##  [13] colorspace_2.0-0         highr_0.8                knitr_1.31              
##  [16] rstudioapi_0.13          ggsignif_0.6.1           Rsubread_2.4.3          
##  [19] labeling_0.4.2           GenomeInfoDbData_1.2.4   hwriter_1.3.2           
##  [22] farver_2.1.0             bit64_4.0.5              rprojroot_2.0.2         
##  [25] coda_0.19-4              vctrs_0.3.6              generics_0.1.0          
##  [28] xfun_0.22                BiocFileCache_1.14.0     R6_2.5.0                
##  [31] arm_1.11-2               locfit_1.5-9.4           AnnotationFilter_1.14.0 
##  [34] bitops_1.0-6             cachem_1.0.4             DelayedArray_0.16.2     
##  [37] assertthat_0.2.1         promises_1.2.0.1         scales_1.1.1            
##  [40] nnet_7.3-15              gtable_0.3.0             processx_3.4.5          
##  [43] ensembldb_2.14.0         rlang_0.4.10             genefilter_1.72.1       
##  [46] splines_4.0.3            lazyeval_0.2.2           rtracklayer_1.50.0      
##  [49] rstatix_0.7.0            broom_0.7.5              checkmate_2.0.0         
##  [52] BiocManager_1.30.10      yaml_2.2.1               modelr_0.1.8            
##  [55] reshape2_1.4.4           abind_1.4-5              GenomicFeatures_1.42.2  
##  [58] backports_1.2.1          httpuv_1.5.5             Hmisc_4.5-0             
##  [61] tools_4.0.3              usethis_2.0.1            bookdown_0.21           
##  [64] ellipsis_0.3.1           jquerylib_0.1.3          sessioninfo_1.1.1       
##  [67] Rcpp_1.0.6               plyr_1.8.6               base64enc_0.1-3         
##  [70] doMC_1.3.7               progress_1.2.2           zlibbioc_1.36.0         
##  [73] RCurl_1.98-1.3           ps_1.6.0                 prettyunits_1.1.1       
##  [76] ggpubr_0.4.0             rpart_4.1-15             openssl_1.4.3           
##  [79] cowplot_1.1.1            haven_2.3.1              cluster_2.1.1           
##  [82] fs_1.5.0                 magrittr_2.0.1           magick_2.7.1            
##  [85] data.table_1.14.0        openxlsx_4.2.3           reprex_1.0.0            
##  [88] ProtGenerics_1.22.0      pkgload_1.2.0            evaluate_0.14           
##  [91] hms_1.0.0                mime_0.10                xtable_1.8-4            
##  [94] XML_3.99-0.6             rio_0.5.26               jpeg_0.1-8.1            
##  [97] AUCell_1.12.0            readxl_1.3.1             gridExtra_2.3           
## [100] testthat_3.0.2           compiler_4.0.3           biomaRt_2.46.3          
## [103] crayon_1.4.1             minqa_1.2.4              R.oo_1.24.0             
## [106] htmltools_0.5.1.1        later_1.1.0.1            Formula_1.2-4           
## [109] geneplotter_1.68.0       lubridate_1.7.10         DBI_1.1.1               
## [112] dbplyr_2.1.0             MASS_7.3-53.1            rappdirs_0.3.3          
## [115] boot_1.3-27              Matrix_1.3-2             car_3.0-10              
## [118] cli_2.3.1                R.methodsS3_1.8.1        pkgconfig_2.0.3         
## [121] GenomicAlignments_1.26.0 foreign_0.8-81           xml2_1.3.2              
## [124] roxygen2_7.1.1           annotate_1.68.0          bslib_0.2.4             
## [127] rngtools_1.5             XVector_0.30.0           rvest_1.0.0             
## [130] doRNG_1.8.2              callr_3.5.1              digest_0.6.27           
## [133] graph_1.68.0             Biostrings_2.58.0        rmarkdown_2.7           
## [136] cellranger_1.1.0         htmlTable_2.1.0          GSEABase_1.52.1         
## [139] curl_4.3                 commonmark_1.7           shiny_1.6.0             
## [142] Rsamtools_2.6.0          nloptr_1.2.2.2           jsonlite_1.7.2          
## [145] lifecycle_1.0.0          nlme_3.1-152             carData_3.0-4           
## [148] desc_1.3.0               askpass_1.1              fansi_0.4.2             
## [151] pillar_1.5.1             lattice_0.20-41          fastmap_1.1.0           
## [154] httr_1.4.2               pkgbuild_1.2.0           survival_3.2-10         
## [157] glue_1.4.2               remotes_2.2.0            zip_2.1.1               
## [160] png_0.1-7                bit_4.0.4                sass_0.3.1              
## [163] stringi_1.5.3            blob_1.2.1               latticeExtra_0.6-29     
## [166] memoise_2.0.0

Integrative analysis of CLIP-seq and RNA-seq data with `surf`

2021-03-22

Contents

1 Overview

2 Preparations

3 Quick start

3.1 One task at a call

3.2 Tell SURF about your data

4 The complete SURF workflow

4.1 Step 1: Parse ATR events from genome annotation.

4.2 Step 2: Detect differential regulation (DR) of ATR events

4.3 Step 3: Test the functional association between RBP binding and DR events.

5 Visualization

5.1 Visualize DrSeq results

5.2 Visualize FASeq results

6 Appendix

6.1 Overlap operations

7 Version information

Integrative analysis of CLIP-seq and RNA-seq data with surf

2021-03-22

Contents

1 Overview

2 Preparations

3 Quick start

3.1 One task at a call

3.2 Tell SURF about your data

4 The complete SURF workflow

4.1 Step 1: Parse ATR events from genome annotation.

4.2 Step 2: Detect differential regulation (DR) of ATR events

4.3 Step 3: Test the functional association between RBP binding and DR events.

4.4 Step 4: RBP-related discovery using external transcriptome data

5 Visualization

5.1 Visualize DrSeq results

5.2 Visualize FASeq results

6 Appendix

6.1 Overlap operations

7 Version information

Integrative analysis of CLIP-seq and RNA-seq data with `surf`