
RNA-seq data

Experimental design

For RNA-seq analysis, cell suspensions from CRC PDOs and from primary normal and tumor tissues were lysed in TRIzol reagent (Thermo Fisher), and total RNA was extracted with the PureLink™ RNA Mini Kit (Thermo Fisher) according to the manufacturer's instructions.
PDO samples were collected at early (<5 splits) and late (>5 splits) passages.
RNA quality was assessed by RNA Integrity Number (RIN) using the RNA 6000 assay (Agilent). Only samples with RIN > 7.0 were used in this study.

Library preparation

RNA-seq libraries were constructed with the TruSeq Stranded mRNA preparation kit (Illumina, San Diego, USA).

Data processing

Sequencing libraries were loaded on an Illumina HiSeq 2500.

Read quality was assessed with FastQC v0.11.7 and MultiQC v1.5. The reads were trimmed using BBDuk.
The trimmed reads were then aligned to the human hg38 reference (GENCODE Release 25 basic gene annotation) using STAR v2.5.3a. Quantification was performed using featureCounts (Subread v1.6.2).
Normalized coverage tracks were generated using the bamCoverage function of deepTools (command-line options --normalizeTo1x 3049315783 --minMappingQuality 10). Separate tracks for forward and reverse transcripts were generated for each independent sample.
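
As a rough illustration of the 1x normalization, the sketch below computes the scaling factor implied by an effective genome size of 3049315783 bp and assembles a bamCoverage argument list for one strand. The function names, file names, read count and read length are hypothetical. The use of --filterRNAstrand to separate forward and reverse transcripts is an assumption, since the text does not say how the strand-specific tracks were produced.

```python
from typing import List

# Value passed to --normalizeTo1x in the text above.
EFFECTIVE_GENOME_SIZE = 3_049_315_783


def one_x_scale_factor(mapped_reads: int, read_length: int,
                       genome_size: int = EFFECTIVE_GENOME_SIZE) -> float:
    """Factor that rescales raw coverage so the genome-wide average is 1x."""
    if mapped_reads <= 0 or read_length <= 0:
        raise ValueError("mapped_reads and read_length must be positive")
    # Average coverage before normalization: sequenced bases / genome size.
    current_coverage = mapped_reads * read_length / genome_size
    return 1.0 / current_coverage


def bamcoverage_args(bam_path: str, out_path: str, strand: str) -> List[str]:
    """Build (but do not run) a bamCoverage command for one transcript strand."""
    if strand not in ("forward", "reverse"):
        raise ValueError("strand must be 'forward' or 'reverse'")
    return [
        "bamCoverage",
        "-b", bam_path,
        "-o", out_path,
        "--normalizeTo1x", str(EFFECTIVE_GENOME_SIZE),
        "--minMappingQuality", "10",
        # Assumption: strand separation via --filterRNAstrand.
        "--filterRNAstrand", strand,
    ]


if __name__ == "__main__":
    # Hypothetical library: 30 million mapped reads of 100 bp.
    print(one_x_scale_factor(30_000_000, 100))
    print(" ".join(bamcoverage_args("sample.bam", "sample.fwd.bw", "forward")))
```

The --normalizeTo1x option belongs to the deepTools 2.x series; later releases express the same normalization with --normalizeUsing RPGC and --effectiveGenomeSize.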

Raw data repository

E-MTAB-8448

File format explanation

bw_file_load This button loads the .bw file that contains the signal track for transcripts originating from the forward strand.

bed_file_load This button loads the .bw file that contains the signal track for transcripts originating from the reverse strand.

ChIP-seq data

Experimental design

Organoid and normal crypt cell pellets were fixed in 1% formaldehyde in PBS. PBS-washed cell pellets were lysed in the presence of protease inhibitors, and chromatin was sheared to 300-500 bp using a Covaris® M220 focused-ultrasonicator. Sonicated chromatin was incubated with antibodies against histone marks (H3K27ac, Abcam ab4729; H3K4me3, Millipore 07-473; H3K4me1, Diagenode C15410194; H3K36me3, Diagenode C15410192; H3K27me3, Millipore 07-449). Immunocomplexes were recovered with 10 µl of blocked Protein G Dynabeads (Thermo Fisher) and washed. The immunoprecipitated DNA was then purified with the MinElute kit (Qiagen).

Library preparation

ChIP-seq libraries were prepared with the TruSeq ChIP Library Preparation Kit (Illumina).

Data processing

Sequencing libraries were loaded on an Illumina HiSeq 2500.

Read quality was assessed with FastQC v0.11.7 and MultiQC v1.5.
The reads were aligned to the human hg38 reference (GENCODE Release 25 basic gene annotation) using Bowtie v1.2.2, sorted using SAMtools v1.8 and converted directly into binary (BAM) files. PCR duplicate reads were marked and removed using SAMtools v1.8.
Peaks were called with MACS2 v2.1.0 using matched input DNA as a control and options appropriate for sharp or broad histone modifications. Peaks overlapping ENCODE blacklisted regions or located on unplaced and unlocalized scaffolds were removed.
For visualization of the ChIP-seq signal, bedGraph tracks were generated using the MACS2 bdgcmp function and converted into bigWig format using the UCSC tools bedClip and bedGraphToBigWig.
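
The peak filtering step described above can be sketched as follows. This is a minimal illustration and not the code used to produce the tracks. The function names are hypothetical, and scaffolds are recognized by UCSC-style hg38 names (a "chrUn_" prefix or a "_random" suffix), which is an assumption about the naming used in the peak files.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in BED files.
Interval = Tuple[str, int, int]


def is_scaffold(chrom: str) -> bool:
    """Assumed naming: unplaced = 'chrUn_*', unlocalized = '*_random'."""
    return chrom.startswith("chrUn_") or chrom.endswith("_random")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_peaks(peaks: Iterable[Interval],
                 blacklist: Iterable[Interval]) -> List[Interval]:
    """Drop peaks on scaffolds and peaks overlapping any blacklisted region."""
    blacklist = list(blacklist)
    kept = []
    for peak in peaks:
        if is_scaffold(peak[0]):
            continue
        if any(overlaps(peak, region) for region in blacklist):
            continue
        kept.append(peak)
    return kept


if __name__ == "__main__":
    peaks = [
        ("chr1", 100, 200),                 # overlaps the blacklisted region
        ("chr1", 500, 600),                 # kept
        ("chrUn_KI270302v1", 10, 50),       # unplaced scaffold
        ("chr1_KI270706v1_random", 0, 10),  # unlocalized scaffold
    ]
    blacklist = [("chr1", 150, 300)]
    print(filter_peaks(peaks, blacklist))
```

The linear scan over the blacklist is adequate for a few hundred regions; for genome-scale comparisons an interval-based tool such as bedtools would be the more usual choice.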

Raw data repository

E-MTAB-8416

File format explanation

bw_file_load This button loads the .bw file that contains the signal track for each sample. There is one track for each modification in each PDO.

bed_file_load This button loads the .narrowPeak and .broadPeak files that contain the peak locations for each sample. There is one track for each modification in each PDO.
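
For readers who want to inspect the peak files outside the browser, the sketch below parses one line of a .narrowPeak or .broadPeak file. It assumes the standard ENCODE column layout (nine tab-separated columns shared by both formats, plus a tenth summit-offset column in narrowPeak only). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peak:
    chrom: str
    start: int            # 0-based start
    end: int              # end, exclusive
    name: str
    score: int
    strand: str
    signal_value: float   # enrichment of the region
    p_value: float        # -log10 p-value (-1 if not assigned)
    q_value: float        # -log10 q-value (-1 if not assigned)
    summit: Optional[int]  # offset of the summit from start; narrowPeak only


def parse_peak_line(line: str) -> Peak:
    """Parse one tab-separated narrowPeak (10 columns) or broadPeak (9) line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (9, 10):
        raise ValueError(f"expected 9 or 10 columns, got {len(fields)}")
    summit = int(fields[9]) if len(fields) == 10 else None
    return Peak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        signal_value=float(fields[6]),
        p_value=float(fields[7]),
        q_value=float(fields[8]),
        summit=summit,
    )


if __name__ == "__main__":
    # Made-up example line, not taken from the dataset.
    narrow = "chr1\t100\t250\tpeak_1\t87\t.\t5.2\t10.1\t8.7\t60"
    print(parse_peak_line(narrow))
```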

ChromHMM

Data processing

De novo chromatin state characterization of all PDOs was performed using a multivariate hidden Markov model approach (ChromHMM v1.12) considering 5 histone modifications (H3K4me3, H3K27ac, H3K4me1, H3K36me3, H3K27me3) across 10 PDOs and publicly available data (Table S2), using default parameters.
The datasets were down-sampled to a maximum depth of 45 million reads (the median read depth over all samples in Table S2). Read counts for all the considered samples were computed in non-overlapping 200-bp bins across the entire genome. Binarization was performed by comparing ChIP-seq read counts to the corresponding input DNA control to reduce technical noise.
The 8-state model was chosen for downstream analysis because it captured the key interactions between histone marks and was the model with minimal redundancy.
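
The down-sampling, binning and binarization steps above can be illustrated with a simplified sketch. This is not ChromHMM's implementation: the control adjustment used here (a Poisson rate set to the larger of the genome-wide mean and the library-size-scaled control count) is a deliberate simplification. The 1e-4 p-value cutoff is assumed from ChromHMM's documented default, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

BIN_SIZE = 200          # bp, as stated in the text
MAX_READS = 45_000_000  # down-sampling cap, as stated in the text


def downsample(reads: Sequence[int], max_reads: int = MAX_READS,
               seed: int = 0) -> List[int]:
    """Randomly keep at most max_reads read positions."""
    if len(reads) <= max_reads:
        return list(reads)
    return random.Random(seed).sample(list(reads), max_reads)


def bin_counts(reads: Sequence[int], chrom_length: int,
               bin_size: int = BIN_SIZE) -> List[int]:
    """Count read start positions in non-overlapping bins along one chromosome."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in reads:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):   # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(signal: Sequence[int], control: Sequence[int],
             p_threshold: float = 1e-4) -> List[int]:
    """Call a bin 'present' (1) if its count is unlikely under the background."""
    total_signal, total_control = sum(signal), sum(control)
    background = total_signal / len(signal)
    scale = total_signal / total_control if total_control else 0.0
    calls = []
    for s, c in zip(signal, control):
        lam = max(background, c * scale)  # simplified control adjustment
        calls.append(1 if s > 0 and poisson_sf(s, lam) < p_threshold else 0)
    return calls


if __name__ == "__main__":
    signal = [0, 1, 30, 2, 1]
    print(binarize(signal, [1, 1, 1, 1, 1]))   # flat control
    print(binarize(signal, [1, 1, 40, 1, 1]))  # control also high in bin 2
```

In the second call the input control is itself elevated in the third bin, so the apparent enrichment there is treated as background, which is the kind of technical noise the control comparison is meant to suppress.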

Emission model description

The names used to annotate each state were chosen according to the Roadmap Epigenomics Consortium nomenclature (Roadmap Epigenomics Consortium et al. 2015).
Briefly, two states were annotated as promoter states (“Flanking Active TSS - FlnkActTSS” and “Active TSS - ActTSS”) based on the presence of H3K4me3 alone or the enrichment of both H3K4me3 and H3K27ac, respectively.
The two states with strong enrichment of H3K4me1 and H3K27ac and absence of H3K4me3 were defined as “Flanking Active Enhancers - FlnkActEnh” and “Active Enhancers - ActEnh”.
The state characterized by the presence of H3K4me1 alone was defined as “Weak Enhancers - WkEnh”.
The “Elongation - Elong” and “Repression - Repr” states were characterized by the presence of H3K36me3 and H3K27me3, respectively.
The “Quiescence” state marks regions without any significant enrichment of histone marks.
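
The annotation rules above can be restated as a small lookup table. The sketch encodes only the presence or absence of marks as described in the text, not the learned emission probabilities, so the two active enhancer states are indistinguishable here. The dictionary and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# State label -> histone marks described as enriched for that state.
STATE_MARKS: Dict[str, FrozenSet[str]] = {
    "FlnkActTSS": frozenset({"H3K4me3"}),
    "ActTSS": frozenset({"H3K4me3", "H3K27ac"}),
    "FlnkActEnh": frozenset({"H3K4me1", "H3K27ac"}),
    "ActEnh": frozenset({"H3K4me1", "H3K27ac"}),
    "WkEnh": frozenset({"H3K4me1"}),
    "Elong": frozenset({"H3K36me3"}),
    "Repr": frozenset({"H3K27me3"}),
    "Quiescence": frozenset(),
}


def candidate_states(enriched_marks: Iterable[str]) -> List[str]:
    """Return the state labels whose described mark set matches exactly."""
    observed = frozenset(enriched_marks)
    return [state for state, marks in STATE_MARKS.items() if marks == observed]


if __name__ == "__main__":
    print(candidate_states(["H3K4me3", "H3K27ac"]))
    print(candidate_states(["H3K4me1", "H3K27ac"]))
    print(candidate_states([]))
```

A mark combination that the text does not describe, such as H3K36me3 together with H3K27me3, returns an empty list rather than a guessed label.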

[Figure: Emission model heatmap]

File format explanation

chromHMM_file_load This button loads the ChromHMM tracks that contain the chromatin state segmentation for each patient. Chromatin segments are color-coded according to the colors reported in the heatmap above.

References

  • Robinson, J. T. et al. Integrative Genomics Viewer. Nat. Biotechnol. 29 (2011).
  • Andrews, S. FastQC: A quality control tool for high throughput sequence data. (2010).
  • BBMap Guide - DOE Joint Genome Institute.
  • Dobin, A. STAR manual 2.5.0a. (2015).
  • Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160-5 (2016).
  • Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  • Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25 (2009).
  • Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
  • Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9 (2012).