FANTOM5

2019/03/29

Web Site: http://fantom.gsc.riken.jp/5/
HTTPS Site: https://dbarchive.biosciencedbc.jp/data/fantom5/

A database of the activities of transcripts and transcription factors in various human and mouse cell types.

README Content

  1. Database Component
  2. Data Description
  3. License
  4. Update History
  5. Literature
  6. Contact address

1. Database Component

  1. README
  2. HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation
  3. CAGE peaks
  4. Pathway enrichment and co-expression cluster analysis
  5. Enhancers
  6. Results of de-novo and Motif activity analyses
  7. CAGE_peaks_annotation
  8. Sample ontology, GOstat and ontology term enrichment
  9. CAGE peaks identified as true TSS by TSS classifier
  10. (reprocessed)HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation
  11. (reprocessed)CAGE peaks
  12. (reprocessed)CAGE_peaks_annotation
  13. (reprocessed)CAGE_peaks_expression
  14. (reprocessed)pooled_ctss
  15. DRA_accession_tables
  16. Gene_level_expression
  17. Summary_CAGEScan
  18. (reprocessed) Enhancers
  19. (reprocessed)DPI_clustering

2. Data Description

2.1 README

Data name README
Description of data contents HTML file to describe "FANTOM5" data.
File README_e.html (English)

2.2 HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation

Data name HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation
Description of data contents

Time course and snapshot data by HeliScopeCAGE
The "basic" directory contains the following:

Subdirectories whose names end with "CAGEScan" (human only)

00_.assay_sdrf.txt:
Experimental information for each sample (tab-delimited text)
*.bam:
Binary read mapping results (BAM format)
*.bam.bai:
Indexes of BAM files
*.3prime.fq.gz:
3' sequences in CAGEscan tags (FASTQ format)
*.5prime.fq.gz:
5' sequences in CAGEscan tags (FASTQ format)
*.clusters.bed.gz:
Clustering results from CAGEscan (standard BED12 format). The 4th column gives the name of the CAGE tag that represents the cluster, and the 5th column gives the number of pairs that make up the cluster.
*.pairs.bed.gz:
Read pairs mapped with CAGEscan (standard BED12 format). The 4th column gives the name of the pair, and the 5th column gives the total mapping quality value of its reads.
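
As an illustration of the column layout described above, the following Python sketch reads lines of a *.clusters.bed.gz file and picks out the 4th and 5th columns. Only the meaning of those two columns comes from this README. The helper names (Cluster, parse_cluster_line, read_clusters) and the example line are hypothetical and are not taken from the FANTOM5 files.

```python
import gzip
from typing import Iterator, NamedTuple


class Cluster(NamedTuple):
    chrom: str
    start: int
    end: int
    representative_tag: str  # 4th column: CAGE tag name representing the cluster
    pair_count: int          # 5th column: number of pairs in the cluster
    strand: str


def parse_cluster_line(line: str) -> Cluster:
    """Parse one tab-separated BED12 line into a Cluster."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED columns, got {len(fields)}")
    return Cluster(fields[0], int(fields[1]), int(fields[2]),
                   fields[3], int(fields[4]), fields[5])


def read_clusters(path: str) -> Iterator[Cluster]:
    """Yield clusters from a gzip-compressed BED12 file, skipping header lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith(("track", "browser", "#")):
                yield parse_cluster_line(line)


# Made-up example line (not real FANTOM5 data).
example = "chr1\t1000\t2000\ttag_0001\t7\t+\t1000\t2000\t0\t2\t100,150\t0,850"
cluster = parse_cluster_line(example)
print(cluster.representative_tag, cluster.pair_count)
```

For a downloaded file, read_clusters("path/to/file.clusters.bed.gz") would yield one Cluster per line. The path here is a placeholder.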

 

All other subdirectories

00_.assay_sdrf.txt:
Experimental information for each sample (tab-delimited text)
*.bam:
Binary read mapping results (BAM format)
*.bam.bai:
Indexes of BAM files
*.ctss.bed.gz:
TSS (CTSS) identified by CAGE tag analysis (BED format)
*.rdna.fa.gz:
rDNA sequences (FASTA format)
File fantom5_new_experimental_details.zip (273 KB)
basic (2.5 TB)

Data items are the following:
Data item                    Description
Extract name                 Internal ID
FF ontology                  Ontology ID in FANTOM
Description                  Description
Catalog ID                   RNA catalog ID
Category                     Category
Species                      Species
Sex                          Sex
Age                          Age
Developmental stage          Developmental stage
Tissue                       Tissue
Cell lot                     Cell lot
Cell type                    Cell type
Catalogue ID                 Cell catalogue ID
Collaboration                Collaboration
Provider                     Cell provider
Extraction protocol          RNA extraction protocol
Material type                RNA Material type
RNA tube                     RNA tube ID (the same as Extract name)
Sample name                  Sample name
RNA extraction               RNA extraction
RNA ID                       Internal RNA ID
Comment on RNA               Comment on RNA
ratio_260/230                Ratio (260nm/230nm) of the sample
ratio_260/280                Ratio (260nm/280nm) of the sample
Concentration                RNA concentration
RNA Integrity number         RNA Integrity number
lsid                         Sample group ID
Library protocol             Library protocol
Library ID                   Library ID
Sequence protocol            Sequence protocol
Machine name                 Machine name
Run name                     Run name
Flowcell channel             Flowcell channel
Alignment protocol           Alignment protocol
BAM file                     Read mapping results (BAM format)
BAI file                     Index of BAM file
CAGE TSS file                TSS identified by CAGE analysis (BED format)
Ribosomal DNA sequence file  rDNA sequence file (FASTA format)
Barcode                      Barcode sequence for RNA sample identification
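
The experimental details are tab-delimited text, so they can be read with a standard CSV reader. The sketch below groups library IDs by species. It assumes that the header row of the file uses the data item names listed above ("Species", "Library ID"), which has not been checked against the actual files. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def libraries_by_species(handle: TextIO) -> Dict[str, List[str]]:
    """Group library IDs by species from a tab-delimited sample sheet."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Column names are assumed to match the data item names in this README.
        grouped[row["Species"]].append(row["Library ID"])
    return dict(grouped)


# Made-up rows; real files contain many more columns and samples.
sample_sheet = io.StringIO(
    "Extract name\tSpecies\tLibrary ID\n"
    "extract_A\tHomo sapiens\tlib_001\n"
    "extract_B\tMus musculus\tlib_002\n"
    "extract_C\tHomo sapiens\tlib_003\n"
)
result = libraries_by_species(sample_sheet)
print(result)
```

To use it on a real file, pass an open text handle instead of the in-memory example.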

2.3 CAGE peaks

Data name CAGE peaks
Description of data contents

Data about CAGE peak regions and RNA transcriptional initiation activities measured by CAGE

File CAGE_peaks (4.1 GB)

2.4 Pathway enrichment and co-expression cluster analysis

Data name Pathway enrichment and co-expression cluster analysis
Description of data contents

Pathway enrichment and co-expression cluster analysis

File Co-expression_clusters (86 MB)

2.5 Enhancers

Data name Enhancers
Description of data contents

Human and mouse enhancers identified by measuring the amount of RNA transcripts with CAGE in phase1.0 and phase2.0

File Enhancers (160 MB)

2.6 Results of de-novo and Motif activity analyses

Data name Results of de-novo and Motif activity analyses
Description of data contents

Analysis results for TFBS motifs near TSSs

  • de-novo motif analysis with HOMER etc.
  • Significance of the correlation between transcriptional activation measured by CAGE and de-novo motifs or known TFBS motifs (registered in JASPAR)
File Motifs (6.2 GB)

2.7 CAGE_peaks_annotation

Data name CAGE_peaks_annotation
Description of data contents

Annotation of human and mouse CAGE peaks and RNA transcriptional initiation activities by CAGE

File CAGE_peaks_annotation (195 MB)

2.8 Sample ontology, GOstat and ontology term enrichment

Data name Sample ontology, GOstat and ontology term enrichment
Description of data contents

The ontology used to describe samples in phase2.0
It is based on the Cell Ontology, the Disease Ontology and the pan-vertebrate Uberon ontology. The file format is OBO.

File Ontology (1.8 MB)
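
Since the ontology is distributed in OBO format, its terms can be collected with a few lines of Python. The following is a minimal sketch that only handles [Term] stanzas and simple "tag: value" lines. It is not a full OBO parser, and the term IDs in the example text are invented rather than taken from the FANTOM5 ontology.

```python
from typing import Dict, List, Optional


def parse_obo_terms(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect the tag/value pairs of every [Term] stanza, keyed by term id."""
    terms: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None  # stanza being read, if a [Term]
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            terms[value] = current
        current.setdefault(tag, []).append(value)
    return terms


# Invented example text (not from the FANTOM5 ontology file).
example_obo = """format-version: 1.2

[Term]
id: EX:0000001
name: example sample
is_a: EX:0000000 ! example root

[Typedef]
id: part_of
name: part of
"""
terms = parse_obo_terms(example_obo)
print(terms["EX:0000001"]["name"][0], terms["EX:0000001"]["is_a"])
```

Values are kept as lists because a tag such as is_a may appear more than once in a stanza.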

2.9 CAGE peaks identified as true TSS by TSS classifier

Data name CAGE peaks identified as true TSS by TSS classifier
Description of data contents

Evaluation of CAGE peaks
It indicates whether the sequence close to each CAGE peak is "TSS-like" or not. "TSS_human.bed.gz" and "TSS_mouse.bed.gz" contain the peaks identified as TSS.

File TSS_classifier (32 MB)

2.10 (reprocessed)HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation

Data name (reprocessed)HeliscopeCAGE sequencing, Delve mapping and CAGE TSS aggregation
Description of data contents

Time course and snapshot data by HeliScopeCAGE
Human and mouse reads are re-mapped to new reference genome sequences (hg38/mm10).
"basic" directory contains the following:

 

00_.assay_sdrf.txt:
Experimental information for each sample (tab-delimited text)
*.bam:
Binary read mapping results (BAM format)
*.bam.bai:
Indexes of BAM files
*.ctss.bed.gz:
TSS (CTSS) identified by CAGE tag analysis (BED format)
*.rdna.fa.gz:
rDNA sequences (FASTA format)
File fantom5_rp_exp_details.zip (237 KB)
(reprocessed)basic (Homo sapiens) (1.4 TB)
(reprocessed)basic (Mus musculus) (889 GB)

Data items are the following:
Data item                    Description
Extract name                 Internal ID
FF ontology                  Ontology ID in FANTOM
Description                  Description
Catalog ID                   RNA catalog ID
Category                     Category
Species                      Species
Sex                          Sex
Age                          Age
Developmental stage          Developmental stage
Tissue                       Tissue
Cell lot                     Cell lot
Cell type                    Cell type
Catalogue ID                 Cell catalogue ID
Collaboration                Collaboration
Provider                     Cell provider
Extraction protocol          RNA extraction protocol
Material type                RNA Material type
RNA tube                     RNA tube ID (the same as Extract name)
Sample name                  Sample name
RNA extraction               RNA extraction
RNA ID                       Internal RNA ID
Comment on RNA               Comment on RNA
ratio_260/230                Ratio (260nm/230nm) of the sample
ratio_260/280                Ratio (260nm/280nm) of the sample
Concentration                RNA concentration
lsid                         RNA Integrity number
Library protocol             Library protocol
Library ID                   Library ID
Sequence protocol            Sequence protocol
Machine name                 Machine name
Run name                     Run name
Flowcell channel             Flowcell channel
Alignment protocol           Alignment protocol
BAM file                     Read mapping results (BAM format)
BAI file                     Index of BAM file
CAGE TSS file                TSS identified by CAGE analysis (BED format)
Ribosomal DNA sequence file  rDNA sequence file (FASTA format)

2.11 (reprocessed)CAGE peaks

Data name (reprocessed)CAGE peaks
Description of data contents

Data about CAGE peak regions and RNA transcriptional initiation activities measured by CAGE. Human and mouse reads are re-mapped to new reference genome sequences (hg38/mm10).

File (reprocessed)CAGE_peaks (Homo sapiens) (11 MB)
(reprocessed)CAGE_peaks (Mus musculus) (8.1 MB)

2.12 (reprocessed)CAGE_peaks_annotation

Data name (reprocessed)CAGE_peaks_annotation
Description of data contents

Annotation of human and mouse CAGE peaks and RNA transcriptional initiation activities by CAGE
Human and mouse reads are re-mapped to new reference genome sequences (hg38/mm10).

File (reprocessed)CAGE_peaks_annotation (Homo sapiens) (19 MB)
(reprocessed)CAGE_peaks_annotation (Mus musculus) (14 MB)

2.13 (reprocessed)CAGE_peaks_expression

Data name (reprocessed)CAGE_peaks_expression
Description of data contents

Annotation of human and mouse CAGE peaks and RNA transcriptional initiation activities by CAGE
Human and mouse reads are re-mapped to new reference genome sequences (hg38/mm10).

File (reprocessed)CAGE_peaks_expression (Homo sapiens) (16 MB)
(reprocessed)CAGE_peaks_expression (Mus musculus) (13 MB)

2.14 (reprocessed)pooled_ctss

Data name (reprocessed)pooled_ctss
Description of data contents

Mapping results of all CAGE tags used from phase1.0 to phase2.0
Human and mouse reads are re-mapped to new reference genome sequences (hg38/mm10).

File (reprocessed)pooled_ctss (Homo sapiens) (6.5 GB)
(reprocessed)pooled_ctss (Mus musculus) (4.5 GB)

2.15 DRA_accession_tables

Data name DRA_accession_tables
Description of data contents

Accession number list of FANTOM5 sample data registered in DRA (http://trace.ddbj.nig.ac.jp/dra/index.html).

File fantom5_dra_accession_tables.zip (64 KB)
DRA_accession_tables (251KB)

Data items are the following:
Data item                            Description
Library ID                           DNA sequence library ID
FF ontology                          Ontology ID defined in FANTOM
DRA sample accession number          DRA sample accession number
DRA experiment accession number      DRA experiment accession number
DRA run accession number             DRA run accession number
DRA analysis accession number (BAM)  DRA analysis accession number (BAM)
DRA analysis accession number (BED)  DRA analysis accession number (BED)
Experiment method                    Experiment method

2.16 Gene_level_expression

Data name Gene_level_expression
Description of data contents

These tables show the number of CAGE tags mapped to each gene in each sample (human/mouse). The "counts" tables contain the raw tag counts, and the "tpm" tables contain TPM (Transcripts Per Million) values normalized with RLE (Relative Log Expression).

File gene_level_expression (415MB)
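
To make the relation between the "counts" and "tpm" tables concrete, the sketch below applies plain per-million scaling to the counts of one sample. It deliberately omits the RLE normalization factor mentioned above, so it will not reproduce the archived "tpm" values. The gene names and counts are made up.

```python
from typing import Dict


def counts_to_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts of one sample so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no tags")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Made-up counts for a single sample (500 tags in total).
sample_counts = {"geneA": 50, "geneB": 150, "geneC": 300}
per_million = counts_to_per_million(sample_counts)
print(per_million)
```

With RLE, each sample's total would additionally be multiplied by a per-sample normalization factor before the division. That step is left out here.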

2.17 Summary_CAGEScan

Data name Summary_CAGEScan
Description of data contents

Summary of CAGEscan experiments, compiled for each DNA sequence library

File fantom5_summary_cagescan.zip (4.64 KB)
Summary_CAGEScan (62KB)

Data items are the following:
Data item                Description
Library ID               DNA sequence library ID
Raw                      Measured reads
Removed by extraction    Reads removed by extraction
Extracted                Reads extracted by analysis
Removed by artifacts     Reads removed as artifacts
Filtered for artifact    Reads filtered for artifacts
Removed by rRNA          Reads removed as rRNA
Filtered for rRNA        Reads filtered for rRNAs
Non alignment            Reads not mapped on genome
Genome mapped            Reads mapped on genome
Duplicated               Duplicated reads
Uniquely mapped          Unique reads
Inproperly mapped pairs  Read pairs improperly mapped on genome
Properly mapped pairs    Read pairs properly mapped on genome
Total pairs              Total pairs
Exon                     Pairs on exon
Intergenic               Pairs on intergenic
Promoter                 Pairs on promoter
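
The counts in this summary lend themselves to simple quality ratios per library. The sketch below computes two of them from one row. The ratio definitions are an illustration chosen here, not metrics defined by FANTOM5, and the numbers in the example row are made up. The dictionary keys follow the data item names listed above and are assumed to match the column headers of the file.

```python
from typing import Dict


def mapping_rates(row: Dict[str, int]) -> Dict[str, float]:
    """Compute illustrative mapping ratios for one library."""
    mapped = row["Genome mapped"]
    unmapped = row["Non alignment"]
    proper = row["Properly mapped pairs"]
    total_pairs = row["Total pairs"]
    return {
        # Share of reads that mapped, among mapped plus unmapped reads.
        "genome_mapped_fraction": mapped / (mapped + unmapped),
        # Share of all pairs that were properly mapped.
        "properly_paired_fraction": proper / total_pairs,
    }


# Made-up values for a single library.
example_row = {
    "Genome mapped": 800,
    "Non alignment": 200,
    "Properly mapped pairs": 300,
    "Total pairs": 400,
}
rates = mapping_rates(example_row)
print(rates)
```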

2.18 (reprocessed) Enhancers

Data name (reprocessed) Enhancers
Description of data contents

Human and mouse enhancers identified by measuring RNA transcription with CAGE. They are reprocessed with the new reference genome sequences (hg38/mm10).

File (reprocessed) enhancer (Homo sapiens) (101MB)
(reprocessed) enhancer (Mus musculus) (7.3MB)

2.19 (reprocessed)DPI_clustering

Data name (reprocessed)DPI_clustering
Description of data contents

Peaks identified in the reprocessed human and mouse mapping data with the DPI (decomposition-based peak identification) method (BED format).

*.tc.bed.gz:
CAGE tag clusters with original definition
*.tc.decompose_smoothing_merged.bed.gz:
All peaks with DPI
*.tc.decompose_smoothing_merged.ctssMaxCounts3.bed.gz:
All extracted peaks
*.tc.decompose_smoothing_merged.ctssMaxCounts11_ctssMaxTpm1.bed.gz:
Probable extracted peaks
File (reprocessed)DPI_clustering (Homo sapiens) (150MB)
(reprocessed)DPI_clustering (Mus musculus) (124MB)
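
The four file name patterns above can be told apart by their suffixes. As a small sketch, the following Python function maps a file name to the description given in this README. The function name and the example file names are hypothetical.

```python
from typing import Optional

# Suffix of each DPI output file and its description, as listed in this README.
DPI_FILE_KINDS = {
    ".tc.bed.gz": "CAGE tag clusters with original definition",
    ".tc.decompose_smoothing_merged.bed.gz": "All peaks with DPI",
    ".tc.decompose_smoothing_merged.ctssMaxCounts3.bed.gz": "All extracted peaks",
    ".tc.decompose_smoothing_merged.ctssMaxCounts11_ctssMaxTpm1.bed.gz":
        "Probable extracted peaks",
}


def describe_dpi_file(filename: str) -> Optional[str]:
    """Return the README description for a DPI file name, or None if unknown."""
    for suffix, description in DPI_FILE_KINDS.items():
        if filename.endswith(suffix):
            return description
    return None


# Hypothetical file names used only for illustration.
print(describe_dpi_file("example.tc.bed.gz"))
print(describe_dpi_file("example.tc.decompose_smoothing_merged.ctssMaxCounts3.bed.gz"))
```

None of the four suffixes is a suffix of another, so the order of the checks does not matter.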

3. License

Last updated : 2017/03/14

You may use this database in compliance with the terms and conditions of the license described below. The license specifies the license terms regarding the use of this database and the requirements you must follow in using this database.

 

Creative Commons License

This database is licensed under the Creative Commons Attribution 4.0 International license.
If you use data from this database, please be sure to attribute this database as follows: "FANTOM5 © RIKEN licensed under CC Attribution 4.0 International".

A summary of the Creative Commons Attribution 4.0 International license is available at https://creativecommons.org/licenses/by/4.0/.

With regard to this database, you are licensed to:

  1. freely access part or whole of this database, and acquire data;
  2. freely redistribute part or whole of the data from this database; and
  3. freely create and distribute database and other adapted materials based on part or whole of the data from this database,

under the license, as long as you comply with the following conditions:

  1. You must attribute this database in the manner specified by the author or licensor when distributing part or whole of this database or any adapted material.
  2. You need to contact the Licensor shown below to request a license for use of this database or any part thereof not licensed under the license.

E-mail: fantom-help[at]riken[dot]jp

About Providing Links to This Database

You may freely link to any content in this database. Note, however, that content may change without notice.


4. Update History

Date        Update contents
2019/03/29  Archive V3 is released (data added and updated).
2017/03/14  FANTOM5 English archive site is opened.
2016/12/20  Archive V2 is released (data added and updated).
2015/12/07  FANTOM5 archive site is opened (Archive V1).
2014/03/27  FANTOM5 (http://fantom.gsc.riken.jp/5/) is opened.

5. Literature

FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Lassmann T, Itoh M, Summers KM, Suzuki H, Daub CO, Kawai J, Heutink P, Hide W, Freeman TC, Lenhard B, Bajic VB, Taylor MS, Makeev VJ, Sandelin A, Hume DA, Carninci P, Hayashizaki Y.
A promoter-level mammalian expression atlas.
Nature. 2014 Mar 27;507(7493):462-70. doi: 10.1038/nature13182.
PMID: 24670764

Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, Ntini E, Arner E, Valen E, Li K, Schwarzfischer L, Glatz D, Raithel J, Lilje B, Rapin N, Bagger FO, Jørgensen M, Andersen PR, Bertin N, Rackham O, Burroughs AM, Baillie JK, Ishizu Y, Shimizu Y, Furuhata E, Maeda S, Negishi Y, Mungall CJ, Meehan TF, Lassmann T, Itoh M, Kawaji H, Kondo N, Kawai J, Lennartsson A, Daub CO, Heutink P, Hume DA, Jensen TH, Suzuki H, Hayashizaki Y, Müller F; FANTOM Consortium, Forrest AR, Carninci P, Rehli M, Sandelin A.
An atlas of active enhancers across human cell types and tissues.
Nature. 2014 Mar 27;507(7493):455-61. doi: 10.1038/nature12787.
PMID: 24670763

Arner E, Daub CO, Vitting-Seerup K, Andersson R, Lilje B, Drabløs F, Lennartsson A, Rönnerblad M, Hrydziuszko O, Vitezic M, Freeman TC, Alhendi AM, Arner P, Axton R, Baillie JK, Beckhouse A, Bodega B, Briggs J, Brombacher F, Davis M, Detmar M, Ehrlund A, Endoh M, Eslami A, Fagiolini M, Fairbairn L, Faulkner GJ, Ferrai C, Fisher ME, Forrester L, Goldowitz D, Guler R, Ha T, Hara M, Herlyn M, Ikawa T, Kai C, Kawamoto H, Khachigian LM, Klinken SP, Kojima S, Koseki H, Klein S, Mejhert N, Miyaguchi K, Mizuno Y, Morimoto M, Morris KJ, Mummery C, Nakachi Y, Ogishima S, Okada-Hatakeyama M, Okazaki Y, Orlando V, Ovchinnikov D, Passier R, Patrikakis M, Pombo A, Qin XY, Roy S, Sato H, Savvi S, Saxena A, Schwegmann A, Sugiyama D, Swoboda R, Tanaka H, Tomoiu A, Winteringham LN, Wolvetang E, Yanagi-Mizuochi C, Yoneda M, Zabierowski S, Zhang P, Abugessaisa I, Bertin N, Diehl AD, Fukuda S, Furuno M, Harshbarger J, Hasegawa A, Hori F, Ishikawa-Kato S, Ishizu Y, Itoh M, Kawashima T, Kojima M, Kondo N, Lizio M, Meehan TF, Mungall CJ, Murata M, Nishiyori-Sueki H, Sahin S, Nagao-Sato S, Severin J, de Hoon MJ, Kawai J, Kasukawa T, Lassmann T, Suzuki H, Kawaji H, Summers KM, Wells C; FANTOM Consortium, Hume DA, Forrest AR, Sandelin A, Carninci P, Hayashizaki Y.
Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells.
Science. 2015 Feb 27;347(6225):1010-4. doi: 10.1126/science.1259418. Epub 2015 Feb 12.
PMID: 25678556

Chung-Chau Hon, Jordan A. Ramilowski, Jayson Harshbarger, Nicolas Bertin, Owen J. L. Rackham, Julian Gough, Elena Denisenko, Sebastian Schmeier, Thomas M. Poulsen, Jessica Severin, Marina Lizio, Hideya Kawaji, Takeya Kasukawa, Masayoshi Itoh, A. Maxwell Burroughs, Shohei Noma, Sarah Djebali, Tanvir Alam, Yulia A. Medvedeva, Alison C. Testa, Leonard Lipovich, Chi-Wai Yip, Imad Abugessaisa, Mickaël Mendez, Akira Hasegawa, Dave Tang, Timo Lassmann, Peter Heutink, Magda Babina, Christine A. Wells, Soichi Kojima, Yukio Nakamura, Harukazu Suzuki, Carsten O. Daub, Michiel J. L. de Hoon, Erik Arner, Yoshihide Hayashizaki, Piero Carninci & Alistair R. R. Forrest
An atlas of human long non-coding RNAs with accurate 5' ends
Nature volume 543, pages 199–204 (09 March 2017)
PMID: 28241135

Derek de Rie, Imad Abugessaisa, Tanvir Alam, Erik Arner, Peter Arner, Haitham Ashoor, Gaby Åström, Magda Babina, Nicolas Bertin, A Maxwell Burroughs, Ailsa J Carlisle, Carsten O Daub, Michael Detmar, Ruslan Deviatiiarov, Alexandre Fort, Claudia Gebhard, Daniel Goldowitz, Sven Guhl, Thomas J Ha, Jayson Harshbarger, Akira Hasegawa, Kosuke Hashimoto, Meenhard Herlyn, Peter Heutink, Kelly J Hitchens, Chung Chau Hon, Edward Huang, Yuri Ishizu, Chieko Kai, Takeya Kasukawa, Peter Klinken, Timo Lassmann, Charles-Henri Lecellier, Weonju Lee, Marina Lizio, Vsevolod Makeev, Anthony Mathelier, Yulia A Medvedeva, Niklas Mejhert, Christopher J Mungall, Shohei Noma, Mitsuhiro Ohshima, Mariko Okada-Hatakeyama, Helena Persson, Patrizia Rizzu, Filip Roudnicky, Pål Sætrom, Hiroki Sato, Jessica Severin, Jay W Shin, Rolf K Swoboda, Hiroshi Tarui, Hiroo Toyoda, Kristoffer Vitting-Seerup, Louise Winteringham, Yoko Yamaguchi, Kayoko Yasuzawa, Misako Yoneda, Noriko Yumoto, Susan Zabierowski, Peter G Zhang, Christine A Wells, Kim M Summers, Hideya Kawaji, Albin Sandelin, Michael Rehli, The FANTOM Consortium, Yoshihide Hayashizaki, Piero Carninci, Alistair R R Forrest & Michiel J L de Hoon
An integrated expression atlas of miRNAs and their promoters in human and mouse
Nature Biotechnology volume 35, pages 872–878 (2017)
PMID: 28829439


6. Contact address

If you have any questions about "FANTOM5", please contact:

E-mail: fantom-help[at]riken[dot]jp
