
Data Formats

This page describes file formats used in Chromoscope. To find a list of required and optional files, please refer to the Data Configuration section.

Structural Variants (BEDPE)

The structural variants are stored in a headed BEDPE file. The columns may appear in any order. The following columns are used in the browser:

| Property | Type | Note |
| --- | --- | --- |
| chrom1 | string | Required. The name of the chromosome of the first break point (BP). |
| start1 | number | Required. The starting position of the first BP. |
| end1 | number | Required. The end position of the first BP. |
| chrom2 | string | Required. The name of the chromosome of the second BP. |
| start2 | number | Required. The starting position of the second BP. |
| end2 | number | Required. The end position of the second BP. |
| sv_id | string | Required. The name of the SV. |
| pe_support | string | Optional. The number of events that support the SV, shown in tooltips. |
| strand1 | string | Required. The strand for the first BP. Either '+' or '-'. |
| strand2 | string | Required. The strand for the second BP. Either '+' or '-'. |
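
Because the file is headed and the column order is not fixed, a reader should look columns up by name rather than by position. The following is a minimal sketch in Python, assuming a tab-delimited file; the function name `read_bedpe` and the sample row are hypothetical and are not part of Chromoscope.

```python
import csv
import io

# Required column names, taken from the table above.
REQUIRED = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
            "sv_id", "strand1", "strand2"]

def read_bedpe(text):
    """Parse headed, tab-delimited BEDPE text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Positions are numeric; every other field stays a string.
        for key in ("start1", "end1", "start2", "end2"):
            row[key] = int(row[key])
        rows.append(row)
    return rows

# Made-up sample; the column order deliberately differs from the table.
sample = (
    "sv_id\tchrom1\tstart1\tend1\tstrand1\tchrom2\tstart2\tend2\tstrand2\n"
    "sv1\tchr1\t100\t101\t+\tchr1\t500\t501\t-\n"
)
records = read_bedpe(sample)
print(records[0]["sv_id"], records[0]["start2"])
```

Looking fields up by header name keeps the reader independent of column order, and the up-front check reports a missing required column before any row is processed.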

Example file:

https://somatic-browser-test.s3.amazonaws.com/SVTYPE_SV_test_tumor_normal_with_panel.bedpe

SV Type Mapping Table

In Chromoscope, strand combinations are mapped to the following types of SVs.

| Inter-chromosomal SV types | strand1 | strand2 |
| --- | --- | --- |
| Deletion | + | - |
| Inversion (head-to-head) | + | + |
| Inversion (tail-to-tail) | - | - |
| Duplication | - | + |
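
The mapping table can be expressed as a simple lookup keyed by the strand pair. This is an illustrative sketch only; the names `SV_TYPE_BY_STRANDS` and `sv_type` are hypothetical and do not reflect how Chromoscope implements the mapping internally.

```python
# Strand pairs and SV types, copied from the mapping table above.
SV_TYPE_BY_STRANDS = {
    ("+", "-"): "Deletion",
    ("+", "+"): "Inversion (head-to-head)",
    ("-", "-"): "Inversion (tail-to-tail)",
    ("-", "+"): "Duplication",
}

def sv_type(strand1, strand2):
    """Return the SV type for a (strand1, strand2) pair."""
    try:
        return SV_TYPE_BY_STRANDS[(strand1, strand2)]
    except KeyError:
        # Anything other than '+' or '-' is not a valid strand value.
        raise ValueError(f"unexpected strands: {strand1!r}, {strand2!r}")

print(sv_type("+", "-"))
```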

CNV (TSV)

The CNV data is stored in a headed tab-delimited file that is visualized as three tracks: CNV, Gain, and LOH. The columns may appear in any order.

| Property | Type | Note |
| --- | --- | --- |
| chromosome | string | Required. The name of the chromosome. |
| start | number | Required. The starting position. |
| end | number | Required. The end position. |
| total_cn | string | Required. The total number of copies. |
| major_cn | number | Required. The major allele counts. |
| minor_cn | number | Required. The minor allele counts. |
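
To illustrate the expected layout, the sketch below writes a small CNV file with the six columns listed above. The segment values, the chromosome naming, and the helper name `write_cnv` are made up for illustration and are not taken from a real sample.

```python
import csv
import io

# Column names from the table above.
CNV_COLUMNS = ["chromosome", "start", "end", "total_cn", "major_cn", "minor_cn"]

def write_cnv(segments):
    """Serialize CNV segments (dicts) to headed, tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CNV_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()

# Hypothetical segments, for illustration only.
segments = [
    {"chromosome": "chr1", "start": 1, "end": 1000000,
     "total_cn": 2, "major_cn": 1, "minor_cn": 1},
    {"chromosome": "chr1", "start": 1000001, "end": 2000000,
     "total_cn": 3, "major_cn": 2, "minor_cn": 1},
]
cnv_text = write_cnv(segments)
print(cnv_text)
```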

Example file:

https://s3.amazonaws.com/gosling-lang.org/data/SV/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20170119.somatic.cna.annotated.txt

Drivers (TSV or JSON)

The drivers are stored in a headed tab-delimited file. When this file is present, the browser shows only the drivers listed in the file.

The columns may appear in any order.

| Property | Type | Note |
| --- | --- | --- |
| chr | string | Required. The name of the chromosome. The names should contain the chr prefix, such as chr2 and chrX. |
| pos | number | Required. The position of the driver. |
| gene | string | Required. The name of the driver. |
| ref | string | Optional. Information only shown on a tooltip. |
| alt | string | Optional. Information only shown on a tooltip. |
| category | string | Optional. Information only shown on a tooltip. |
| top_category | string | Optional. Information only shown on a tooltip. |
| transcript_consequence | string | Optional. Information only shown on a tooltip. |
| protein_mutation | string | Optional. Information only shown on a tooltip. |
| allele_fraction | string | Optional. Information only shown on a tooltip. |
| mutation_type | string | Optional. Information only shown on a tooltip. |
| biallelic | string | Required. Either Yes or No. Whether the mutation occurs on both alleles of a single gene. |
Figure: An annotation representing a biallelic mutation.

Based on the biallelic value, the browser shows annotations near the gene name:

  • “⊙” for biallelic when the biallelic column is "yes"
  • “.” for not biallelic (i.e., mono-allelic) when "no"
  • no symbol when undefined
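
The three cases above amount to a small lookup. The sketch below is a hypothetical helper, not Chromoscope's own code; matching the value case-insensitively is an assumption, made because the column table writes "Yes or No" while the list above writes "yes" and "no".

```python
def biallelic_symbol(value):
    """Return the annotation shown near the gene name for a biallelic value."""
    if value is None:
        return ""  # undefined: no symbol
    normalized = value.strip().lower()  # assumption: case-insensitive match
    if normalized == "yes":
        return "⊙"  # biallelic
    if normalized == "no":
        return "."  # not biallelic (mono-allelic)
    return ""  # any other value is treated as undefined

print(biallelic_symbol("yes"), biallelic_symbol("no"))
```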

Example file:

https://gist.githubusercontent.com/sehilyi/350b9e633c52ad97df00a0fc13a8839a/raw/c47b9ba33f1c9e187c69d1dadd01838db44d3b29/driver.txt

VCF & TBI

For point mutations and indels, we use standard VCF files along with tabix files. To generate the tabix file, you can run the following command:

tabix myfile.sorted.vcf.gz

Refer to the tabix documentation from HTSlib for details (https://www.htslib.org/doc/tabix.html).

caution

The VCF files must be sorted and indexed for Chromoscope to display genomic features correctly.
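
Before compressing and indexing, it can help to confirm that a VCF is actually sorted. The sketch below is a hypothetical pre-flight check on uncompressed VCF lines, not a Chromoscope or HTSlib tool. It assumes "sorted" means that records for each chromosome form one contiguous block and that positions within a block do not decrease.

```python
def is_vcf_sorted(lines):
    """Return True if records are grouped by chromosome and ordered by position."""
    seen = set()      # chromosomes whose block has already started
    current = None    # chromosome of the block being read
    last_pos = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != current:
            if chrom in seen:
                return False  # chromosome block appears a second time
            seen.add(chrom)
            current, last_pos = chrom, pos
        elif pos < last_pos:
            return False      # position went backwards within a block
        else:
            last_pos = pos
    return True

# Made-up records, for illustration only.
sorted_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t250\t.\tC\tT",
    "chr2\t50\t.\tG\tA",
]
print(is_vcf_sorted(sorted_vcf))
```

If the check fails, sort the file with your usual tooling before running tabix.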

BAM & BAI

For read alignments, we use standard BAM files along with BAI files. To generate the index file, you can run the following command:

samtools index myfile.sorted.bam myfile.sorted.bam.bai

Refer to the documentation of Samtools for details (https://www.htslib.org/doc/samtools-index.html).

caution

The BAM files must be sorted and indexed for Chromoscope to display genomic features correctly.