CRAVAT: Cancer-Related Analysis of VAriants Toolkit

Introduction


The Cancer-Related Analysis of Variants Toolkit (CRAVAT) is an evolving web-based suite of informatics tools for genomic variant interpretation that includes:
  • variant mapping (genome<->transcripts<->protein sequence<->protein structure)
  • extensive integrated annotation of variants, genes, and proteins
  • variant impact scoring, including joint prioritization of all nonsilent variants
  • structural and mechanistic visualization.
Results from CRAVAT submissions are explored in an interactive, user-friendly web-environment with dynamic filtering and sorting designed to highlight the most informative variants and genes in your study. We provide parallel, high-throughput processing of studies that can include many sequenced samples and millions of variants. CRAVAT jobs can be run on our public web-portal or you can run your own local CRAVAT server as a Docker container. Programmatic interfaces and links to our visualization services enable easy integration with other methods for omics analysis.
Read our most recent publication.


How to Cite


CRAVAT citation:
  • Masica DL, Douville C, Tokheim C, Bhattacharya R, Kim R, Moad K, Ryan MC, Karchin R (2017) CRAVAT 4: Cancer-Related Analysis of Variants Toolkit. bioRxiv DOI.
  • Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013). CRAVAT: Cancer-Related Analysis of VAriants Toolkit Bioinformatics, 29(5):647-648
CHASM citation:
  • Wong WC, Kim D, Carter H, Diekhans M, Ryan M, Karchin R (2011). CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer Bioinformatics, 27(15):2147-2148.
  • Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R (2009) Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations Cancer Res, 69(16):6660-7.
VEST citation:
  • Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R (2015) Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-indel) Human Mutation, doi: 10.1002/humu.22911.
  • Carter H, Douville C, Stenson P, Cooper D, Karchin R (2013) Identifying Mendelian disease genes with the Variant Effect Scoring Tool BMC Genomics, 14(Suppl 3):S3.

Submitting an Analysis Job


Jobs are submitted to the CRAVAT web server simply by providing an input set of variants and selecting a few options for the type of analysis you would like. It is best to create a login so that you can access the My Jobs page and use the CRAVAT Interactive Results Viewer.

CRAVAT has a queuing system with two separate queues, one for small jobs and one for large jobs, so that small jobs are not held up behind longer-running ones. Multiple concurrent jobs are run in both queues. Currently, small jobs are defined as those with 25,000 or fewer variants. A job with 25,000 variants takes approximately 1 hour for any analysis under a normal server load. When you submit a job, you will receive a rough estimate of the processing time. The large jobs queue can handle submissions with millions of variants.
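
As a rough illustration of the queue rule above, the sketch below counts the variant lines in an input and reports which queue the job would probably fall into. Only the 25,000-variant threshold comes from this documentation; the function names and the one-variant-per-line counting are assumptions made for the example, and the server applies its own rules.

```python
SMALL_JOB_MAX_VARIANTS = 25_000  # threshold stated in the documentation

def count_variant_lines(lines):
    """Count non-blank lines that are not comments (assumes one variant per line)."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.lstrip().startswith((">", "#", "!"))
    )

def expected_queue(n_variants):
    """Return 'small' or 'large' according to the documented threshold."""
    return "small" if n_variants <= SMALL_JOB_MAX_VARIANTS else "large"

example = ["# comment", "TR1 chr17 7674188 - G T", "", "TR2 chr10 121520166 - G A"]
n = count_variant_lines(example)
print(n, expected_queue(n))
```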

Input


You may input small sets of variants by pasting them into the textbox of the Input panel. In most cases, you will want to submit your variants by uploading a file. Files must be plain text files (not compressed). All genomic variant positions should be in GRCh38 (hg38) coordinates. If you have GRCh37 (hg19) coordinates, select the hg19 checkbox in the Input panel to lift the variants over to GRCh38. The specific formats we accept are described below.

VCF


CRAVAT supports VCF format v4.0 and above as an input format. VCF files are commonly provided by sequencing centers and by various variant and somatic mutation calling software packages. CRAVAT processing uses variant calls and sample identifiers from the VCF file for analysis. If your VCF file contains information on call quality, zygosity, total alternative reads, and total overall reads, then this information will be included in your CRAVAT results. Note that CRAVAT results may include multiple result lines for a given VCF input line, so that lines containing multiple variants and multiple samples can each be annotated in detail. Note: the ID field in VCF format becomes the UID field in CRAVAT format. If a VCF line is split into multiple result lines, the sample name or other distinguishing characteristics are added to the ID.
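
To make the line-splitting behaviour concrete, here is a minimal sketch of how one VCF line with several alternate alleles and several samples can be expanded into one row per sample/allele pair. It is not CRAVAT's implementation: the function name, the assumption that GT is the first per-sample sub-field, and especially the UID naming scheme (ID, sample, and allele joined by underscores) are hypothetical.

```python
def expand_vcf_line(line, sample_names):
    """Yield (uid, chrom, pos, ref, alt, sample) for each ALT allele a sample carries."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vcf_id, ref, alts = fields[0], fields[1], fields[2], fields[3], fields[4]
    alt_list = alts.split(",")
    # Columns 10 and onward hold per-sample data; GT is assumed to come first.
    for sample, call in zip(sample_names, fields[9:]):
        genotype = call.split(":")[0].replace("|", "/")
        carried = {int(a) for a in genotype.split("/") if a.isdigit() and a != "0"}
        for allele_index in sorted(carried):
            if allele_index > len(alt_list):
                continue  # malformed genotype; skip rather than guess
            alt = alt_list[allele_index - 1]
            uid = f"{vcf_id}_{sample}_{alt}"  # hypothetical naming, for illustration only
            yield uid, chrom, int(pos), ref, alt, sample

vcf_line = "chr7\t140753336\tvar1\tA\tT,C\t50\tPASS\t.\tGT\t0/1\t1/2"
rows = list(expand_vcf_line(vcf_line, ["S1", "S2"]))
for row in rows:
    print(row)
```

In this example the single input line yields three rows: one for sample S1 (allele T) and two for sample S2 (alleles T and C).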

CRAVAT Format


CRAVAT also supports a very simple text input format with six required fields and an optional seventh, separated by tabs or spaces. Each row represents a variant/sample pair and must begin with a unique identifier or UID. The UID must be alphanumeric (no punctuation or symbols allowed). The seventh field is a sample identifier and can be omitted if it is not relevant. Comment lines are allowed: all lines that start with ">", "#", or "!" are ignored as comments.
  • Genomic-coordinate format (separated by a tab or a space):
    # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
    TR1	chr17	7674188	-	G	T	TCGA-02-0231
    TR2	chr10	121520166	-	G	A	TCGA-02-3512
    TR3	chr13	48459831	+	C	A	TCGA-02-3532
    TR4	chr7	116777451	+	G	T	TCGA-02-1523
    TR5	chr7	140753336	-	T	A	TCGA-02-0023
    TR6	chr17	39724745	+	G	T	TCGA-02-0252
    Ins1	chr17	39724745	+	-	T	TCGA-02-0252
    Del1	chr17	39724745	+	A	-	TCGA-02-0252
    CSub1	chr2	39644095	+	ATGCT	GA	TCGA-02-0252
    
    Position is in 1-based open coordinates. For insertions and deletions, use "-" as the reference base for an insertion and "-" as the alternate base for a deletion. In the above example, Ins1 is a "T" inserted between the 39724744th and the 39724745th bases. Del1 is an "A" at the 39724745th position that is deleted. CSub1 is a complex substitution in which "ATGCT" from the 39644095th to the 39644099th positions is replaced by "GA". If you do not have strand information from your sequencing results, it is likely that they are all reported on the + strand. Make sure that your reported reference base matches the base at the reported position in the hg38 reference sequence (or hg19 if you checked the hg19 checkbox).

    * The old format for indels, in which the base before the insertion/deletion location is specified, is still supported. However, if this old format is used in any row of your input, your entire input will be handled in the old format. The old and new formats cannot be mixed.

  • HGVS format
    CRAVAT also supports HGVS format for input of variants. Currently we accept HGVS missense (single-point) variants and insertions. The HGVS input includes three tab-delimited fields: 1. an optional identifier, 2. the variant in HGVS format, and 3. an optional sample identifier. For example:
    Var1	NC_000022.10:g.30025797A>T	Sample1
    Var2	NC_000022.10:g.40418496T>C	Sample1
    Var3	NC_000022.10:g.40419252C>T	Sample1
    Var4	NC_000002.10:g.218270043_218270044insG	Sample1
    
  • Transcript format
    Missense variants can be entered in transcript format:
    # UID / Transcript / AA change / Sample ID (optional)
    TR1	NM_001126116.1	D127Y	TCGA-02-0231
    TR2	NM_001144919.1	R162Q	TCGA-02-3512
    TR3	NM_000321.2	Q702K	TCGA-02-3532
    TR4	NM_000245.2	A1108S	TCGA-02-1523
    TR5	NM_004333.4	V600E	TCGA-02-0023
    
    The transcript identifier can be from either NCBI Refseq (NM accessions), CCDS, or Ensembl (ENST accessions). Refseq and CCDS accessions can be specified without version numbers.

    CRAVAT is primarily designed for genomic input, and submissions in transcript format are supported but produce a limited set of annotations. Submitting variants in genomic format is recommended.
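
The genomic-coordinate format described above can be checked before upload with a small parser. The sketch below is only an illustration of the stated rules (comment prefixes, tab or space separation, alphanumeric UID, "-" for indels, optional sample ID); the function name, the returned dictionary keys, and the set of accepted base letters are assumptions, and the older indel format is not handled.

```python
import re

COMMENT_PREFIXES = (">", "#", "!")
BASES = re.compile(r"(?:[ACGTN]+|-)", re.IGNORECASE)  # assumed alphabet; "-" marks an indel

def parse_cravat_line(line):
    """Parse one genomic-coordinate line; return a dict, or None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith(COMMENT_PREFIXES):
        return None
    fields = stripped.split()  # splits on tabs or spaces, collapsing repeats
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(fields)}")
    uid, chrom, pos, strand, ref, alt = fields[:6]
    if not uid.isalnum():
        raise ValueError(f"UID must be alphanumeric: {uid!r}")
    if not pos.isdigit():
        raise ValueError(f"position must be a positive integer: {pos!r}")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be + or -: {strand!r}")
    if not BASES.fullmatch(ref) or not BASES.fullmatch(alt) or ref == alt == "-":
        raise ValueError(f"invalid reference/alternate bases: {ref!r} {alt!r}")
    return {
        "uid": uid, "chrom": chrom, "position": int(pos), "strand": strand,
        "ref": ref.upper(), "alt": alt.upper(),
        "sample": fields[6] if len(fields) == 7 else None,
    }

print(parse_cravat_line("TR3\tchr13\t48459831\t+\tC\tA\tTCGA-02-3532"))
```

Because the line is split on any run of whitespace, a stray double tab between fields does not shift the columns.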

Analysis Tools


By default, CRAVAT provides variant mapping across genome<->transcripts<->protein sequence<->protein structure and extensive annotations (see the Annotations section below). You can also request results from variant- and gene-level scoring algorithms (currently CHASM v3.1 and VEST v4.0) and/or additional annotations from GeneCards and PubMed.

CHASM (currently v3.1) provides cancer-specific missense variant scores. If CHASM is selected, a list box that allows you to choose a cancer type appears. VEST (currently v4) provides pathogenicity scores for all non-silent variants.

Submit


If you are not logged in, enter your email address. To receive a machine-readable, tab-separated text version of the CRAVAT analysis report in addition to the default Microsoft Excel version, check "Include text reports for machine processing". Then click "SUBMIT". When all analyses are complete, an email with the reports will be sent to you. If you are logged in, you can check the status and history of your jobs on the "My Jobs" page.

User Account


If you have created a CRAVAT user account, the CRAVAT server will track your job submissions and provide both downloadable results and access to an interactive results viewer. When logged in to your account, you can see the status of your jobs and retrieve the results of current or past jobs through the "My Jobs" page.

Create a CRAVAT Account:

There are two ways to create a CRAVAT account:
  • When you submit a job for the first time, CRAVAT will create an account with your email and a temporary password and this account information will be sent to you as a part of the result notification email.
  • A CRAVAT account can be created by clicking "Log-In" > "Create an account" on the top menu.

Retrieve Your Username

Your username is your email.

Retrieve Your Password

When you create your login, you will set up a challenge question and answer. If you forget your password, click "Log-In" > "Forgot password?". You will then enter your login (your email) to retrieve your challenge question. Answering your challenge question resets your password. The new password is displayed below the challenge answer. Use the new password to log in; you can then change your password.

Change Your Password

To change your password, first log in, and then click "My Profile" > "Change password". In the "Change Password" pop-up window, type your current password, your new password, and again your new password. Click "Submit".

My Jobs Page

After having logged in, click "My Jobs" on the top menu to open the My Jobs page in a new browser tab. This page shows statistics in tabular and graphic format for your past and current jobs and the status of each job (success, fail, running, and in-queue). By clicking "Download" in the "Result" column, you can download Excel and/or text result files and by clicking "Explore" you get access to the interactive interface (recommended).

Checking Job Status


Log in to the CRAVAT main page and click "My Jobs". This opens a new browser tab with a table that shows your submitted jobs.



For completed jobs, if the interactive result viewer is available, the "Explore" icon will show up, and if result files are available for download, the "Download" icon will show up. Clicking "Explore" will open a new tab with the interactive result viewer. Clicking "Download" will start downloading the results files.

Interactive Result Viewer


The interactive result viewer contains five tabs: Summary, Gene, Variant, Noncoding, and Error.

Summary Tab



1) This area shows information about your job.
2) This area contains widgets that summarize your variants: breakdowns by coding/noncoding, oncogene/tumor suppressor, sequence ontology, and sequence ontology per sample; a circular plot of the human chromosomes showing your variants (Circos plot); top genes by mutation rate, CHASM score, and VEST score; and Network Data Exchange (NDEX) network hits for your variants.

Gene Tab



1) This tells you the Job Id for the CRAVAT job being viewed in the Interactive Result Viewer.
2) The filter section allows you to change which variants are shown in the variants table. You can move sliders to set the values. You can click 'Hide synonymous' on or off to show or hide synonymous variants. As you change filter conditions, the table on the right will be updated automatically.
3) The columns section can be used to turn on and off the columns shown in the variants table. Click the + and - to expand or collapse column groups. Click on the column names to show (dark grey) or hide (light grey) them.
4) Click a column header to sort the table. Shift-clicking columns will do multiple-column sorting.
5) Type in column header search boxes to filter the table for the rows that contain the typed text.
6) Click a gene in the gene table to the left to show the study's variants for that gene in this table.
7) Click to see Network Data Exchange (NDEX) networks containing the gene.
8) If the selected variant or the variants in the selected gene map to any PDB structure or high-quality model structure, this button turns red. Click the red button to see the structural mapping of the variant(s).
9) CHASM and VEST scores and p-values are shown like speedometers. The hand shows the score from 0 to 1, and p-values are shown as "p=".
10) Protein domains, the selected variant(s), and the variants from TCGA are shown as a lollipop diagram.
11) Use this to change the tissue for TCGA mutations.
8) Allele frequencies from 1000 Genomes, ESP6500, and gnomAD are shown as bar-meters.
12) Click to download the table content as a tsv file.
13) Known variants from TCGA are shown below the gene bar. TCGA variants that have multiple samples are taller.
14) Protein domains are shown as brown bars. Regions of interest in the protein sequence are shown as red lines or X's below the gene bar.
15) Click to minimize/restore the table.
16) Click to maximize/restore the table.
17) The variant(s) from the study for the gene are shown above the bar.
Color Sequence ontology of variant
Synonymous mutation
Missense mutation
Inframe indel
Frameshift indel
Splice site mutation
Stop loss mutation
Stop gain mutation
Complex substitution mutation

Variant Tab



1) This tells you the Job Id for the CRAVAT job being viewed in the Interactive Result Viewer.
2) The filter section allows you to change which variants are shown in the variants table. You can move sliders to set the values. You can click 'Hide synonymous' on or off to show or hide synonymous variants. As you change filter conditions, the table on the right will be updated automatically.
3) The columns section can be used to turn on and off the columns shown in the variants table. Click the + and - to expand or collapse column groups. Click on the column names to show (dark grey) or hide (light grey) them.
4) Click a column header to sort the table. Shift-clicking columns will do multiple-column sorting.
5) Type in column header search boxes to filter the table for the rows that contain the typed text.
6) This is the table area which shows the analysis result for the chosen tab and filter conditions.
7) Click to change the level of detail (summary or full).
8) Allele frequencies from 1000 Genomes, ESP6500, and gnomAD are shown as bar-meters.
9) If the selected variant or the variants in the selected gene map to any PDB structure or high-quality model structure, this button turns red. Click the red button to see the structural mapping of the variant(s).
10) Protein domains, the selected variant(s), and the variants from TCGA are shown as a lollipop diagram.
11) Use this to change the tissue for TCGA mutations.
12) Click to download the table content as a tsv file.
13) Known variants from TCGA are shown below the gene bar. TCGA variants that have multiple samples are taller.
14) Protein domains are shown as brown bars. Regions of interest in the protein sequence are shown as red lines or X's below the gene bar.
15) Click to minimize/restore the table.
16) Click to maximize/restore the table.
17) The variant(s) from the study for the gene are shown above the bar.
Color Sequence ontology of variant
Synonymous mutation
Missense mutation
Inframe indel
Frameshift indel
Splice site mutation
Stop loss mutation
Stop gain mutation
Complex substitution mutation

Non-Coding Tab


The Non-Coding tab contains a subset of the elements of the Variant tab. See the explanation of the Variant Tab above.

Error Tab



1) This tells you the job ID.
2) The Error column shows the reason each input line was rejected. The Input Line column shows the input line itself.
3) Click to download the table content as a tsv file.
4) Click to minimize/restore the table.
5) Click to maximize/restore the table.

Downloadable Results


After a successful submission and analysis, you will receive a link to your results by email. If you are logged in, you can also check the status and history of your jobs on the 'My Jobs' page and download a result by clicking 'Download' in the 'Result' column. Results remain available for 30 days from the date of submission. They are delivered as one zip-compressed file containing several report files, including an MS Excel spreadsheet and optional tab-separated text files. Five reports are included: Variant, Variant Additional Details, Variant non-coding, Gene-level analysis, and Input Errors. The spreadsheet has a tab for each report, and the tab-separated text output has each report in a separate .tsv file. All reports are in table format. The Excel spreadsheet file is provided only when 65,000 or fewer variants are analyzed.
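
If you requested the text reports, the .tsv files in the downloaded zip file can be read without unpacking it. The following is a minimal sketch using only the Python standard library; the member names in the demo are placeholders (the actual file names inside a CRAVAT result archive are not specified here), UTF-8 encoding is assumed, and any header or comment rows are returned as-is.

```python
import csv
import io
import zipfile

def read_tsv_reports(zip_source):
    """Return {member_name: list of rows} for every .tsv member of a results archive."""
    reports = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".tsv"):
                continue  # e.g. the Excel spreadsheet is skipped
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reports[name] = list(csv.reader(text, delimiter="\t"))
    return reports

# Demo with an in-memory archive; the member names are placeholders, not CRAVAT's.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("Variant.tsv", "UID\tChrom\nTR1\tchr17\n")
    archive.writestr("report.xlsx", b"not parsed here")
demo.seek(0)
for name, rows in read_tsv_reports(demo).items():
    print(name, len(rows), "rows")
```

`read_tsv_reports` accepts either a path to the downloaded zip file or a file-like object.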

CRAVAT Galaxy Tool


There are two Galaxy Tools for querying CRAVAT.

To directly obtain annotations for your variants, use
https://toolshed.g2.bx.psu.edu/view/in_silico/cravat_annotate_mutations/1b6e23f3cb06.

To submit a job with your variants and benefit from the full CRAVAT analysis environment, including its Interactive Result Viewer, use
https://toolshed.g2.bx.psu.edu/view/in_silico/cravat_score_and_annotate/cdd97b06c802.

Both tools use the CRAVAT input format.

Programmatic Access


With CRAVAT's web service, you can submit and check the status of your jobs without using a browser.

Jobs

  • Job submission via POST

    URL: http://www.cravat.us/CRAVAT/rest/service/submit
    Method: POST
    Consumes: Multipart/form-data
    Produces: a JSON object, notable fields of which are as follows.
    • status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
    • errormsg: If there was any error during the job submission, the error message is written here.
    • jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
    Form data parameters (* = essential parameters):
    • analyses: "CHASM", "VEST", "CHASM;VEST"
    • chasmclassifier: classifier name for CHASM analysis
    • *email: email of the submitter
    • functionalannotation: "on" or "off". GeneCards and PubMed annotation.
    • hg19: "on" or "off". Input mutations are in hg19 coordinates or not.
    • *inputfile: Input mutation file. This is from the file input element in the POST form.
    • mupitinput: "on" or "off". MuPIT input format returned or not.
    • tsvreport: "on" or "off". Text format reports returned or not.
    Python example
        import requests

        # Upload the input file as multipart/form-data; the file is closed on exit.
        with open('input_file/vcf_input.txt', 'rb') as f:
            r = requests.post('http://www.cravat.us/CRAVAT/rest/service/submit',
                              files={'inputfile': f},
                              data={'email': 'test@test.com', 'analyses': 'CHASM'})
        print(r.text)  # submission result as a JSON string; check the "status" field

  • Job submission via GET

    URL: http://www.cravat.us/CRAVAT/rest/service/submit
    Method: GET
    Produces: a JSON object, notable fields of which are as follows.
    • status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
    • errormsg: If there was any error during the job submission, the error message is written here.
    • jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
    Query parameters (* = essential parameters):
    • analyses: "CHASM", "VEST", "CHASM;VEST"
    • chasmclassifier: classifier name for CHASM analysis
    • *email: email of the submitter
    • functionalannotation: "on" or "off". GeneCards and PubMed annotation.
    • hg19: "on" or "off". Input mutations are in hg19 coordinates or not.
    • *mutations: a string with mutations, the format of which is the same as described in the "Input" section above.
    • mupitinput: "on" or "off". MuPIT input format returned or not.
    • tsvreport: "on" or "off". Text format reports returned or not.
    Python example
        import requests

        r = requests.get('http://www.cravat.us/CRAVAT/rest/service/submit',
                         params={'email': 'test@test.com', 'analyses': '',
                                 'mutations': 'TR1 chr22 30025797 + A T sample_1'})
        print(r.text)  # submission result as a JSON string; check the "status" field

  • Job status checking

    URL: http://www.cravat.us/CRAVAT/rest/service/status
    Method: GET
    Produces: a JSON object, notable fields of which are as follows.
    • status: "running" for still running, "success" for successful completion, "jobfailed" for failed
    • errormsg: Error message if the job failed.
    • resultfileurl: If the job completed successfully, the URL of the result file.
    Query parameters (* = essential parameters):
    • *jobid: The job ID to query.
    Example
    http://www.cravat.us/CRAVAT/rest/service/status?jobid=test_20140204_102423
    Python example
        import requests

        # The path includes /CRAVAT, matching the URL given above.
        r = requests.get('http://www.cravat.us/CRAVAT/rest/service/status',
                         params={'jobid': 'test_20170315_103245'})
        print(r.text)  # job status as a JSON string
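
The submission and status calls above can be combined into a simple submit-then-poll workflow. The sketch below only interprets the JSON fields documented above ("status", "errormsg", "jobid", "resultfileurl"); the function names, the polling interval, and the treatment of any other status value as "still pending" are assumptions. The HTTP call is passed in as a function so that the logic can be tried without contacting the server.

```python
import json
import time

def job_id_from_submission(response_text):
    """Return the job ID from a submit response, or raise with the server's error message."""
    reply = json.loads(response_text)
    if reply.get("status") != "submitted":
        raise RuntimeError(reply.get("errormsg") or "submission failed")
    return reply["jobid"]

def wait_for_result(fetch_status, job_id, interval_s=60, max_checks=120, sleep=time.sleep):
    """Poll until the job succeeds or fails; return the result file URL on success.

    fetch_status(job_id) must return the JSON text of the status response.
    """
    for _ in range(max_checks):
        reply = json.loads(fetch_status(job_id))
        status = reply.get("status")
        if status == "success":
            return reply.get("resultfileurl")
        if status == "jobfailed":
            raise RuntimeError(reply.get("errormsg") or "job failed")
        sleep(interval_s)  # "running" or any other value: assume still pending
    raise TimeoutError(f"job {job_id} did not finish after {max_checks} checks")

# Offline demo with canned replies standing in for the web service.
canned = iter(['{"status": "running"}',
               '{"status": "success", "resultfileurl": "http://example.org/result.zip"}'])
print(wait_for_result(lambda job_id: next(canned), "demo_job", sleep=lambda s: None))
```

With the requests library, `fetch_status` could be `lambda job_id: requests.get('http://www.cravat.us/CRAVAT/rest/service/status', params={'jobid': job_id}).text`.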

Single Variant

  • Single variant web API

    URL:http://www.cravat.us/CRAVAT/rest/service/query
    Method: GET
    Produces: a JSON object, notable fields of which are as follows. See Annotation Description for explanation on the returned fields.
    Query parameters (* = essential parameters):
    • *mutation: The chromosome, position, strand direction, reference base, and alternate base of the variant, separated by underscores (chromosome_position_strand_refBase_altBase)
    Example
    http://www.cravat.us/CRAVAT/rest/service/query?mutation=chr22_40418496_-_A_G
    Python example
        import requests

        # The query mutation has the format chromosome_position_strand_reference_alternate.
        r = requests.get('http://www.cravat.us/CRAVAT/rest/service/query',
                         params={'mutation': 'chr22_30025797_+_A_T'})
        print(r.text)  # annotation for the query mutation
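
When building query URLs by hand, note that a literal "+" in a query string is normally decoded as a space, so the strand sign should be percent-encoded. The helper below is a small sketch of that; the function names are made up for this example, and only the URL and the underscore-separated format come from the description above.

```python
from urllib.parse import urlencode

QUERY_URL = "http://www.cravat.us/CRAVAT/rest/service/query"

def mutation_key(chrom, position, strand, ref, alt):
    """Join the five parts into chromosome_position_strand_refBase_altBase."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be + or -: {strand!r}")
    return "_".join([chrom, str(int(position)), strand, ref, alt])

def query_url(chrom, position, strand, ref, alt):
    """Return the full single-variant query URL with the mutation safely encoded."""
    return QUERY_URL + "?" + urlencode({"mutation": mutation_key(chrom, position, strand, ref, alt)})

print(query_url("chr22", 40418496, "-", "A", "G"))
print(query_url("chr22", 30025797, "+", "A", "T"))  # "+" is sent as %2B
```

Libraries such as requests perform the same encoding automatically when the value is passed through `params`.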

Single Variant Page


It is possible to display the details of a single variant as a full page on a browser window. Use the following URL scheme to open a Single Variant Page for the variant encoded in the URL.

CHASM-3.1


CHASM-3.1 is the most recent version of "Cancer-specific High-throughput Annotation of Somatic Mutations" (CHASM), a method that predicts the functional significance of somatic missense mutations observed in the genomes of cancer cells. Mutations can then be prioritized in subsequent functional studies, based on the probability that they give the cells a selective survival advantage. Original CHASM publication. CHASM-3 overview.

CHASM-3.1 was trained on SNVBox (v5.0) (updated and rebuilt for GRCh38) with driver training examples previously used in CHASM-3 and passenger training examples generated according to dinucleotide frequencies in 25 Cancer Genome Atlas cancer types.

Cancer-specific classifiers

Name: Full name
Bladder: Bladder Urothelial Carcinoma
Blood-Lymphocyte: Chronic Lymphocytic Leukemia
Blood-Myeloid: Acute Myeloid Leukemia
Brain-Glioblastoma-Multiforme: Glioblastoma Multiforme
Brain-Lower-Grade-Glioma: Brain Lower Grade Glioma
Breast: Breast Invasive Carcinoma
Cervix: Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
Colon: Colon Adenocarcinoma
Head and Neck: Head and Neck Squamous Cell Carcinoma
Kidney-Chromophobe: Kidney Chromophobe
Kidney-Clear-Cell: Kidney Renal Clear Cell Carcinoma
Kidney-Papillary-Cell: Kidney Renal Papillary Cell Carcinoma
Liver-Nonviral: Hepatocellular Carcinoma (Secondary to Alcohol and Adiposity)
Liver-Viral: Hepatocellular Carcinoma (Viral)
Lung-Adenocarcinoma: Lung Adenocarcinoma
Lung-Squamous Cell: Lung Squamous Cell Carcinoma
Other: General purpose
Ovary: Ovarian Serous Cystadenocarcinoma
Pancreas: Pancreatic Cancer
Prostate-Adenocarcinoma: Prostate Adenocarcinoma
Rectum: Rectum Adenocarcinoma
Skin: Skin Cutaneous Melanoma
Stomach: Stomach Adenocarcinoma
Thyroid: Thyroid Carcinoma
Uterus: Uterine Corpus Endometrioid Carcinoma

VEST-4


VEST-4 is the most recent version of the Variant Effect Scoring Tool. VEST is a machine learning method that predicts the functional significance of non-silent variants based on the probability that they are pathogenic.

Original VEST missense publication. Original VEST insertion/deletion publication.

Changes from VEST-3 to VEST-4
  • SNVBox features were updated and rebuilt for GRCh38 (SNVBox5.0)
  • Positive class expanded and updated to HGMD (2017.1)
  • Neutral class changed to ExAC Release 1 (2/2017)
  • Improved p-value calculations using non-training set negatives from ExAC and ESP6500 to seed a Gibbs Sampler algorithm. This technique produced a table of p-values with increased precision for all VEST scores for each sequence ontology.

Annotations


Field: Description
Chromosome: Chromosome
Position: Genomic position in chromosomal coordinates
Strand: Positive or negative
Reference base(s): Base(s) at position in the reference genome (hg38)
Alternate base(s): Alternate base(s)
Sample ID: Alphanumeric identifier of sample
Sequence ontology: Code for consequence type of the mutation/variant in Sequence Ontology (S.O.) annotation.

Code: Definition
FI: Frameshift Insertion
FD: Frameshift Deletion
SG: Stop Gained
SS: Splice Site
SL: Stop Lost
II: Inframe Insertion
ID: Inframe Deletion
CS: Complex Substitution
MS: Missense Variant
SY: Synonymous Variant
UN: Unknown

Sequence Ontology Details.

If multiple transcript mappings produce several S.O. codes, the most severe is reported, in order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
S.O. all transcripts: Sequence ontology for each transcript the variant is mapped to. * = transcript with most severe sequence ontology.
S.O. transcript: Transcript with most severe sequence ontology. To break ties, the longer transcript is chosen.
S.O. transcript strand: The strand (+ or -) of the transcript used to assign sequence ontology
Protein sequence change: Protein sequence change produced by the variant.
Phred: Phred-scaled quality score. Available only with VCF input.
VCF filters: Status of VCF filters. PASS if all are satisfied. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). Available only with VCF input.
Zygosity: Homozygous or heterozygous status of the variant. Available only with VCF input.
Alternate reads: Count of reads with alternate allele aligned to the position. Available only with VCF input.
Total reads: Count of all reads aligned to the position. Available only with VCF input.
Variant allele frequency: Alternate reads / Total reads
ClinVar: ClinVar annotation of pathogenicity. Only the variants with "pathogenic" clinical significance are reported.
dbSNP: dbSNP identifier
1000 Genomes AF: Healthy population allele frequency from the 1000 Genomes project
ESP6500 AF (average): Average population allele frequency from Exome Sequencing Project’s ESP6500
ESP6500 AF (European American): Population-specific AF
ESP6500 AF (African American): Population-specific AF
gnomAD total AF: Total healthy population allele frequency from the Broad Institute gnomAD project
gnomAD AF African: African/African American specific AF
gnomAD AF American: Admixed American/Latino specific AF
gnomAD AF Ashkenazi Jewish: Ashkenazi Jewish specific AF
gnomAD AF East Asian: East Asian specific AF
gnomAD AF Finnish: Finnish specific AF
gnomAD AF Non-Finnish European: Non-Finnish European specific AF
gnomAD AF South Asian: South Asian specific AF
gnomAD AF Other: Other population specific AF
COSMIC variant count: Total of alt base(s) previously observed in COSMIC database
COSMIC variant count (tissue): Alt base(s) previously observed in COSMIC database, grouped by tissue
COSMIC transcript: Transcript used by COSMIC curators
COSMIC protein change: Protein sequence change used by COSMIC curators
Number of samples with variant: Alt base recurrence in submitted samples
Protein 3D Variant: If the variant is mapped to a 3D protein structure, this link opens an interactive visualization window from MuPIT.
ClinVar disease identifier Identifier: ClinVar disease identifier
ClinVar XRef: Cross-references for ClinVar annotations
HGVS Genomic: Human Genome Variation Society genomic nomenclature of variant
HGVS Protein: Human Genome Variation Society protein nomenclature of variant in the most damaging S.O. transcript
HGVS Protein All: Human Genome Variation Society protein nomenclature for all transcripts
HUGO Symbol: Gene symbol from HUGO in which the mutation resides
TARGET: Drugs that target the gene from TARGET database
CGL Driver Class: Oncogene or Tumor suppressor gene annotated by Cancer Gene Landscapes
Number of samples with gene mutated: Gene mutation recurrence in submitted samples
CGC driver class: Oncogene or Tumor suppressor gene annotated by Cancer Gene Census
CGC inheritance: Somatic or germline annotated by Cancer Gene Census
CGC tumor types somatic: Annotations from Cancer Gene Census tumor types somatic
CGC tumor types germline: Annotations from Cancer Gene Census tumor types germline
COSMIC gene count: Total times gene is mutated in COSMIC database
COSMIC gene count (tissue): Total times gene is mutated in COSMIC database, grouped by tissue
Protein 3D Gene: If variants in the gene are mapped to a 3D protein structure, this link opens an interactive visualization window from MuPIT.
NCI Pathway Hits: Count of genes from the National Cancer Institute Pathway Interaction Database that contain the mutated gene. From Network Data Exchange (NDEX) enrichment service.
NCI Pathway IDs: Network Data Exchange (NDEX) identifiers of pathways from the National Cancer Institute Pathway Interaction Database that contain the mutated gene.
NCI Pathway Names: Names of pathways from the National Cancer Institute Pathway Interaction Database that contain the mutated gene. From Network Data Exchange (NDEX) enrichment service.
UTR/Intron: Mapping to noncoding regions (UTRs, 2 kb upstream/downstream regions, and introns). From UCSC GRCh38 database.
UTR/Intron Gene: Gene in which a non-coding variant occurred
UTR/Intron All Transcript: Mapping to noncoding regions for each transcript
ncRNA Class: Noncoding RNA class
ncRNA Name: Noncoding RNA name
Repeat Class: Repeated sequence class
Repeat Family: Repeated sequence family
Repeat Name: Repeated sequence name
Pseudogene: Pseudogene a variant occurred in
Pseudogene Transcript: Transcript of noncoding pseudogene a variant occurred in
GWAS NHBLI Key (GRASP): GRASP-GWAS NHBLI keys. From GRASP project.
GWAS PMID (GRASP): List of PubMed Ids for associated GRASP phenotypes. List order matches GWAS NHBLI Key.
GWAS Phenotype (GRASP): List of phenotypes in GRASP catalogue with associated p-values. List order matches GWAS NHBLI Key.
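
The rule for the Sequence ontology field (report the most severe code when transcripts disagree) can be expressed in a few lines. This is an illustrative sketch rather than CRAVAT's code: the severity order is the one given above, while the function name and the decision to rank unlisted codes such as UN last are assumptions.

```python
# Severity order from the documentation, most severe first.
SEVERITY_ORDER = ["FI", "FD", "SG", "SS", "SL", "II", "ID", "CS", "MS", "SY"]
_RANK = {code: rank for rank, code in enumerate(SEVERITY_ORDER)}

def most_severe(codes):
    """Return the most severe S.O. code; codes not in the list (e.g. UN) rank last."""
    return min(codes, key=lambda code: _RANK.get(code, len(SEVERITY_ORDER)))

print(most_severe(["MS", "SG", "SY"]))
```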

Scoring Algorithm Results


CHASM-3.1

Field: Description
CHASM score: Score for somatic missense variants. Ranges from 0 (likely passenger) to 1 (likely driver). *In the original CHASM paper the values were reversed, with 0 as likely driver and 1 as likely passenger.
CHASM P-value: Empirical p-value (probability that passenger variant is misclassified as a driver).
CHASM FDR: False discovery rate expected (Benjamini-Hochberg multiple testing correction).
CHASM transcript: Transcript used for CHASM score
All transcripts CHASM results: List formatted as TranscriptID:ProteinSequenceChange:CHASMscore:CHASMp-value calculated for all transcripts. * = transcript with most severe CHASM score
CHASM gene score: Highest CHASM score in the gene (only missense variants are considered)
CHASM gene P-value: Composite p-value for non-silent variants in the gene combined with Stouffer’s Z-score method
CHASM gene FDR: Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method.
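
For readers unfamiliar with the Benjamini-Hochberg correction named in the FDR fields, the following is a minimal textbook sketch of the procedure. It illustrates the general method only; it is not taken from CRAVAT, and the set of p-values CRAVAT corrects over is not described here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A variant is then called significant at a chosen FDR level (for example 0.05) if its adjusted value is at or below that level.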

VEST-4

Still using VEST-3? It is available on hg19.cravat.us.

Field: Description
VEST Score (missense): Pathogenicity score for missense germline variants
VEST Score (frameshift indels): Pathogenicity score for frameshift insertion and deletion germline variants
VEST Score (inframe indels): Pathogenicity score for in-frame insertion and deletion germline variants
VEST Score (stop-gain): Pathogenicity score for stop-gain germline variants
VEST Score (stop-loss): Pathogenicity score for stop-loss germline variants
VEST Score (splice site): Pathogenicity score for splice site germline variants
VEST P-value: Empirical p-value (probability that benign variant is misclassified as pathogenic).
VEST FDR: False discovery rate expected (Benjamini-Hochberg multiple testing correction).
All transcripts VEST results: List formatted as TranscriptID:ProteinSequenceChange:VESTscore:VESTp-value calculated for all transcripts. * = transcript with most severe VEST score.
VEST gene score (non-silent): Highest pathogenicity score in the gene (all non-silent variants are considered)
VEST gene P-value: Composite p-value for non-silent variants in the gene combined with Stouffer’s Z-score method
VEST gene FDR: Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method.
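
The gene-level p-values above are described as combined with Stouffer's Z-score method. The sketch below shows the standard unweighted form of that method using only the Python standard library (Python 3.8 or later). Whether CRAVAT applies weights, and how it handles p-values of exactly 0 or 1, is not stated in this document; the clamping with `eps` is an assumption made so that the example always runs.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def stouffer_combined_p(p_values, eps=1e-12):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    # Clamp to (0, 1) so the inverse CDF is defined (an assumption of this sketch).
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)

print(stouffer_combined_p([0.05, 0.05]))
```

Two variants that are each borderline (p = 0.05) combine to a gene-level p-value of roughly 0.01, which is why a gene can be significant even when no single variant in it is.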

Additional Annotations That Can Be Requested


Field: Description
GeneCards summary: GeneCards annotation
PubMed article count: Number of records retrieved from PubMed, using the name of the gene that contains the mutation and "cancer" as keywords.
PubMed search term: PubMed search result link.