CRAVAT: Cancer-Related Analysis of VAriants Toolkit

Introduction

The Cancer-Related Analysis of Variants Toolkit (CRAVAT) is an evolving web-based suite of informatics tools for genomic variant interpretation that includes:

variant mapping (genome<->transcripts<->protein sequence<->protein structure)
extensive integrated annotation of variants, genes, and proteins
variant impact scoring, including joint prioritization of all nonsilent variants
structural and mechanistic visualization.

Results from CRAVAT submissions are explored in an interactive, user-friendly web-environment with dynamic filtering and sorting designed to highlight the most informative variants and genes in your study. We provide parallel, high-throughput processing of studies that can include many sequenced samples and millions of variants. CRAVAT jobs can be run on our public web-portal or you can run your own local CRAVAT server as a Docker container. Programmatic interfaces and links to our visualization services enable easy integration with other methods for omics analysis.
Read our most recent publication.

How to Cite

CRAVAT citation:

Masica DL, Douville C, Tokheim C, Bhattacharya R, Kim R, Moad K, Ryan MC, Karchin R (2017) CRAVAT 4: Cancer-Related Analysis of Variants Toolkit. Cancer Res., 77(21):e35-e38
Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013). CRAVAT: Cancer-Related Analysis of VAriants Toolkit Bioinformatics, 29(5):647-648

CHASM citation:

Wong WC, Kim D, Carter H, Diekhans M, Ryan M, Karchin R (2011). CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer Bioinformatics, 27(15):2147-2148.
Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R (2009) Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations Cancer Res, 69(16):6660-7.

VEST citation:

Douville C, Christopher, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, and Karchin R (2015) Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-indel) Human Mutation, doi: 10.1002/humu.22911.
Carter H, Douville C, Stenson P, Cooper D, Karchin R (2013) Identifying Mendelian disease genes with the Variant Effect Scoring Tool BMC Genomics, 14(Suppl 3):S3.

Submitting an Analysis Job

Jobs are submitted to the CRAVAT web server simply by providing an input set of variants and selecting a few options for the type of analysis you would like. It is best to create a login so that you can access the My Jobs page and use the CRAVAT Interactive Results Viewer.

CRAVAT has a queuing system, with two separate queues for small and large jobs. This is done so that small jobs are not held up behind longer-running jobs. Multiple concurrent jobs are run in both the large and small queues. Currently, small jobs are defined as those with 25,000 or less variants. 25,000 variants will take approximately 1 hour for any analysis assuming a normal server load. When jobs are submitted, you will receive a rough estimate of the processing time. The large jobs queue is capable of handling submissions with millions of variants.

Input

You may input small sets of variants by cutting/pasting them into the textbox of the Input panel. In most cases, you will want to submit your variants by uploading a file. Files must be plain text files (not compressed). All genomic variant positions should be in GRCh38 (hg38) coordinates. If you have GRCh37 (hg19) coordinates, select the hg19 input panel checkbox to liftover the variants to GRCh38. The specific formats we accept are described below.

VCF

CRAVAT supports VCF format v4.0 and above as an input format. VCF files are commonly provided by sequencing centers and various variant and somatic mutation calling software packages. CRAVAT processing uses variant calls and sample identifiers from the VCF file for analysis. If your VCF file contains information on call quality, zygosity, total alternative reads, and total overall reads (Fill in exact optional VCF field identifiers here), then this information will be included in your CRAVAT results. Note that CRAVAT results may include multiple lines of results for a given VCF input line so that lines containing multiple variants and multiple samples can be annotated in detail individually. Note: the ID field in VCF format will become the UID field in CRAVAT format. If a VCF line is split into multiple result lines, the sample name or other distinguishing characteristics will be added to ID.

CRAVAT Format

CRAVAT also supports a very simple text input format with six fields separated by tabs or spaces. Each row represents a variant/sample pair and must begin with a unique identifier or UID. The UID must be in alphanumeric format (no punctuation or symbols allowed). The sixth field is a sample identifier and can be omitted if it is not relevant. Comment lines are allowed. All lines that start with ">", "#", or "!" are ignored as comments.

Genomic-coordinate format (separated by a tab or a space):
```
# UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
TR1	chr17	7674188		-	G	T	TCGA-02-0231
TR2	chr10	121520166	-	G	A	TCGA-02-3512
TR3	chr13	48459831	+	C	A	TCGA-02-3532
TR4	chr7	116777451	+	G	T	TCGA-02-1523
TR5	chr7	140753336	-	T	A	TCGA-02-0023
TR6	chr17	39724745	+	G	T	TCGA-02-0252
Ins1	chr17	39724745	+	-	T	TCGA-02-0252
Del1	chr17	39724745	+	A	-	TCGA-02-0252
CSub1	chr2	39644095	+	ATGCT	GA	TCGA-02-0252
```
Position is in 1-based open coordinates. For insertions and deletions, use "-" as the reference base for insertion and "-" as the alternate base for deletion. In the above example, Ins1 a "T" inserted between the 39724745th and the 39724746th bases. Del1 is an "A" at the 39724745th position that is deleted. CSub1 is a complex substitution in which "ATGCT" from the 39644095th to the 39644099th positions is replaced by "GA". If you do not have strand information from your sequencing results, it is likely that they are all reported on the + strand. Make sure that your reported reference base matches the base in the reported position in the hg38 reference sequence (or hg19 if you checked hg19 checkbox).

* The old format for indels, in which the base is specified before the insertion/deletion location is still supported. However, if this old format is used in any row of your input, your entire input will be handled in the old format. The old and new formats cannot be mixed.
HGVS format
CRAVAT also supports HGVS format for input of variants. Currently we accept HGVS Missense (single point) variants and insertions. The HGVS input includes three tab delimited fields: 1. an optional identifier, 2. HGVS format variants, and 3. an optional sample identifier. For example:
```
Var1	NC_000022.10:g.30025797A>T	Sample1
Var2	NC_000022.10:g.40418496T>C	Sample1
Var3	NC_000022.10:g.40419252C>T	Sample1
Var4	NC_000002.10:g.218270043_218270044insG	Sample1
```
Transcript format
Missense variants can be entered in transcript format
```
# UID / Transcript / AA change / Sample ID (optional)
TR1	NM_001126116.1	D127Y	TCGA-02-0231
TR2	NM_001144919.1	R162Q	TCGA-02-3512
TR3	NM_000321.2	Q702K	TCGA-02-3532
TR4	NM_000245.2	A1108S	TCGA-02-1523
TR5	NM_004333.4	V600E	TCGA-02-0023
```
The transcript identifier can be from either NCBI Refseq (NM accessions), CCDS, or Ensembl (ENST accessions). Refseq and CCDS accessions can be specified without version numbers.

CRAVAT is primarily designed for genomic input, and submissions in transcript format are supported but produce a limited set of annotations. Submitting variants in genomic format is recommended.

Analysis Tools

By default, CRAVAT provides variant mapping across genome<->transcripts<->protein sequence<->protein structure, and extensive annotations (link to annotaions documentation below). You can also select results of variant and gene-level scoring algorithms (currently from CHASM v3.1 and VEST v4.0 and/or additional annotations from GeneCards and PubMed.

CHASM (currently v3.1) provides cancer-specific missense variant scores. If CHASM is selected, a list box that allows you to choose a cancer type appears. VEST (currently v4) provides pathogenicity scores for all non-silent variants.

Submit

Enter your email address (if you have logged in you don't need to), and if you want to receive a machine processing-friendly, tab-separated text version of the CRAVAT analysis report in addition to its default Microsoft Excel version, check "Include text reports for machine processing". Then, click "SUBMIT". When all the analyses are complete, an email with reports will be sent to you. If you have logged in you can check the status and history of your jobs at 'My Jobs' page.

User Account

If you have created a CRAVAT user account, the CRAVAT server will track your job submissions and provide both downloadable results and access to an interactive results viewer. When logged in to your account, you can see the status of your jobs and retrieve the results of current or past jobs through "My Jobs" page.

Create a CRAVAT Account:

There are two ways to create a CRAVAT account:

When you submit a job for the first time, CRAVAT will create an account with your email and a temporary password and this account information will be sent to you as a part of the result notification email.
A CRAVAT account can be created by clicking "Log-In" > "Create an account" on the top menu.

Retrieve Your Username

Your username is your email.

Retrieve Your Password

When you create your login, you will setup a challenge question and answer. If you forgot your password, click "Log-In" > "Forgot password?". You will then enter your login (your email) to retrieve your challenge question. Answering your challenge question will reset your password. The new password is displayed below the challenge answer. Use the new password to login and then you can reset your password.

Change Your Password

To change your password, first log in, and then click "My Profile" > "Change password". In the "Change Password" pop-up window, type your current password, your new password, and again your new password. Click "Submit".

My Jobs Page

After having logged in, click "My Jobs" on the top menu to open the My Jobs page in a new browser tab. This page shows statistics in tabular and graphic format for your past and current jobs and the status of each job (success, fail, running, and in-queue). By clicking "Download" in the "Result" column, you can download Excel and/or text result files and by clicking "Explore" you get access to the interactive interface (recommended).

Checking Job Status

Log in to the CRAVAT main page and click "My Jobs". This will open a new browser tab with the following table which shows your submitted jobs.

For completed jobs, if the interactive result viewer is available, the "Explore" icon will show up, and if result files are available for download, the "Download" icon will show up. Clicking "Explore" will open a new tab with the interactive result viewer. Clicking "Download" will start downloading the results files.

Interactive Result Viewer

The interactive result viewer contains five tabs: Summary, Gene, Variant, Noncoding, and Error.

Summary Tab

1) This tells you various information of your job.
2) This area has various widgets that show you summarized statistics on your variants, which include breakdowns of your variants according to coding/noncoding, oncogene/tumor suppressors, sequence ontologies, and sequence ontology and sample, as well as a circular plot of human chromosomes on which your variants are shown (Circos plot), top genes according to mutation rate, CHASM scores, and VEST scores, and Network Data Exchange (NDEX) network hits of your variants.

Gene Tab

1) This tells you the Job Id for the CRAVAT job being viewed in the Interactive Result Viewer.
2) The filter section allows you to change which variants are shown in the variants table. You can move sliders to set the values. You can click 'Hide synonymous' on or off to show or hide synonymous variants. As you change filter conditions, the table on the right will be updated automatically.
3) The columns section can be used to turn on and off the columns show in the variants table. Click the + and - to expand or collapse column groups. Click on the column names to show (dark grey) or hide (light grey) them.
4) Click a column header to sort the table. Shift-clicking columns will do multiple-column sorting.
5) Type in column header search boxes to filter the table for the rows that contain the typed text.
6) Click a gene on the gene table to the left and the variants in study for the gene will show in this table.
7) Click to see Network Data Exchange (NDEX) networks containing the gene
8) If the selected variant or the variants in the selected gene map to any PDB structure or high-quality model structure, this button turns red. Click the red button to see the structural mapping of the variant(s).
9) CHASM and VEST score and p-values are shown like speedometers. The hand shows the score from 0 to 1, and p-values are shown as "p=".
10) Protein domains, the selected variant(s), and the variants from TCGA are shown as a lollipop diagram.
11) Use this to change the tissue for TCGA mutations.
8) Allele frequencies from 1000 Genomes, ESP6500, and gnomAD are shown as bar-meters.
12) Click to download the table content as a tsv file.
13) Known variants from TCGA are shown below the gene bar. TCGA variants that have multiple samples are taller.
14) Protein domains are shown as brown bars. Regions of interest in the protein sequence are shown as red lines or X's below the gene bar.
15) Click to minimize/restore the table.
16) Click to maximize/restore the table.
17) The variant(s) from the study for the gene is shown above the bar.

Color	Sequence ontology of variant
	Synonymous mutation
	Missense mutation
	Inframe indel
	Frameshift indel
	Splice site mutation
	Stop loss mutation
	Stop gain mutation
	Complex substitution mutation

Variant Tab

1) This tells you the Job Id for the CRAVAT job being viewed in the Interactive Result Viewer.
2) The filter section allows you to change which variants are shown in the variants table. You can move sliders to set the values. You can click 'Hide synonymous' on or off to show or hide synonymous variants. As you change filter conditions, the table on the right will be updated automatically.
3) The columns section can be used to turn on and off the columns show in the variants table. Click the + and - to expand or collapse column groups. Click on the column names to show (dark grey) or hide (light grey) them.
4) Click a column header to sort the table. Shift-clicking columns will do multiple-column sorting.
5) Type in column header search boxes to filter the table for the rows that contain the typed text.
6) This is the table area which shows the analysis result for the chosen tab and filter conditions.
7) Click to change the level of detail (summary or full).
8) Allele frequencies from 1000 Genomes, ESP6500, and gnomAD are shown as bar-meters.
9) If the selected variant or the variants in the selected gene map to any PDB structure or high-quality model structure, this button turns red. Click the red button to see the structural mapping of the variant(s).
10) Protein domains, the selected variant(s), and the variants from TCGA are shown as a lollipop diagram.
11) Use this to change the tissue for TCGA mutations.
12) Click to download the table content as a tsv file.
13) Known variants from TCGA are shown below the gene bar. TCGA variants that have multiple samples are taller.
14) Protein domains are shown as brown bars. Regions of interest in the protein sequence are shown as red lines or X's below the gene bar.
15) Click to minimize/restore the table.
16) Click to maximize/restore the table.
17) The variant(s) from the study for the gene is shown above the bar.

Color	Sequence ontology of variant
	Synonymous mutation
	Missense mutation
	Inframe indel
	Frameshift indel
	Splice site mutation
	Stop loss mutation
	Stop gain mutation
	Complex substitution mutation

Non-Coding Tab

Non-coding tab contains a subset of the elements of the variant tab. Please see the explanation on Variant Tab.

Error Tab

1) This tells you the job ID.
2) Error column shows the reason of each error input line. Input Line column shows the input line
3) Click to download the table content as a tsv file.
4) Click to minimize/restore the table.
5) Click to maximize/restore the table.

Downloadable Results

Upon a successful submission and analysis, you will receive a link to your results via email (if you have logged in you can check the status and history of your jobs at 'My Jobs' page, where you can also download your result by clicking 'Download' in the 'Result' column), which will be available for 30 days from the date of submission. The results will be delivered as one zip-compressed file containing several report files, including a MS Excel format spreadsheet and optional tab-separated text files. Five reports are included: Variant, Variant Additional Details, Variant non-coding, Gene-level analysis, and Input Errors. The spreadsheet has a tab for each report, and the tab-separated text files have each report a separate .tsv file. All reports are in table format. The Excel spreadsheet file is provided only when 65,000 or fewer variants are analyzed.

CRAVAT Galaxy Tool

There are two Galaxy Tools for querying CRAVAT.

To directly obtain annotations for your variants, use
https://toolshed.g2.bx.psu.edu/view/in_silico/cravat_annotate_mutations/1b6e23f3cb06.

To submit a job with your variants to benefit the full CRAVAT analysis environment including its Interactive Result Browser, use
https://toolshed.g2.bx.psu.edu/view/in_silico/cravat_score_and_annotate/cdd97b06c802.

CRAVAT input format is used.

Programmatic Access

With CRAVAT's web service, you can submit and check the status of your jobs without using a browser.

Jobs

Job submission via POST

URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: POST
Consumes: Multipart/form-data
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
Form data parameters (* = essential parameters):
- analyses: "CHASM", "VEST", "CHASM;VEST"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg19: "on" or "off". Input mutations are in hg19 coordinates or not.
- *inputfile: Input mutation file. This is from the file input element in the POST form.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
Python example
    >import requests
    >r=requests.post('http://www.cravat.us/CRAVAT/rest/service/submit',
        files={'inputfile':open('input_file/vcf_input.txt')},
        data={'email':'test@test.com','analyses':'CHASM'})
    >r.text # contains the submission result as a string. Check "status" field.
Job submission via GET

URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
Query parameters (* = essential parameters):
- analyses: "CHASM", "VEST", "CHASM;VEST"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg19: "on" or "off". Input mutations are in hg19 coordinates or not.
- *mutations: a string with mutations, the format of which is the same as described in the "Input" section above.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
Python example
    >import requests
    >r=requests.get('http://www.cravat.us/CRAVAT/rest/service/submit',
        params={'email':'test@test.com', 'analyses':'',
        'mutations':'TR1 chr22 30025797 + A T sample_1'})
    >r.text # contains the submission result as a string. Check "status" field.
Job status checking

URL: http://www.cravat.us/CRAVAT/rest/service/status
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "running" for still running, "success" for successful completion, "jobfailed" for failed
- errormsg: Error message if the job failed.
- resultfileurl: If the job completed successfully, the URL of the result file.
Query parameters (* = essential parameters):
- *jobid: The job ID to query.
Example
http://www.cravat.us/CRAVAT/rest/service/status?jobid=test_20140204_102423
Python example
    >import requests
    >r=requests.get('http://www.cravat.us/CRAVAT/rest/service/status',
        params={'jobid':'test_20170315_103245'})
    >r.text # contains the job status as a string.

Single Variant

Single variant web API

URL:http://www.cravat.us/CRAVAT/rest/service/query
Method: GET
Produces: a JSON object, notable fields of which are as follows. See Annotation Description for explanation on the returned fields.
Query parameters (* = essential parameters):
- *mutation: The chromsome, position, strand direction, reference base and alternate base of the variant separated by underscores (chomosome_position_strand_refBase_altBase)
Example
http://www.cravat.us/CRAVAT/rest/service/query?mutation=chr22_40418496_-_A_G
Python example
    >import requests
    >r=requests.get('http://www.cravat.us/CRAVAT/rest/service/query',
        params={'mutation':'chr22_30025797_+_A_T'}) # query mutation is of the format chromosome_position_strand_reference_alternate.
    >r.text # contains the annotation for the query mutation.

Single Variant Page

It is possible to display the details of a single variant as a full page on a browser window. Use the following URL scheme to open a Single Variant Page for the variant encoded in the URL.

URL: http://www.cravat.us/CRAVAT/variant.html?variant=CHROM_POSITION_STRAND_REFBASE_ALTBASE
- CHROM: chromosome (e.g chr1)
- POSITION: position in the chromosome
- STRAND: the strand where the variant exists (+ or -)
- REFBASE: reference base(s)
- ALTBASE: alternate base(s)
Example: http://www.cravat.us/CRAVAT/variant.html?variant=chr7_140753336_+_A_T

CHASM-3.1

CHASM-3.1 is the most recent version of "Cancer-specific High-throughput Annotation of Somatic Mutations" a method that predicts the functional significance of somatic missense mutations observed in the genomes of cancer cells, allowing mutations to be prioritized in subsequent functional studies, based on the probability that they give the cells a selective survival advantage. Original CHASM publication CHASM-3 overview

CHASM-3.1 was trained on SNVBox (v5.0) (updated and rebuilt for GRCh38) with driver training examples previously used in CHASM-3 and passenger training examples generated according to dinucleotide frequencies in 25 Cancer Genome Atlas cancer types.

Cancer-specific classifiers

Name	Full name
Bladder	Bladder Urothelial Carcinoma
Blood-Lymphocyte	Chronic Lymphocytic Leukemia
Blood-Myeloid	Acute Myeloid Leukemia
Brain-Glioblastoma-Multiforme	Glioblastoma Multiforme
Brain-Lower-Grade-Glioma	Brain Lower Grade Glioma
Breast	Breast Invasive Carcinoma
Cervix	Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
Colon	Colon Adenocarcinoma
Head and Neck	Head and Neck Squamous Cell Carcinoma
Kidney-Chromophobe	Kidney Chromophobe
Kidney-Clear-Cell	Kidney Renal Clear Cell Carcinoma
Kidney-Papillary-Cell	Kidney Renal Papillary Cell Carcinoma
Liver-Nonviral	Hepatocellular Carcinoma (Secondary to Alcohol and Adiposity)
Liver-Viral	Hepatocellular Carcinoma (Viral)
Lung-Adenocarcinoma	Lung Adenocarcinoma
Lung-Squamous Cell	Lung Squamous Cell Carcinoma
Other	General purpose
Ovary	Ovarian Serous Cystadenocarcinoma
Pancreas	Pancreatic Cancer

Prostate-Adenocarcinoma	Prostate Adenocarcinoma
Rectum	Rectum Adenocarcinoma
Skin	Skin Cutaneous Melanoma
Stomach	Stomach Adenocarcinoma
Thyroid	Thyroid Carcinoma
Uterus	Uterine Corpus Endometriod Carcinoma

VEST-4

VEST-4 is the most recent version of the Variant Effect Scoring Tool. VEST is a machine learning method that predicts the functional significance of non-silent variants based on the probability that they are pathogenic.

Original VEST missense publication. Original VEST insertion/deletion publication.

Changes from VEST3 to VEST-4

SNVBox features were updated and rebuilt for GRCh38 (SNVBox5.0)
Positive class expanded and updated to HGMD (2017.1)
Neutral class changed to ExAC Release 1 (2/2017)
Improved p-value calculations using non-training set negatives from ExAc and ESP6500 to seed a Gibbs Sampler algorithm. This technique produced a table of p-values with increased precision for all VEST scores for each sequence ontology.

Annotations

Field	Description
Chromosome	Chromosome
Position	Genomic position in chromosomal coordinates
Strand	Positive or negative
Reference base(s)	Base(s) at position in the reference genome (hg38)
Alternate base(s)	Alternate base(s)
Sample ID	Alphanumeric identifier of sample
S.O. all transcripts	Sequence ontology for each transcript the variant is mapped to. * = transcript with most severe sequence ontology.
S.O. transcript	Transcript with most severe sequence ontology. To break ties, the longer transcript is chosen.
S.O. transcript strand	The strand (+ or -) of the transcript used to assign sequence ontology
Protein sequence change	Protein sequence change produced by the variant.
Phred	Phred-scaled quality score Available only with VCF input.
VCF filters	Status of VCF filters. PASS if the all are satisfied. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). Available only with VCF input.
Zygosity	Homozygous or heterozygous status of the variant. Available only with VCF input.
Alternate reads	Count of reads with alternate allele aligned to the position. Available only with VCF input.
Total reads	Count of all reads aligned to the position. Available only with VCF input.
Variant allele frequency	Alternate reads / Total reads
ClinVar	ClinVar annotation of pathogenicity. Only the variants with "pathogenic" clinincal significance are reported.
dbSNP	dbSNP identifier
1000 Genomes AF	Healthy population allele frequency from the 1000 Genomes project
ESP6500 AF (average)	Average population allele frequency from Exome Sequencing Project’s ESP6500
ESP6500 AF (European American)	Population-specific AF
ESP6500 AF (African American)	Population-specific AF
gnomAD total AF	Total healthy population allele frequency from the Broad Institute gnomAD project
gnomAD AF African	African/African American specific AF
gnomAD AF American	Admixed American/Latino specific AF
gnomAD AF Ashkenazi Jewish	Ashkenazi Jewish specific AF
gnomAD AF East Asian	East Asian specific AF
gnomAD AF Finnish	Finnish specific AF
gnomAD AF Non-Finnish European	Non-Finnish European specific AF
gnomAD AF South Asian	South Asian specific AF
gnomAD AF Other	Other population specific AF
COSMIC variant count	Total of alt base(s) previously observed in COSMIC database
COSMIC variant count (tissue)	Alt base(s) previously observed in COSMIC database, grouped by tissue
COSMIC transcript	Transcript used by COSMIC curators
COSMIC protein change	Protein sequence change used by COSMIC curators
Number of samples with variant	Alt base recurrence in submitted samples
Protein 3D Variant	If the variant is mapped to a 3D protein structure, this link opens an interactive visualization window from MuPIT.
ClinVar disease identifier Identifier	ClinVar disease identifier
ClinVar XRef	Cross-references for ClinVar annotations
HGVS Genomic	Human Genome Variation Society genomic nomenclature of variant
HGVS Protein	Human Genome Variation Society protein nomenclature of variant in the most damaging S.O. transcript
HGVS Protein All	Human Genome Variation Society protein nomenclature for all transcripts
HUGO Symbol	Gene symbol from HUGO in which the mutation resides
TARGET	Drugs that target the gene from TARGET database
CGL Driver Class	Oncogene or Tumor suppressor gene annotated by Cancer Gene Landscapes
Number of samples with gene mutated	Gene mutation recurrence in submitted samples
CGC driver class	Oncogene or Tumor suppressor gene annotated by Cancer Gene Census
CGC inheritance	Somatic or germline annotated by Cancer Gene Census
CGC tumor types somatic	Annotations from Cancer Gene Census tumor types somatic
CGC tumor types germline	Annotations from Cancer Gene Census tumor types germline
COSMIC gene count	Total times gene is mutated in COSMIC (add link) database
COSMIC gene count (tissue)	Total times gene is mutated in COSMIC (add link) database, grouped by tissue
Protein 3D Gene	If variants in the gene are mapped to a 3D protein structure, this link opens an interactive visualization window from MuPIT.
NCI Pathway Hits	Count of genes from the National Cancer Institute Pathway Interaction Database that contain the mutated gene. From Network Data Exchange (NDEX) enrichment service.
NCI Pathway IDs	Network Data Exchange (NDEX) identifiers of pathways from the National Cancer Institute Pathway Interaction Database that contain the mutated gene.
NCI Pathway Names	Names of pathways from the National Cancer Institute Pathway Interaction Database that contain the mutated gene. From Network Data Exchange (NDEX) enrichment service.
UTR/Intron	Mapping to noncoding regions (UTRs, 2k upsteam/downsteam regions, and introns). From UCSC GRCh38 database.
UTR/Intron Gene	Gene in which a non-coding variant occurred
UTR/Intron All Transcript	Mapping to noncoding regions for each transcript
ncRNA Class	Noncoding RNA class
ncRNA Name	Noncoding RNA name
Repeat Class	Repeated sequence class
Repeat Family	Repeated sequence family
Repeat Name	Repeated sequence name
Pseudogene	Pseudogene a variant occurred in
Pseudogene Transcript	Transcript of noncoding pseudogene a variant occurred in
GWAS NHBLI Key (GRASP)	GRASP-GWAS NHBLI keys. From GRASP project.
GWAS PMID (GRASP)	List of PubMed Ids for associated GRASP phenotypes. List order matches GWAS NHBLI Key.
GWAS Phenotype (GRASP)	List of phenotypes in GRASP catalogue with associated p-values. List order matches GWAS NHBLI Key.

Scoring Algorithm Results

CHASM-3.1

Field	Description
CHASM score	Score for somatic missense variants. Ranges from 0 (likely passenger) to 1 (likely driver). *In the original CHASM paper the values were reversed, with 0 as likely driver and 1 as likely passenger.
CHASM P-value	Empirical p-value (probability that passenger variant is misclassified as a driver).
CHASM FDR	False discovery rate expected (Benjamini-Hochberg multiple testing correction).
CHASM transcript	Transcript used for CHASM score
All transcripts CHASM results	List formatted as TranscriptID:ProteinSequenceChange:CHASMscore:CHASMp-value calculated for all transcripts. *= transcript with most severe CHASM score
CHASM gene score	Highest CHASM score in the gene (only missense variants are considered)
CHASM gene P-value	Composite p-value for non-silent variants in the gene combined with Stouffer’s Z-score method
CHASM gene FDR	Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method.

VEST-4 Still using VEST-3? Available on hg19.cravat.us.

Field	Description
VEST Score (missense)	Pathogenicity score for missense germline variants
VEST Score (frameshift indels)	Pathogenicity score for frameshift insertion and deletion germline variants
VEST Score (inframe indels)	Pathogenicity score for in-frame insertion and deletion germline variants
VEST Score (stop-gain)	Pathogenicity score for stop-gain germline variants
VEST Score (stop-loss)	Pathogenicity score for stop-loss germline variants
VEST Score (splice site)	Pathogenicity score for splice site germline variants
VEST P-value	Empirical p-value (probability that benign variant is misclassified as pathogenic).
VEST FDR	False discovery rate expected (Benjamini-Hochberg multiple testing correction).
All transcripts VEST results	List formatted as TranscriptID:ProteinSequenceChange:VESTscore:VESTp-value calculated for all transcripts. *= transcript with most severe VEST score.
VEST gene score (non-silent)	Highest pathogenicity score in the gene (all non-silent variants are considered)
VEST gene P-value	Composite p-value for non-silent variants in the gene combined with Stouffer’s Z-score method
VEST gene FDR	Composite false discovery rate (Benjamini-Hochberg multiple testing correction) for non-silent variants in the gene combined with Stouffer’s Z-score method.

Additional Annotations That Can Be Requested

Field	Description
GeneCards summary	GeneCards annotation
PubMed article count	Number of the records retrieved from PubMed, using the name of the gene which contains the mutation and "cancer" as keywords.
PubMed search term	PubMed search result link.