#==================================================================
# README for BMI summary statistics files for GHS and MCPS cohorts
#==================================================================

This document describes the output, column content, and definitions used for BMI results in GHS and MCPS.

#-----------------------------------------------
# Criteria
#-----------------------------------------------

Each cohort-specific file contains all imputed and exome-sequenced single-variant results meeting the following criteria:
-Minor allele frequency > 1%
-p-value < 5e-5
-Exome sequencing results were used instead of imputed results for variants captured by both exome sequencing and imputation.

Each cohort-specific file contains all exome-sequencing-based gene burden results meeting the following criteria:
-For each gene, the gene burden with the smallest association p-value out of 20 gene burden types is selected. The 20 types are the combinations of 4 variant selection types (M1, M2, M3, M4) and 5 alternative allele frequency (AAF) cutoffs for variant inclusion (5 [5%], 1 [1%], 01 [0.1%], 001 [0.01%], singleton).
-If more than one gene burden result has the same p-value, results are reported in the following order of decreasing priority: M1 > M2 > M3 > M4. Once the definition is selected, the gene burden with the most liberal AAF threshold is selected.
-p-value < 5e-5
-Minor allele count >= 100

#-----------------------------------------------
# Summary statistics files
#-----------------------------------------------

File suffix: *_BMI_summarystatistics.csv.gz

Column headers:
genetic_exposure: For single variant sites, the column reports {Chromosome}:{Base pair position}:{Reference allele}:{Alternative allele}; all genomic coordinates are based on the hg38 reference genome sequence. For gene burden results, the column reports {Gene name}_{Gene burden definition}_{Maximum AAF threshold for sites included in the gene burden}.
Pval: BMI association p-value.

#-----------------------------------------------
# Gene burden definitions
#-----------------------------------------------

M1: predicted loss-of-function (pLOF) variants only
M2: pLOF and missense variants
M3: pLOF and predicted deleterious missense variants (5/5 in silico algorithms predict a deleterious variant)
M4: pLOF and predicted deleterious missense variants (at least 1/5 in silico algorithms predict a deleterious variant)

e.g. AURKC_M3_001 is the gene burden result for a gene burden made up of pLOF and predicted deleterious missense variants (5/5 algorithms predict a deleterious variant) in the AURKC gene, where the AAF for included variants in the gene burden is < 0.01%.
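
#-----------------------------------------------
# Example: parsing the genetic_exposure column
#-----------------------------------------------

The two genetic_exposure formats described above can be told apart and split into their parts with a short Python function. This is a minimal sketch and not part of the released files: the names Variant, GeneBurden, and parse_genetic_exposure are hypothetical, the chromosome is kept as text because its exact spelling in the files is not specified here, and the spelling of the singleton AAF suffix is an assumption.

```python
from typing import NamedTuple, Union

# AAF suffix codes as listed in the Criteria section.
# Assumption: the singleton cutoff is spelled "singleton" in the identifier.
AAF_CODES = {
    "5": "5%",
    "1": "1%",
    "01": "0.1%",
    "001": "0.01%",
    "singleton": "singleton",
}
BURDEN_DEFINITIONS = {"M1", "M2", "M3", "M4"}


class Variant(NamedTuple):
    """Single variant site on the hg38 reference genome."""
    chrom: str
    pos: int
    ref: str
    alt: str


class GeneBurden(NamedTuple):
    """Gene burden: gene name, definition (M1-M4), and maximum AAF cutoff."""
    gene: str
    definition: str
    aaf_cutoff: str


def parse_genetic_exposure(value: str) -> Union[Variant, GeneBurden]:
    """Split one genetic_exposure value into its components."""
    if ":" in value:
        # {Chromosome}:{Base pair position}:{Reference allele}:{Alternative allele}
        chrom, pos, ref, alt = value.split(":")
        return Variant(chrom, int(pos), ref, alt)
    # {Gene name}_{Gene burden definition}_{Maximum AAF threshold}
    # Split from the right so a gene name containing "_" stays intact.
    gene, definition, aaf = value.rsplit("_", 2)
    if definition not in BURDEN_DEFINITIONS or aaf not in AAF_CODES:
        raise ValueError(f"unrecognized gene burden identifier: {value!r}")
    return GeneBurden(gene, definition, AAF_CODES[aaf])
```

For example, parse_genetic_exposure("AURKC_M3_001") returns GeneBurden(gene="AURKC", definition="M3", aaf_cutoff="0.01%"), matching the AURKC example above.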