**allele**

One of two or more alternative nucleotide sequences at a single gene locus on a chromosome.


**allele frequency**

Allele frequency is a term in population genetics that is used in characterizing the genetic
diversity of a species population or equivalently the richness of its gene pool. Allele frequency is defined as follows:

Given:

- a particular chromosome locus,
- a gene occupying that locus,
- a population of individuals carrying loci in each of their somatic cells (e.g. two loci in the cells of diploid species, which contain two sets of chromosomes), and
- a variant or allele of the gene,

the allele frequency of that allele is the fraction or percentage of loci that the allele occupies
within the population. For instance, if the frequency of an allele is 20% in a given population, then among population members,
one in five chromosomes carries that allele. The other four out of five are occupied by other variants of the gene, of which there
may be one or many.

Example: If there are ten individuals in a population, and at a given locus there are two possible
alleles, A and a, and if the genotypes of the individuals are AA, Aa, AA, aa, Aa, AA, AA, Aa, Aa, and AA, then the allele frequencies of allele A
and allele a are:

$$P(A) = \frac{2+1+2+0+1+2+2+1+1+2}{20} = \frac{14}{20} = 0.7$$

$$P(a) = \frac{0+1+0+2+1+0+0+1+1+0}{20} = \frac{6}{20} = 0.3$$
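The counting in this example can be sketched in Python. This is a minimal illustration, not part of any particular tool: the function name `allele_frequencies` and the two-letter genotype strings (such as "AA" or "Aa") are assumptions made for the sketch, and the genotype list is chosen to match the per-individual allele counts in the example.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for diploid genotypes written as strings.

    Each genotype string (hypothetical encoding, e.g. "Aa") contributes one
    character per locus copy, so ten diploid individuals give 20 loci.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_loci = sum(counts.values())
    return {allele: n / total_loci for allele, n in counts.items()}


# Genotypes matching the allele counts 2, 1, 2, 0, 1, 2, 2, 1, 1, 2 for allele A.
genotypes = ["AA", "Aa", "AA", "aa", "Aa", "AA", "AA", "Aa", "Aa", "AA"]
frequencies = allele_frequencies(genotypes)
print(frequencies)
```

The frequencies of all alleles at a locus sum to 1, which is a quick sanity check on any genotype table.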

**association studies**

The primary means of establishing an association between a given phenotype and the other covariates,
such as other phenotype data or genotype data.

**attributable fraction**

The proportion of disease occurrence that would potentially be eliminated if exposure were prevented.

**autosome**

A chromosome that is not a sex chromosome. In humans, the autosomes are numbered 1 through 22. The non-autosomal
chromosomes are X and Y.

**base**

A single nucleotide, composed of a nucleobase
(nitrogenous base), a five-carbon sugar, and one to three phosphate groups. Together, the nucleobase and sugar comprise a
nucleoside.

**base pair**

A pair of complementary nucleotides. In DNA,
the nucleotide adenine (A) always binds with thymine (T), and guanine (G) binds with cytosine (C). In RNA, uracil (U) takes
the place of thymine and binds with adenine.

**binary trait**

A binary trait
has only two possible values, e.g. presence of the trait versus absence of the trait.

**centromere**

A
region of DNA that joins two sister chromatids into a single replicated chromosome.

**chromatid**

A complete base pair sequence. A chromatid has a short arm ('p' arm) and a long arm ('q' arm), separated by the centromere,
where the short arm contains fewer bases than the long arm. The ends of the arms farthest from the centromere are telomeres.

**chromosome**

An organized structure of DNA and protein that is found in cells. A replicated chromosome is a pair of sister chromatids bound
by a centromere.

**codon**

A tri-nucleotide sequence associated with a
particular amino acid.

**continuous trait**

A continuous trait is a trait
whose variation is measured on a scale over a range of values, rather than classified into categories. Examples
are height, body mass index (BMI), and blood pressure.

**covariate**

A
term synonymous with predictor, explanatory variable and independent variable. It is a variable that is either of direct interest
in predicting the response in a study or one that acts as a confounding variable, affecting the relationship between the dependent
variable and the independent variables of primary interest.

**cytoband**

A region of a chromatid that is distinguished from other cytobands by its shade as a result of applying a staining solution.

**diploid**

An organism or cell with two complete sets of chromosomes. Healthy human somatic cells are diploid.

**disease gene**

A gene that carries or is responsible for a disorder, defect or a disease.

**exon**

The portion of a gene that is ultimately expressed as a protein via transcription into mRNA and translation of that mRNA.

**gene**

A genomic region composed of exons and introns. Genes represent sequences that are transcribed into RNA, which is translated
into proteins.

**genetic model**

The overall specification of how the disease
allele(s) act to influence the disease. For parametric (model-dependent) linkage analysis, the genetic model must be specified
for analysis. Components of the genetic model include the information on whether the disorder is autosomal or X-linked, dominant
or recessive, the frequency and penetrance of the disease allele, the frequency of the phenocopies and the mutation rate.
A genetic model consists of three main components:

- a model for disease susceptibility, connecting disease phenotypes to genotypes at disease susceptibility (DS) loci for the sibs;
- a population genetics model, describing the population joint distribution of genotypes at the DS loci of the parents; and
- a segregation model, describing the segregation of alleles at the DS loci during meiosis.

**genotype**

May mean the genetic composition (alleles) of
an individual in total, but in Golden Helix SVS, refers to the particular pair of alleles that an individual possesses at
a single gene locus on a chromosome.

**genome**

The collection of chromosomes of a particular
species, including autosomal and non-autosomal.

**genomic position**

A position within a chromosome.
Usually described in the format `<chromosome>:<position>` (e.g. chr1:50,000). In humans, positions are sorted alphanumerically by chromosome, and
numbering starts at the beginning of the 'p' arm.

**genomic region**

A subsequence of
alleles within a chromosome. Usually described in the format `<chromosome>:<start>-<stop>` (e.g. chr1:50,000-100,000). In humans, regions are sorted
alphanumerically by chromosome, and numbering starts at the beginning of the 'p' arm.

**half-open coordinates**

Coordinates are zero-based, and the difference between the stop and start positions defines the width of an interval. The
start position is included in the interval and the stop position is not. An
interval covering the first three positions of a chromosome in a half-open system would be specified as [0,3).

**haplotype**

Set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable
by recombination).

**Hardy-Weinberg equilibrium**

A state attained by a population which displays
constant allele and genotype frequencies from generation to generation. In the case of a locus with two alleles, A and a, occurring
at frequencies $p$ and $q$, respectively, the frequency of genotype AA is $p^2$, the frequency of Aa is $2pq$ and the frequency of aa is $q^2$. A population
in HW equilibrium normally has to be large and random-mating with no selection, mutation or migration.
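As a small worked sketch, the expected genotype frequencies under Hardy-Weinberg equilibrium follow directly from one allele frequency. The labels A and a, the symbols p and q = 1 - p, and the function name `hardy_weinberg_expected` are conventional or illustrative choices, not taken from any specific software.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies for a two-allele locus.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    Returns the expected frequencies of AA (p^2), Aa (2pq) and aa (q^2).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


# Using an allele frequency of 0.7 for A as an illustrative input.
expected = hardy_weinberg_expected(0.7)
for genotype, frequency in expected.items():
    print(f"{genotype}: {frequency:.2f}")
```

Because $p^2 + 2pq + q^2 = (p + q)^2 = 1$, the three expected frequencies always sum to 1.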

**heritability**

A measure of the degree to which the variance of the distribution of a phenotype is due to genetic causes. Specifically,
heritability is defined as the proportion of phenotypic variance explained by the analyzed marker.

In PBAT, a negative
sign for a heritability indicates that the specified allele is under-transmitted in the test statistic and a positive sign
indicates that the specified allele is over-transmitted in the test statistic. The sign is estimated using the between-family
information.

**intron**

The portion of a gene that is removed from the transcribed mRNA prior to translation.

**indexed coordinates**

Coordinates are one-based, and the width of an interval is one plus the difference between the stop
and start positions. An interval covering the first three positions of a chromosome in an indexed system would be specified
as [1,3].
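The relationship between the two coordinate systems can be sketched as a pair of conversions. The function names below are hypothetical and chosen only for this illustration; the arithmetic follows from the definitions of zero-based half-open and one-based indexed intervals.

```python
def half_open_to_indexed(start, stop):
    """Convert a zero-based half-open interval to a one-based indexed one."""
    return start + 1, stop


def indexed_to_half_open(start, stop):
    """Convert a one-based indexed interval to a zero-based half-open one."""
    return start - 1, stop


def width_half_open(start, stop):
    """Width is simply stop minus start in the half-open system."""
    return stop - start


def width_indexed(start, stop):
    """Width is one plus the difference in the indexed system."""
    return stop - start + 1


# The first three positions of a chromosome in each system.
print(half_open_to_indexed(0, 3))   # (1, 3)
print(indexed_to_half_open(1, 3))   # (0, 3)
print(width_half_open(0, 3), width_indexed(1, 3))  # 3 3
```

Only the start coordinate shifts between the two systems; the stop value is numerically the same, which is a common source of off-by-one errors when files in different formats are mixed.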

**linkage**

Two genes or markers that are so close together on a chromosome that they
are rarely separated by recombination are said to be linked.

**linkage analysis**

A statistical method
for detecting linkage between a disease allele and markers of known location by following their inheritance in families.

**linkage disequilibrium**

Linkage disequilibrium (LD) is the condition in which the haplotype frequencies in a population deviate
from the values they would have if the genes at each locus were combined at random. LD between two loci often indicates that
they are physically close to each other on a DNA strand.
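One common way to quantify this deviation is the coefficient D, the difference between an observed haplotype frequency and the product of the corresponding allele frequencies. The glossary does not name a specific statistic, so the choice of D, the haplotype labels, and the frequencies below are assumptions made for a minimal sketch.

```python
def ld_coefficient(haplotype_freqs):
    """Return D = P(AB) - P(A) * P(B) for two biallelic loci.

    haplotype_freqs maps the four haplotypes "AB", "Ab", "aB", "ab"
    (hypothetical labels) to their population frequencies.
    """
    p_a = haplotype_freqs["AB"] + haplotype_freqs["Ab"]  # frequency of A
    p_b = haplotype_freqs["AB"] + haplotype_freqs["aB"]  # frequency of B
    return haplotype_freqs["AB"] - p_a * p_b


# Illustrative haplotype frequencies that sum to 1.
observed = {"AB": 0.5, "Ab": 0.1, "aB": 0.2, "ab": 0.2}
d = ld_coefficient(observed)
print(f"D = {d:.2f}")
```

A value of D equal to zero corresponds to random combination of alleles at the two loci; here the haplotype AB is more common than random combination would predict, so D is positive.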

**marker gene**

A detectable genetic trait
or segment of DNA that can be identified and tracked. A marker gene can serve as a flag for another gene, sometimes called
the target gene. A marker gene must be on the same chromosome as the target gene and near enough to it so that the two genes
(the marker gene and the target gene) are genetically linked and are usually inherited together.

**minor allele frequency (MAF)**

The frequency of the SNP's less frequent allele in a given population.

**Monte-Carlo simulation**

A computational technique that uses repeated random sampling to estimate quantities, such as the distribution of a test statistic, that are difficult to derive analytically.

**nucleoside**

A nucleobase (nitrogenous base) bound to a five-carbon sugar (either ribose or 2'-deoxyribose). Binds with phosphate groups to form
a nucleotide.

**null hypothesis**

This is usually a statement of "no effect", that is to
say that the independent variable will not have any effect on the dependent variable and that any differences between the
experimental and control groups are attributable to chance. The null hypothesis is usually represented by the symbol $H_0$ and
is stated in order that it can be rejected as an explanation for the results of the experiment. For example, in a clinical
trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than a placebo. We would write
$H_0$: there is no difference between the new drug and a placebo on average.

**odds**

The ratio of the number
of people who experience an event to the number of people who do not.

**odds ratio**

The odds
ratio is a way of comparing whether the probability of a certain event is the same for two groups. An odds ratio of 1 implies
that the event is equally likely in both groups. An odds ratio greater than 1 implies that the event is more likely in the
first group.

For instance, the odds ratio may describe the odds of an experimental patient suffering an adverse event
relative to a control patient. Or, it may describe the ratio of the odds of having the target disorder in the experimental
group relative to the odds in favor of having the target disorder in the control group. Or, it may describe the odds in favor
of being exposed in subjects with the target disorder divided by the odds in favor of being exposed in control subjects (without
the target disorder).
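As a minimal sketch, an odds ratio can be computed from the counts of events and non-events in two groups. The function names and the counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def odds(events, non_events):
    """Odds: number with the event divided by number without it."""
    if non_events == 0:
        raise ZeroDivisionError("odds are undefined when there are no non-events")
    return events / non_events


def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in the first group relative to the second group."""
    return odds(events_1, non_events_1) / odds(events_2, non_events_2)


# Hypothetical counts: 20 of 100 experimental patients and 10 of 100
# control patients experience the event.
ratio = odds_ratio(20, 80, 10, 90)
print(f"odds ratio = {ratio:.2f}")
```

With these counts the ratio is greater than 1, meaning the event is more likely in the first (experimental) group.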

**p-value**

A measure of how much evidence there is against the null hypothesis $H_0$.
The smaller the p-value, the more evidence exists against $H_0$. Traditionally, researchers will reject the null hypothesis if
the p-value is less than 0.05. A large p-value means little
or no evidence against the null hypothesis.

**pedigree files**

Pedigree files contain information
about family relationships, gender and genetic data.

With minor variations, the pre-MAKEPED format for the LINKAGE program
is the de facto standard for pedigree files. This format contains fields for pedigree number, individual ID, father's ID,
mother's ID, sex, disease status, and the first and second alleles of each of the markers.
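The field layout described above can be read with a short parser. This is only a sketch under stated assumptions: it assumes one individual per line with whitespace-separated fields in the order listed, the names `PedigreeRecord` and `parse_pedigree_line` are hypothetical, the sample line is made up, and no meaning is assigned to the numeric codes.

```python
from typing import List, NamedTuple, Tuple


class PedigreeRecord(NamedTuple):
    pedigree: str
    individual: str
    father: str
    mother: str
    sex: str
    disease_status: str
    markers: List[Tuple[str, str]]  # (first allele, second allele) per marker


def parse_pedigree_line(line):
    """Split one whitespace-delimited pedigree line into named fields."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected six leading fields plus allele pairs")
    alleles = fields[6:]
    # Pair up consecutive allele columns: one pair per marker.
    markers = list(zip(alleles[0::2], alleles[1::2]))
    return PedigreeRecord(*fields[:6], markers)


# Made-up example line with two markers.
record = parse_pedigree_line("1 3 1 2 1 2 1 2 2 2")
print(record.individual, record.markers)
```

Keeping the values as strings avoids guessing at how a given data set encodes sex, disease status, or missing alleles.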

**phenotype files**

Phenotype files contain information about the individual phenotype values such as height, weight, body mass index (BMI),
whether the individual has the disease being studied, severity, etc.

There are many different formats for phenotype
files. However, they typically identify the pedigree ID and individual ID so that phenotype and pedigree information may be
matched.

**power (statistical)**

Statistical power is the probability that you will detect a meaningful
difference, or effect, given that a true difference exists. Ideally, studies should have power levels of 0.80 or higher, that is, an
80% or greater chance of finding an effect if one is really there.

Alternative definition 1: A gauge of the sensitivity
of a statistical test, that is, its ability to detect relationships. Specifically, the probability of correctly rejecting
a false null hypothesis. In general, the statistical power increases with your sample size. Also called the "power" of
a test.

Alternative definition 2: The power of a statistical test is the probability that the test will reject a false
null hypothesis, or in other words that it will not make a Type II error. The higher the power, the greater the chance of
obtaining a statistically significant result when the null hypothesis is false.

**predictor variables**

A term synonymous
with covariate, explanatory variable and independent variable. Variables or factors that are assumed to have an effect or
influence on the selected phenotypes. E.g. height, weight, sex, age. However, they are not necessarily the variables of primary
interest. (See covariate)

**prevalence**

Prevalence is the total number of cases of a disease in a
given population at a specific time, or the percentage of population estimated to have that particular disease. "Population",
as used as a denominator, is generally the projected population calculated from the given model.

**proband**

The family member through whom a family's medical history comes to light. For example, a proband might be a baby with Down
syndrome. The proband may also be called the index case, propositus (if male), or proposita (if female).

**significance level**

The significance level of a test is the probability that the test statistic will reject the null hypothesis
when the null hypothesis is true.

**simulation**

The use of a mathematical model to recreate a situation,
often repeatedly, so that the likelihood of various outcomes can be more accurately estimated.SNP analysis

Single nucleotide
polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence
is changed. This occurs approximately once every 100 to 300 bases. There are many techniques for SNP detection and genotyping,
such as restriction fragment length polymorphism PCR (RFLP-PCR), SSCP, allele specific hybridization, primer extension, allele
specific oligonucleotide ligation, and sequencing.

**telomere**

The ends of a chromatid, composed
of repetitive DNA that protects the bulk of the information contained in the chromatid during replication.

**test statistic**

A test statistic is a quantity calculated from a sample of data. Its value is used to decide whether
or not the null hypothesis should be rejected in a hypothesis test. The choice of a test statistic will depend on the assumed
probability model and the hypothesis under question.

**transcription**

Process by which messenger
RNA is created from DNA.

**translation**

Process by which messenger RNA is decoded into a specific
amino acid chain.
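Transcription, translation and the codon lookup can be illustrated together with a toy example. This is a deliberately simplified sketch: it assumes the input is the coding strand of the DNA (so transcription reduces to replacing T with U), it ignores splicing, and `CODON_TABLE` is a small excerpt of the standard genetic code rather than the full table. The function names are illustrative.

```python
# Partial excerpt of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": None,
}


def transcribe(coding_dna):
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Decode mRNA codon by codon until a stop codon or the end is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in the partial table")
        residue = CODON_TABLE[codon]
        if residue is None:  # stop codon ends translation
            break
        peptide.append(residue)
    return peptide


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly']
```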