Article Galaxy Blog

July 18, 2019

FASTA Format: What Research Scientists Should Know

Posted by: Mitja-Alexander Linss

In bioinformatics and biochemistry, where collecting and analyzing complex biological data is a central focus, nucleotide and amino acid sequences are often stored as long character strings in a text-based format called FASTA.

In this post, we’ll provide a quick overview of the format and its uses.

 5 Quick Facts about FASTA format  

  • It is a text-based format used for representing nucleotide or protein/amino acid sequences.
  • A single FASTA file can store one or more sequence records.
  • It allows for sequence names and comments to precede the sequences.
  • Each record begins with a single-line description (also called the 'header line' or 'definition line') that starts with the ">" symbol, followed by the sequence ID. The sequence data begins on the next line and may continue over several lines.
  • Nucleotides and amino acids are represented using standard single-letter (IUPAC) codes, as shown below:

Nucleic Acid Codes:
A = adenosine          
C = cytidine 
G = guanosine
T = thymidine 
N = A/G/C/T (any)
U = uridine
R = G/A (purine) 
Y = T/C (pyrimidine) 
K = G/T (keto)  
M = A/C (amino)
S = G/C (strong)  
W = A/T (weak) 
B = G/T/C 
D = G/A/T  
H = A/C/T   
V = G/C/A  
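Each ambiguity code above stands for a set of possible bases, so checking whether an observed base is compatible with a code amounts to a set lookup. The following is a minimal Python sketch; `IUPAC_NUCLEOTIDES`, `code_matches` and `is_valid_nucleotide_sequence` are illustrative names rather than part of any library, and treating T and U as distinct letters (while accepting the "-" gap symbol) is an assumption of this sketch.

```python
# Each IUPAC nucleotide code mapped to the set of bases it stands for,
# taken from the table above. T and U are kept distinct (an assumption).
IUPAC_NUCLEOTIDES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"U"},
    "R": {"G", "A"}, "Y": {"T", "C"}, "K": {"G", "T"}, "M": {"A", "C"},
    "S": {"G", "C"}, "W": {"A", "T"},
    "B": {"G", "T", "C"}, "D": {"G", "A", "T"},
    "H": {"A", "C", "T"}, "V": {"G", "C", "A"},
    "N": {"A", "G", "C", "T"},
}


def code_matches(code: str, base: str) -> bool:
    """Return True if the concrete base is covered by the IUPAC code."""
    return base.upper() in IUPAC_NUCLEOTIDES.get(code.upper(), set())


def is_valid_nucleotide_sequence(seq: str) -> bool:
    """Check that every character is a code from the table or a gap ('-')."""
    return all(ch in IUPAC_NUCLEOTIDES or ch == "-" for ch in seq.upper())
```

For example, `code_matches("R", "A")` is `True` because R covers both purines, while `code_matches("Y", "A")` is `False`.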

Accepted Amino Acid Codes:
A = alanine
B = aspartate or asparagine
C = cysteine
D = aspartate
E = glutamate
F = phenylalanine
G = glycine 
H = histidine
I = isoleucine
K = lysine 
L = leucine
M = methionine
N = asparagine 
P = proline
Q = glutamine
R = arginine
S = serine
T = threonine
U = selenocysteine
V = valine
W = tryptophan
Y = tyrosine
Z = glutamate or glutamine
X = any

A single dash or hyphen (-) can represent a gap of indeterminate length, and an asterisk (*) can represent a translation stop.
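A quick sanity check before analysis is to confirm that a protein sequence uses only the codes in the table above plus the stop and gap symbols. Here is a minimal Python sketch; `PROTEIN_SYMBOLS` and `invalid_protein_chars` are hypothetical names, and the accepted set is exactly the table above (other tools may accept additional letters).

```python
# One-letter amino acid codes from the table above, plus the
# translation stop (*) and gap (-) symbols.
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX")
PROTEIN_SYMBOLS = PROTEIN_CODES | {"*", "-"}


def invalid_protein_chars(seq: str) -> set[str]:
    """Return characters in seq that are not accepted (reported in upper case)."""
    return {ch for ch in seq.upper() if ch not in PROTEIN_SYMBOLS}
```

Run on a sequence ending in `*`, such as the calmodulin record below, it returns an empty set; a sequence containing `J` returns `{'J'}`, since J is not in the table.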

Here are three example FASTA records, two protein sequences followed by one nucleotide sequence:

>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*

>U06486.1 Human Wilms' tumor (WT1) gene, 5' region, partial cds
CAGTGTCTTGTAGAATCTTCAGTGTCTTGATAATAATTTTAAAAGCTTCTGAGTGGAGACGACGCAAAGTCAAGCAGCAAAGGTGGCCTGGGAGGCAAGCGGAGGGCTCAAGTGCCGCATCTTTACCCTCAGGGTCTCCTGCGCCTACGGGATGCGCATTCCCAAGAAGTGCGCCCTTCGAGTAA
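Because every record is a ">" definition line followed by sequence data, which many files wrap at a fixed width such as 60 or 80 characters, a reader only has to watch for lines starting with ">". The sketch below is a minimal Python illustration; `read_fasta` and `write_fasta` are hypothetical helper names, not a library API, and established libraries such as Biopython (`Bio.SeqIO`) handle more edge cases.

```python
import io
from collections.abc import Iterable, Iterator
from typing import TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    The header is the definition line without its leading '>'.
    Sequence lines are concatenated, so records wrapped over several
    lines and single-line records are both handled.
    """
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        # else: sequence text before the first '>' line is ignored
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def write_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Shortened, made-up records for illustration only.
    example = """>seq1 hypothetical record
KYRTWEEFTRAAEK
LYQADPMKVRVV
>seq2 another hypothetical record
ACGTNRY
"""
    for header, seq in read_fasta(io.StringIO(example)):
        print(header, len(seq))
    # seq1 hypothetical record 26
    # seq2 another hypothetical record 7
```

Reading with `read_fasta` and writing back with `write_fasta` yields the same records, only re-wrapped to the chosen line width.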

Putting FASTA Format to Use

Introduced alongside the FASTA sequence-search software in the 1980s, FASTA format is one of the oldest formats in bioinformatics. It is still widely used for sequence retrieval and exchange because of its simplicity and flexibility, and it is considered a near-universal standard in bioinformatics research.

In our ongoing effort to help make researchers' lives easier, the Gadgeteers here at Reprints Desk have incorporated FASTA format into a number of our lab analysis and productivity apps, or Gadgets, including:

  • Plasmid Visualizer: Paste or upload your DNA sequence data to view plasmid sequence annotation and plasmid visualization for circular sequences.
  • Peptide/Protein Calculator: Upload peptide or protein FASTA-format sequence data to quickly calculate amino acid content, aromaticity, flexibility, instability index, isoelectric point, and more.
  • Sequence Annotator: Upload nucleotide FASTA-format files to quickly visualize and annotate linear DNA or RNA sequences.
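To illustrate the kind of calculation involved (this is not the Peptide/Protein Calculator's actual implementation), amino acid content and aromaticity can be computed from a FASTA sequence string in a few lines. This minimal Python sketch uses hypothetical function names and one common definition of aromaticity, the relative frequency of phenylalanine (F), tryptophan (W) and tyrosine (Y). Excluding the stop (*) and gap (-) symbols from the counts is an assumption of the sketch.

```python
from collections import Counter

# Symbols that are not residues; excluded from the counts (an assumption).
NON_RESIDUES = set("*-")


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Return the percentage of each one-letter residue code in seq."""
    residues = [ch for ch in seq.upper() if ch not in NON_RESIDUES]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in sorted(counts.items())}


def aromaticity(seq: str) -> float:
    """Relative frequency of the aromatic residues F, W and Y."""
    residues = [ch for ch in seq.upper() if ch not in NON_RESIDUES]
    if not residues:
        return 0.0
    return sum(ch in "FWY" for ch in residues) / len(residues)
```

For example, `aromaticity("AAGW*")` returns `0.25`: one of the four counted residues (W) is aromatic, and the stop symbol is ignored.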

With lab productivity Gadgets, you can save up to 50% of your research time by automating routine tasks – including those that involve FASTA format.

If you haven't done so already, we invite you to sign up for a free account and put these Gadgets to the test! We think you'll be pleased. And of course, we welcome any feedback you may have. 
