MALIGN-Plugin for MB DNA Analysis

Version: 0.9


1.) Setting up the alignment

1.1. Data
Select the sequences to analyze. The sequences must be located in the "database" directory of the main application and must have a valid extension: .prt for DNA files and .ami for amino acid sequences. For more information on the sequence format, see the help file supplied with the main application.
To select multiple sequences, hold down the CTRL key on your keyboard.

1.2. Options
  • Matrix

  • Choose an appropriate substitution matrix for your alignment.
    Here are some suggestions:
    - Use BLOSUM-62 to detect most weak protein similarities.
    - Use BLOSUM-45 for particularly long and weak alignments.
    - When aligning sequences that share only a short but very similar region (10-15 AA), set the gap penalty to 7-10 and the gap opening penalty to 0-1.
    - Suggested matrix by query length:
     Query Length  |Matrix
     ========================
     <35           |PAM-250
     35-50         |PAM-70
     50-85         |BLOSUM-60
     >85           |BLOSUM-62
     
    - For the BLOSUM-62 matrix, the following penalty combinations can be used:
     Gap Penalty   |Gap Opening Penalty
     ==================================
     2             |7
     2             |8
     2             |9
     1             |10
     1             |12
     3             |20
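The query-length table above can be expressed as a small lookup. This is only a sketch: the function name suggest_matrix is hypothetical, the last row of the table is read as "longer than 85", and the shared boundary values 50 and 85 are assigned to the shorter range, which the table itself does not specify.

```python
def suggest_matrix(query_length):
    """Return the matrix name suggested by the query-length table.

    Assumption: boundary values 50 and 85 belong to the shorter range,
    and the last table row means "longer than 85".
    """
    if query_length < 1:
        raise ValueError("query length must be a positive number")
    if query_length < 35:
        return "PAM-250"
    if query_length <= 50:
        return "PAM-70"
    if query_length <= 85:
        return "BLOSUM-60"
    return "BLOSUM-62"


if __name__ == "__main__":
    for length in (20, 35, 60, 300):
        print(length, suggest_matrix(length))
```

The result is only a starting point; the other suggestions in the list still apply.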
     
    The matrices are located in the "plugins" directory. Amino acid substitution matrices have the extension ".m"; the DNA substitution matrix has the extension ".md".

    The order of the amino acids in the matrix is:
    A C D E F G H I K L M N P Q R S T V W Y
    The order of nucleotides in the matrix is:
    a g c t
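The residue order matters because a score is found by position, not by letter. The sketch below shows how the two orders listed above could be turned into index tables. It assumes that rows and columns of a matrix both follow the same order. The layout of the ".m" and ".md" files is not described here, so the example uses a toy in-memory matrix rather than reading a file; build_index, lookup and the toy scores are illustrative names and values only.

```python
AMINO_ACID_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # order given in this document
NUCLEOTIDE_ORDER = "agct"                  # order given in this document


def build_index(order):
    """Map each symbol to its row/column position in the matrix."""
    return {symbol: position for position, symbol in enumerate(order)}


def lookup(matrix, index, first, second):
    """Return the score for a pair of symbols.

    Assumption: rows and columns of the matrix use the same order.
    """
    return matrix[index[first]][index[second]]


if __name__ == "__main__":
    # Toy nucleotide matrix (illustrative values): +1 match, -1 mismatch.
    toy = [[1 if row == col else -1 for col in range(4)] for row in range(4)]
    nucleotide_index = build_index(NUCLEOTIDE_ORDER)
    print(lookup(toy, nucleotide_index, "a", "a"))
    print(lookup(toy, nucleotide_index, "g", "t"))
    print(build_index(AMINO_ACID_ORDER)["W"])
```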

  • Alignment

  • The gap penalty is a number >= 0.
    The penalty for the introduction of gaps (gap opening penalty) is also a number >= 0. Set it to 0 when analyzing very distant sequences. Higher values are appropriate for very similar sequences.
    Try different values for the gap penalties to find the best alignment.
    If the secondary structure of a peptide is known, it can be used
    to guide the alignment.
    The secondary structures have to be saved in a text file with the ".3d" extension.
    This file should be in the DATABASE directory of the program.
    It must have the following format:
    P [name of the protein, identical to the file name, e.g. "P test" for the peptide "test.prt"]
    HELIX 3 5
    TURN 20 23
    STRAND 7 15
    
    P test2
    TURN 15 17
    HELIX 1 14
    
    The structure annotations must follow directly after the "P" (for "protein") identifier line.
    Only three structure types are available: HELIX, TURN and STRAND.
    The two numbers give the start and end positions of the structure.
    Note: Use a SPACE " " to separate structure names and position numbers.
    Secondary structures of proteins can be obtained in the right format
    (you only have to copy and paste them into the .3d file) from SWISSPROT.
    The "penalty for a mismatch" is applied when two aligned residues have different structures (set it to 3-5).
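To check a ".3d" file before starting an alignment, the format described above can be read with a few lines of Python. This is a sketch, not the plugin's own parser. The names parse_3d, structure_at and mismatch_penalty are hypothetical, positions are assumed to be 1-based and inclusive, and the penalty is assumed to apply only when both residues are annotated and their structures differ.

```python
VALID_STRUCTURES = {"HELIX", "TURN", "STRAND"}


def parse_3d(text):
    """Return {protein name: [(structure, start, end), ...]}."""
    proteins = {}
    current = None
    for line_number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # blank lines separate proteins
        if fields[0] == "P" and len(fields) >= 2:
            current = " ".join(fields[1:])
            proteins[current] = []
        elif (fields[0] in VALID_STRUCTURES and len(fields) == 3
              and current is not None):
            proteins[current].append((fields[0], int(fields[1]), int(fields[2])))
        else:
            raise ValueError("line %d not understood: %r" % (line_number, raw))
    return proteins


def structure_at(annotations, position):
    """Return the structure covering a position, or None (assumed 1-based)."""
    for kind, start, end in annotations:
        if start <= position <= end:
            return kind
    return None


def mismatch_penalty(kind_a, kind_b, penalty=4):
    """Assumption: unannotated residues never trigger the penalty."""
    if kind_a is None or kind_b is None or kind_a == kind_b:
        return 0
    return penalty


if __name__ == "__main__":
    sample = "P test\nHELIX 3 5\nTURN 20 23\nSTRAND 7 15\n\nP test2\nTURN 15 17\nHELIX 1 14\n"
    data = parse_3d(sample)
    a = structure_at(data["test"], 10)
    b = structure_at(data["test2"], 10)
    print(a, b, mismatch_penalty(a, b))
```

The default penalty of 4 is just one value from the 3-5 range recommended above.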

  • Output

  • Select between an HTML report and a plain text version.
    The results will be saved to the "REPORTS" directory. Existing files will be overwritten.

1.3. Save/Load Alignment Data
You can save the list of sequences and the alignment settings to a file.
The next time you start an alignment, you can load them again.
The file extension is ".alg".

2.) The alignment window

The alignment is saved to the HTML file and then displayed in the results window.
If the alignment was made with the secondary structure mismatch penalty, the structure is indicated as follows:
H: all residues at this position are in a helix
S: all residues at this position are in a strand
T: all residues at this position are in a turn
Identical residues are marked in red. Residues with the same
biochemical properties are marked in yellow.
MB divides amino acids into 3 groups:
 1.) No charge (nonpolar and uncharged polar side chains):
     Gly, Ala, Val, Leu, Ile, Met, Pro, Phe, Trp, Ser, Thr,
     Asn, Gln, Tyr, Cys
 2.) Positively charged side chains:
     Lys, Arg, His
 3.) Negatively charged side chains:
     Asp, Glu
    
Hint: You can select the data in the HTML report and then paste it into
a Word document for further editing.
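The coloring rule for this window can be sketched as follows. The three groups are taken from the list in this section, written with one-letter codes. The function names are hypothetical, and the sketch assumes that a column is colored only when every sequence has a residue there, so columns containing a gap stay uncolored.

```python
GROUPS = {
    "uncharged": set("GAVLIMPFWSTNQYC"),  # group 1
    "positive": set("KRH"),               # group 2
    "negative": set("DE"),                # group 3
}


def group_of(residue):
    """Return the group name of a residue, or None for gaps/unknowns."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    return None


def column_color(column):
    """Return "red", "yellow" or None for one alignment column."""
    residues = [residue.upper() for residue in column]
    if not residues or any(group_of(r) is None for r in residues):
        return None  # assumption: gaps are never colored
    if len(set(residues)) == 1:
        return "red"      # identical residues
    if len({group_of(r) for r in residues}) == 1:
        return "yellow"   # same biochemical group
    return None


if __name__ == "__main__":
    first, second = "AKDL-", "ARKVG"
    for column in zip(first, second):
        print("".join(column), column_color(column))
```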

3.) The phylogenetic tree window

The distance is the number of possible mutations.
It is calculated from the number of divergent characters in the sequences.
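One simple way to count divergent characters is to compare two aligned sequences position by position. The exact formula used by the plugin is not stated here, so the following is only an illustration: the function names are hypothetical, and a gap opposite a residue is assumed to count as one divergent character.

```python
def divergent_positions(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_table(aligned):
    """Return {(name_a, name_b): distance} for every pair of sequences."""
    names = list(aligned)
    return {
        (x, y): divergent_positions(aligned[x], aligned[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    aligned = {
        "seq1": "ACGT-A",
        "seq2": "ACCTTA",
        "seq3": "ACGTTA",
    }
    table = distance_table(aligned)
    for pair in sorted(table):
        print(pair, table[pair])
```

A tree-building method would then work from such a table of pairwise distances.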
