MALIGN-Plugin for MB DNA Analysis

Version: 1.3


1.) Setting up the alignment

1.1. Data
Select the sequences to analyze. The sequences must be located in the "database" directory of the main application and must have a valid extension: .prt for DNA files and .ami for amino acid sequences. For more information on the sequence format, see the help file supplied with the main application.
To select multiple sequences, hold down the CTRL key on your keyboard.

1.2. Options
  • Matrix

  • Choose an appropriate substitution matrix for your alignment.
    Here are some suggestions:
    - Use BLOSUM-62 to detect most weak protein similarities.
    - Use BLOSUM-45 for particularly long and weak alignments.
    - When aligning sequences that share a short but very similar region (10-15 AA), set the gap penalty to 7-10 and the gap opening penalty to 0-1.
    - Suggested matrix by query length:
     Query Length  |Matrix
     ========================
     <35           |PAM-250
     35-50         |PAM-70
     50-85         |BLOSUM-60
     >85           |BLOSUM-62
     
    - For the BLOSUM-62 matrix, the following penalty combinations can be used:
     Gap Penalty   |Gap Opening Penalty
     ====================================
     2             |7
     2             |8
     2             |9
     1             |10
     1             |12
     3             |20
     
    The default is gap penalty = 2, gap opening penalty = 5.
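
The query-length table above can be expressed as a small lookup. This is only a sketch and not part of the plugin: the function name suggest_matrix is hypothetical, the last table row is assumed to mean "longer than 85", and the overlapping boundaries (50 and 85) are assigned to the shorter range by assumption.

```python
# Sketch of the query-length table; names and boundary handling are assumptions.

# Penalty pairs listed for BLOSUM-62: (gap penalty, gap opening penalty).
BLOSUM62_PENALTIES = [(2, 7), (2, 8), (2, 9), (1, 10), (1, 12), (3, 20)]

# Documented default: gap penalty = 2, gap opening penalty = 5.
DEFAULT_PENALTIES = (2, 5)


def suggest_matrix(query_length: int) -> str:
    """Return the matrix name the table suggests for a query of this length."""
    if query_length < 35:
        return "PAM-250"
    if query_length <= 50:   # assumption: 50 belongs to the 35-50 row
        return "PAM-70"
    if query_length <= 85:   # assumption: 85 belongs to the 50-85 row
        return "BLOSUM-60"
    return "BLOSUM-62"       # assumption: last row means "longer than 85"


if __name__ == "__main__":
    for length in (20, 40, 70, 300):
        print(length, suggest_matrix(length))
```

The returned name still has to be selected by hand in the Matrix option; the sketch only mirrors the table.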

    The matrices are located in the "plugins" directory. Amino acid substitution matrices have the extension ".m". The DNA substitution matrix has the extension ".md".

    The order of the amino acids in the matrix is:
    A C D E F G H I K L M N P Q R S T V W Y
    The order of the nucleotides in the matrix is:
    a g c t
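
The residue order matters because a score is found by position, not by name. The sketch below shows one way to turn the two orders into index tables. It assumes a square matrix whose rows and columns both follow the listed order; the on-disk layout of the ".m" and ".md" files is not described here, so no file parsing is attempted, and the toy matrix is made up for illustration.

```python
# Sketch only: maps residues to matrix positions using the documented order.
AMINO_ACID_ORDER = "A C D E F G H I K L M N P Q R S T V W Y".split()
NUCLEOTIDE_ORDER = "a g c t".split()


def build_index(symbols):
    """Map each symbol to its row/column position in the matrix."""
    return {symbol: position for position, symbol in enumerate(symbols)}


AA_INDEX = build_index(AMINO_ACID_ORDER)
NUC_INDEX = build_index(NUCLEOTIDE_ORDER)


def score(matrix, index, x, y):
    """Look up the substitution score for symbols x and y.

    Assumes rows and columns of `matrix` both follow the order in `index`.
    """
    return matrix[index[x]][index[y]]


# Made-up 4x4 DNA matrix (match = 1, mismatch = -1), NOT the shipped ".md" file.
TOY_DNA_MATRIX = [[1 if i == j else -1 for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    print(AA_INDEX["W"], NUC_INDEX["c"])
    print(score(TOY_DNA_MATRIX, NUC_INDEX, "a", "t"))
```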

  • Alignment

  • The gap penalty is a number >= 0.
    The penalty for introducing a gap (gap opening penalty) is also a number >= 0. Set it to 0 when analyzing very distant sequences. Higher values are appropriate for very similar sequences.
    Try different values for the gap penalties to find the best alignment.
    If the secondary structure of a peptide is known, it can be used
    to guide the alignment.
    The secondary structures have to be saved in a text file with the ".3d" extension.
    This file must be in the DATABASE directory of the program.
    It must have the following format:
    P [name of protein, identical with the file name, e.g. "P test" for "test.prt" peptide]
    HELIX 3 5
    TURN 20 23
    STRAND 7 15
    
    P test2
    TURN 15 17
    HELIX 1 14
    
    The structure annotations must follow directly after the "P" (for "protein") identifier line.
    Only three structure types are available: HELIX, TURN, STRAND.
    The two numbers give the start and end positions of the structure.
    Note: Use a SPACE " " to separate structure names and position numbers.
    Secondary structures of proteins can be obtained from SWISSPROT in the right format
    (you only have to copy and paste them into the .3d file).
    The "penalty for a mismatch" is applied if two residues have different structures (set it to 3-5).
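
To check a ".3d" file before running an alignment, the format described above can be read with a few lines of Python. This is a sketch under assumptions, not the plugin's own reader: the function names are hypothetical, positions are treated as 1-based and inclusive, and the mismatch penalty is only charged when both residues are annotated.

```python
# Sketch of a reader for the ".3d" format; not the plugin's own code.
VALID_STRUCTURES = {"HELIX", "TURN", "STRAND"}


def parse_3d(text):
    """Return {protein name: [(structure, start, end), ...]}."""
    annotations = {}
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # blank lines separate the protein blocks
        if parts[0] == "P" and len(parts) > 1:
            current = " ".join(parts[1:])
            annotations[current] = []
        elif parts[0] in VALID_STRUCTURES and len(parts) == 3 and current:
            annotations[current].append((parts[0], int(parts[1]), int(parts[2])))
        else:
            raise ValueError(f"Unrecognized line: {raw!r}")
    return annotations


def structure_at(features, position):
    """Return the structure covering `position`, or None.

    Assumption: positions are 1-based and the end position is inclusive.
    """
    for name, start, end in features:
        if start <= position <= end:
            return name
    return None


def mismatch_penalty(struct_a, struct_b, penalty=3):
    """Assumption: the penalty applies only if both residues are annotated."""
    if struct_a is None or struct_b is None:
        return 0
    return penalty if struct_a != struct_b else 0


SAMPLE = """P test
HELIX 3 5
TURN 20 23
STRAND 7 15

P test2
TURN 15 17
HELIX 1 14
"""

if __name__ == "__main__":
    parsed = parse_3d(SAMPLE)
    print(parsed["test2"])
    print(structure_at(parsed["test"], 4), structure_at(parsed["test2"], 4))
```

A line that is neither a "P" line nor one of the three structure names raises an error, which makes typos in the file easy to spot.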

  • Alignment Order

  • Check "Align in certain order" to enable ordered clustering.
    The program aligns the sequence group specified on each line separately.
    Example:
    Prot1 Prot2
    Prot3 Prot4
    
    Prot1 and Prot2 are aligned first, then Prot3 and Prot4. Further clustering is then done automatically.
    The program stores each cluster of aligned sequences at the position of the member that appears first in the input file.
    This means that specifying:
    Prot1 Prot2
    Prot3 Prot4
    Prot1 Prot3
    
    makes the program align the Prot1-Prot2 group against the Prot3-Prot4 group (assuming that the order of the sequences in the
    input file corresponds to the sequence indexes given here).

    You can also prohibit certain single sequences from clustering together. However, if any of these sequences is already part of a cluster,
    the program will still align it.

    Activating "Force to cluster during the first round" makes the program cluster all of the sequences specified
    in the "prohibited to cluster together" field first. Only when all of these sequences have been
    clustered does the program proceed further down the guide tree to cluster the remaining sequences and produce the full alignment.
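
The rule that a cluster is stored at the position of its earliest member explains why the line "Prot1 Prot3" in the ordered-clustering example joins two whole groups. The sketch below only tracks cluster membership to illustrate that rule. It performs no alignment, the function name is hypothetical, and it is not the plugin's implementation.

```python
# Sketch: tracks which sequences end up in which cluster; no alignment is done.
def apply_alignment_order(input_order, order_lines):
    """Return {owner name: [members]} after processing the order lines.

    input_order: sequence names in the order of the input file.
    order_lines: one string per line of the "Align in certain order" field.
    """
    position = {name: i for i, name in enumerate(input_order)}
    owner_of = {name: name for name in input_order}      # cluster each name sits in
    clusters = {name: [name] for name in input_order}    # owner -> members

    for line in order_lines:
        names = line.split()
        if not names:
            continue
        # Every named sequence drags its whole current cluster along.
        owners = sorted({owner_of[n] for n in names}, key=position.__getitem__)
        target = owners[0]  # earliest position in the input file keeps the cluster
        merged = []
        for owner in owners:
            merged.extend(clusters.pop(owner))
        clusters[target] = merged
        for member in merged:
            owner_of[member] = target
    return clusters


if __name__ == "__main__":
    order = ["Prot1", "Prot2", "Prot3", "Prot4"]
    lines = ["Prot1 Prot2", "Prot3 Prot4", "Prot1 Prot3"]
    print(apply_alignment_order(order, lines))
```

After the first two lines there are two clusters, stored under Prot1 and Prot3. The third line names exactly those two positions, so the two groups are merged.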

  • Optimization

  • Activating this option forces the program to recalculate the distance matrix after each clustering step and to update the guide
    tree accordingly.

  • Output

  • Select whether you want to have a report in HTML format.
    The results will be saved to the "REPORTS" directory. Existing files will be overwritten.

1.3. Save/Load Alignment Data
You can save the list of sequences and the alignment settings to a file.
The next time you start an alignment, you can load them again.
The file extension is ".alg".

2.) The alignment window

The alignment is saved to the HTML file and then displayed in the results window.
If the alignment was made with the secondary structure mismatch penalty, the following letters mark positions where all residues share the same structure:
H: all residues are in a helix
S: all residues are in a strand
T: all residues are in a turn
Identical residues are marked in red. Residues with the same
biochemical properties are marked in yellow.
MB divides amino acids into 3 groups:
 1.) No charge (nonpolar and uncharged polar side chains):
     Gly, Ala, Val, Leu, Ile, Met, Pro, Phe, Trp, Ser, Thr,
     Asn, Gln, Tyr, Cys
 2.) Positively charged side chains:
     Lys, Arg, His
 3.) Negatively charged side chains:
     Asp, Glu
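
The three groups above can be written as a lookup table using one-letter codes. The sketch below is not the plugin's code. It assumes that the colouring is decided per alignment column and that a column containing a gap or an unknown symbol is left unmarked; neither detail is stated in this help file.

```python
# Sketch of the red/yellow marking rule; column-wise behaviour is an assumption.
GROUPS = {
    "no charge": "GAVLIMPFWSTNQYC",  # Gly Ala Val Leu Ile Met Pro Phe Trp Ser Thr Asn Gln Tyr Cys
    "positive": "KRH",               # Lys Arg His
    "negative": "DE",                # Asp Glu
}

# Residue (one-letter code) -> group name.
GROUP_OF = {res: name for name, residues in GROUPS.items() for res in residues}


def column_color(residues):
    """Return "red", "yellow" or None for the residues of one column."""
    groups = {GROUP_OF.get(r) for r in residues}
    if not residues or None in groups:
        return None        # empty column, gap, or unknown symbol: unmarked
    if len(set(residues)) == 1:
        return "red"       # identical residues
    if len(groups) == 1:
        return "yellow"    # different residues, same group
    return None


if __name__ == "__main__":
    for column in ("AAA", "KRH", "DE", "AK", "A-"):
        print(column, column_color(column))
```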
    
Hint: You can select the data in the HTML report and paste it into
a Word document for further editing.

3.) The phylogenetic tree window

The tree shown is a simple neighbor-joining interpretation of the sequence data (the guide tree).
For a real phylogeny, use appropriate phylogenetic programs.
The distance is the number of possible mutations. It is calculated from the number of divergent characters in the sequences.
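
Counting divergent characters can be illustrated as follows. This is a minimal sketch, assuming the two sequences are already aligned to the same length and that a gap paired with a residue counts as a difference. The program's exact handling of gaps is not documented here, and the function names are hypothetical.

```python
# Sketch: distance as the number of differing characters in aligned sequences.
def divergent_characters(seq_a, seq_b):
    """Count positions at which two equally long aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(aligned):
    """Return {(name_x, name_y): distance} for all pairs of aligned sequences."""
    names = list(aligned)
    return {
        (x, y): divergent_characters(aligned[x], aligned[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    example = {"s1": "acgtac", "s2": "acgtcc", "s3": "ac-tcc"}
    for pair, dist in distance_matrix(example).items():
        print(pair, dist)
```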
