MALIGN-Plugin for MB DNA Analysis
Version: 1.3
1.) Setting up the alignment

1.1. Data

Select the sequences to analyze. The sequences must be in the "database" directory of the main application and must have a valid extension: .prt for DNA files and .ami for amino acid sequences. For more information on the sequence format, please read the help file supplied with the main application.

To select multiple sequences, hold down the "CTRL" key on your keyboard.

1.2. Options

Choose an appropriate substitution matrix for your alignment. Here are some suggestions:
- Use BLOSUM-62 to detect most weak protein similarities.
- Use BLOSUM-45 for particularly long and weak alignments.
- If aligning sequences with small but very similar parts (10-15 AA): set gap penalty to 7-10 and gap opening penalty to 0-1.
- Also:

Query Length |Matrix
========================
<35          |PAM-250
35-50        |PAM-70
50-85        |BLOSUM-60
>85          |BLOSUM-62

- For the BLOSUM62 matrix the following values can be applied:

Gap Penalty |Introducing Gap
==============================
2           |7
2           |8
2           |9
1           |10
1           |12
3           |20

Default is gap penalty = 2, gap introduction penalty = 5.

The matrices are in the "plugins" directory. Amino acid substitution matrices have the extension ".m". The DNA substitution matrix has the extension ".md".

The order of the amino acids in the matrix is:
A C D E F G H I K L M N P Q R S T V W Y

The order of the nucleotides in the matrix is:
a g c t

The gap penalty is a number >=0.

The penalty for introduction of gaps (gap opening penalty) is a number >=0. Set it to 0 when analyzing very distant sequences. Higher numbers are appropriate for very similar sequences.

Please try different values for the gap penalties to find the best alignment.

If the secondary structure of a peptide is known, the sequences can be aligned using this information. The secondary structures have to be saved in a text file with the ".3d" extension. This file should be in the DATABASE directory of the program. It must have the following format:
Only three structures are available: HELIX, TURN, STRAND. The numbers represent the beginning and end of the structure.

Note: Use a SPACE " " to separate structure names and position numbers.

Secondary structures of proteins can be obtained in the right format from SWISSPROT (you will only have to copy/paste them into the .3d file).

The "penalty for a mismatch" is applied if 2 residues have different structures (set it to 3-5).

Check "Align in certain order" to enable ordered clustering. The program aligns each of the sequence groups specified on each of the lines separately.

Example:
The program stores the cluster of aligned sequences at the position of the sequence that appears first in the input file, meaning that when specifying:
input file corresponds to the sequence indexes given here).

You can also prohibit certain single sequences from clustering together. However, if any of these sequences is present in a cluster, the program will align it.

Activating "Force to cluster during the first round" makes the program cluster all of the sequences specified in the "prohibited to cluster together" field first. Only when all of these sequences have been clustered does the program proceed further down the guide tree to cluster the remaining sequences and produce the full alignment. Activating this option forces the program to recalculate the distance matrix after each clustering step, updating the guide tree.

Select whether you want a report in HTML format. The results are saved to the "REPORTS" directory. Existing files will be overwritten.

1.3. Save/Load Alignment Data

You can save the list of sequences and the alignment settings to a file. The next time you start an alignment, you can load them again. The file extension is ".alg".

2.) The alignment window

The alignment is saved to the HTML file and then displayed in the results window.

If making an alignment with secondary structure mismatch:
H: all residues have Helix
S: all residues have Strand
T: all residues have Turn

Identical residues are marked in red. Residues with the same biochemical properties are marked in yellow. MB divides amino acids into 3 groups:
1.) No charge (nonpolar and uncharged polar side chains): Gly, Ala, Val, Leu, Ile, Met, Pro, Phe, Trp, Ser, Thr, Asn, Gln, Tyr, Cys
2.) Positively charged chains: Lys, Arg, His
3.) Negatively charged chains: Asp, Glu

Hint: You can select the data in the HTML report and then paste it into a Word document for further editing.

3.) The phylogenetic tree window

A simple neighbor-joining interpretation of the sequence data (guide tree). For the real phylogeny use appropriate programs.

The distance is the number of possible mutations.
It is calculated from the number of divergent characters in the sequences.
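
As an illustration of a distance based on divergent characters, here is a minimal Python sketch. It is not the plugin's own code: the function names are hypothetical, and counting a gap against a residue as one divergent character is an assumption, since the text above does not say how gaps are treated.

```python
from itertools import combinations


def divergent_characters(seq_a, seq_b):
    """Count the alignment columns in which two aligned sequences differ.

    Assumption: a gap ("-") opposite a residue counts as one difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(aligned):
    """Return the pairwise distances for a dict of name -> aligned sequence."""
    return {
        (x, y): divergent_characters(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "seq1": "ACGT-ACGT",
    "seq2": "ACGTTACGA",
    "seq3": "ACCT-ACGT",
}
distances = distance_matrix(aligned)
for pair, dist in distances.items():
    print(pair, dist)
```

A neighbor-joining guide tree would then be built from such a pairwise table. The tree construction itself is left out of this sketch.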