Multalin help page

Introduction

Welcome to Multalin!

This software allows you to align several biological sequences simultaneously.

What is a multiple sequence alignment? It is the arrangement of several protein or nucleic acid sequences with postulated gaps so that similar residues are juxtaposed. A positive score is attached to identities and to conservative or non-conservative substitutions (the score amplitude measuring the similarity), and a penalty is attached to gaps. An ideal program would maximise the total score, taking account of all possible alignments and allowing for a gap of any length at any position.

Unfortunately the computing requirements, in both time and memory, grow as the nth power of the sequence length, where n is the number of sequences, so this ideal alignment can be found only for two sequences or three short sequences. In the general case, to be practicable, programs must restrict the conditions of the optimisation. Nevertheless it is undeniably useful to have an automatic system available for multiple sequence alignment to provide a starting point for subsequent human analysis.
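
To give a feel for this growth, the sketch below counts the cells of the full dynamic-programming table that an exhaustive alignment would need. It assumes one table dimension per sequence and one extra cell per dimension for the empty prefix; the function name dp_table_cells is illustrative and is not part of MultAlin.

```python
from math import prod


def dp_table_cells(lengths):
    """Cells in a full n-dimensional alignment table.

    Assumption: one dimension per sequence, each of size (length + 1).
    """
    return prod(length + 1 for length in lengths)


# Table size for 2, 3 and 4 sequences of 300 residues each.
for n in (2, 3, 4):
    print(n, "sequences:", dp_table_cells([300] * n), "cells")
```

Each additional sequence multiplies the table size by roughly the sequence length, which is why the exhaustive approach stops being practicable beyond two or three sequences.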

Multalin creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. The method used is described in "Multiple sequence alignment with hierarchical clustering", F. Corpet, 1988, Nucl. Acids Res. 16, 10881-10890.

How to use MultAlin

Note: No computer skills are required to use MultAlin, only basic web knowledge!

On the MultAlin home page you will see a large rectangle. This is where you paste (as in cut and paste) your sequences (try a sample set of sequences the first time). Instead of pasting your sequences, you can give the name of your sequence file, or select it with the Browse button.

The next step is to set the parameters. These require only basic web knowledge, and you can find help by clicking on the associated question mark. Simply use the pop-up menus or type in text or numbers where required. When you are ready, click on the "submit data" button (you can use either the button at the top or the one at the bottom of the page).

Now you will have to wait for our server to compute the alignment (this can take up to a few hours for very large sequences).

The result will be sent back to your internet browser in the form of a GIF image (default), plain text or a coloured HTML page. You will be able to change the colours, font size, line size etc. and even the consensus levels (see Presentation options for details).

The procedure is the same as for the MultAlin set-up: just use the pop-up menus and type in text or numbers where required. When ready, click on the "Apply Changes" button. The new image will appear shortly after. Only the image is changed; no realignment is done.

On your result page, you can add a sequence to the alignment. This sequence will be aligned with your already aligned sequences and you will get a new result page, with the new sequence placed beside the sequence most similar to it. For this step, MultAlin performs an optimal alignment of the new sequence against the block of already aligned sequences, so the result may differ from the one you would get by submitting all the sequences together in the first form.

Paste your new sequence in the rectangular area in Fasta/Multalin format (i.e. one line beginning with '>' for the sequence name, and other lines with the sequence itself). Click on the "Apply Changes" button when ready.

Input formats

MultAlin-Fasta - GenBank - EMBL-SwissProt - (click here for samples)

MultAlin-Fasta

The MultAlin format is similar to Fasta. Sequences can be interrupted by spaces or digits, which are not taken into account (see samples in MultAlin and pure Fasta formats).

        > SeqName the sequence name is the
        > first word of the first comment line 
        > max: 8 letters 
        > comment lines begin with >
        AAAACCGTTAAA...
        > SeqNam2 the 2nd sequence beginning  
        > shows the end of the first one 
        AAACCTGGAC...
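
The rules shown in this sample can be sketched as a small reader. This is only an illustration of the format as described on this page, not MultAlin's own parser: the function name parse_multalin_fasta is hypothetical, and truncating longer names to 8 characters is an assumption (the page only states the 8-letter maximum).

```python
def parse_multalin_fasta(text, max_name_len=8):
    """Return a list of (name, sequence) pairs from MultAlin-Fasta text."""
    records = []      # finished (name, sequence) pairs
    name = None       # name of the record currently being read
    residues = []     # sequence fragments of the current record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A comment line that follows sequence data starts a new record.
            if name is not None and residues:
                records.append((name, "".join(residues)))
                name, residues = None, []
            # Only the first comment line of a record carries the name.
            if name is None:
                words = line[1:].split()
                # Assumption: names longer than 8 characters are truncated.
                name = (words[0] if words else "")[:max_name_len]
        elif name is not None:
            # Spaces and digits inside a sequence are not taken into account.
            residues.append(
                "".join(c for c in line if not (c.isspace() or c.isdigit()))
            )
    if name is not None and residues:
        records.append((name, "".join(residues)))
    return records


sample = """> SeqName the sequence name is the
> first word of the first comment line
AAAA CCGT 10
TAAA
> SeqNam2 the 2nd sequence
AAACCTGGAC
"""
for seq_name, sequence in parse_multalin_fasta(sample):
    print(seq_name, sequence)
```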

GenBank

        LOCUS      SeqName  
        any lines  
        ORIGIN     anything              
        1 aggtcccttt tgtgttgttt

The sequence name is the first word after the LOCUS keyword. The sequence begins on the line following the ORIGIN keyword. The next sequence entry begins with the LOCUS keyword. See sample.
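
A minimal reader for these rules might look like the sketch below. It is not MultAlin's own code: the function name parse_genbank is hypothetical, and keeping only alphabetic characters on sequence lines (which drops position numbers, spaces and a closing // line) is an assumption.

```python
def parse_genbank(text):
    """Return a list of (name, sequence) pairs from GenBank-style text."""
    records = []
    name, residues, in_sequence = None, [], False
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "LOCUS":
            # A LOCUS line closes the previous entry and opens a new one.
            if name is not None:
                records.append((name, "".join(residues)))
            name = words[1] if len(words) > 1 else ""
            residues, in_sequence = [], False
        elif words[0] == "ORIGIN":
            # The sequence starts on the line after ORIGIN.
            in_sequence = True
        elif in_sequence:
            # Assumption: only letters belong to the sequence.
            residues.append("".join(c for c in line if c.isalpha()))
    if name is not None:
        records.append((name, "".join(residues)))
    return records


sample = """LOCUS      SeqName
any lines
ORIGIN     anything
        1 aggtcccttt tgtgttgttt
"""
print(parse_genbank(sample))
```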

EMBL-SwissProt

        ID   SeqName  
        any lines 
        SQ   anything  
        aauccagug gagaucaaag          
        any sequence lines  
        //

The sequence name is the first word after the ID keyword. The sequence begins on the line following the SQ keyword. The next sequence entry begins on the line following //. See sample.
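
The same idea applies to this format, as in the sketch below. Again this is an illustration rather than MultAlin's parser: the function name parse_embl is hypothetical, and keeping only letters on sequence lines as well as accepting a final entry without a closing // are assumptions.

```python
def parse_embl(text):
    """Return a list of (name, sequence) pairs from EMBL/SwissProt-style text."""
    records = []
    name, residues, in_sequence = None, [], False
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "//":
            # The terminator closes the current entry.
            if name is not None:
                records.append((name, "".join(residues)))
            name, residues, in_sequence = None, [], False
        elif in_sequence:
            # Assumption: only letters belong to the sequence.
            residues.append("".join(c for c in line if c.isalpha()))
        elif words[0] == "ID" and name is None:
            name = words[1] if len(words) > 1 else ""
        elif words[0] == "SQ":
            # The sequence starts on the line after SQ.
            in_sequence = True
    # Assumption: tolerate a last entry that lacks its closing "//".
    if name is not None:
        records.append((name, "".join(residues)))
    return records


sample = """ID   SeqName
any lines
SQ   anything
     aauccagug gagaucaaag
//
"""
print(parse_embl(sample))
```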

Output format

The sequence alignment will be displayed as:

a coloured image
A GIF image is loaded like any other image. Click the image button if you have not selected the "automatically load images" option. The GIF image is configurable: you can change the colour of the comment text, the font size, the background colour, the high and low consensus colours and the neutral colour.
a plain text
This is the fastest option if you have problems loading images or large HTML pages.
a coloured html text
This HTML page uses a style sheet, so you must select the "Enable style sheets" option of your browser. The HTML page is configurable: you can change the background colour, the high and low consensus colours and the neutral colour. To change the font size, use your browser Preferences.

In any case you can adjust the consensus levels.

Available files

Just underneath you will be able to see the input sequence file, the cluster file, the alignment in fasta or msf format plain text, the alignment in msf format with colour indications as a coded text, an html text or a gif image.

Any of these files can be saved to your local disk, simply using your WWW browser. The plain text files can be viewed, edited or printed with any text editor; the HTML page and the GIF image can be opened with your browser or with a word processor that supports these formats.

To translate the colour indications of the coded text to true colours, you can use Microsoft Word and the MultAlin macro (FTP multalin.dot and save it to disk even if you see odd characters in your browser) as follows:

Open your .doc file with Microsoft Word (File/Open)
Change the templates (File/Models... or Tools/Models..., Link..., search the disk to select multalin.dot, Open)
Run the MultAlin macro (Tools/Macro..., select MultAlin, Run)

You can also add the MultAlin macro to your current model (Normal.dot):
Tools/Macro..., Organizer, Close File then Open File (on the same button), search the disk to select multalin.dot, Open, select MultAlin, Copy >> into Normal.dot, Close

Other parameters

Symbol comparison table - Gap penalties - Gap penalty at extremities - One iteration only

Symbol comparison table

Blosum62 symbol comparison table
S. Henikoff and J.G. Henikoff, Amino acid substitution matrices from protein blocks, 1992, P.N.A.S. USA 89, 10915-10919. This table is the original Blosum62 with a value of 4 added to each entry for it to be non-negative.
Dayhoff symbol comparison table
M.O. Dayhoff, R.M. Schwartz and B.C. Orcutt, Atlas of Protein Sequence and Structure, Ed. M.O. Dayhoff, National Biomedical Research Foundation (Washington D.C. 1979). This table is Dayhoff's PAM250 with a value of 8 added to each entry for it to be non-negative.
Genetic symbol comparison table
Each value is the maximum number of common bases in the corresponding amino acid codon.
Risler symbol comparison table
J.L. Risler, M.O. Delorme, H. Delacroix, A. Henaut, Journal of Molecular Biology, 204, 1019, 1988.
DNA symbol comparison table
This table scores a match for any overlap between any IUB (International Union of Biochemistry) nucleic acid ambiguity symbols, except X/N, as follows:
A or C = M; A or G = R; A or T = W; C or G = S; C or T = Y; G or T = K; A or C or G = V; A or C or T = H; A or G or T = D; C or G or T = B; A or C or G or T = X or N.
These codes are compatible with the codes used by the EMBL, GenBank and PIR data libraries and by the GCG package.
Alternate DNA symbol comparison table
This table scores:

8 for a match
6 for a match with a two-base ambiguity symbol
4 for a match with a three-base ambiguity symbol
3 for a match with a four-base ambiguity symbol

where the ambiguity symbols are:

A or C = M; A or G = R; A or T = W; C or G = S; C or T = Y; G or T = K; A or C or G = V; A or C or T = H; A or G or T = D; C or G or T = B; A or C or G or T = X or N.
These codes are compatible with the codes used by the EMBL, GenBank and PIR data libraries and by the GCG package.
Identity symbol comparison table
This table scores 1 for a match and 0 for a mismatch between any two letters.
Personal table
You can use your own comparison table by giving its file name, or selecting it with the Browse button. To write your own table, use the same format as the standard MultAlin tables (see Dayhoff symbol comparison table for format details). You can also select a comparison table from the GCG package: in this case the table file name must end with ".cmp" (e.g. pileupdna.cmp).
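
For the DNA symbol comparison table described above, the notion of "overlap" between ambiguity symbols can be sketched as follows. The symbol-to-base mapping is taken from the list on this page; the function name symbols_overlap and its boolean result are illustrative, since this page does not give the numeric values stored in the table. Treating X and N as never matching is an assumption about what "except X/N" means.

```python
# Bases represented by each IUB symbol, as listed on this page.
IUB_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def symbols_overlap(a, b):
    """True if two IUB symbols share at least one base."""
    a, b = a.upper(), b.upper()
    # Assumption: X and N never count as a match.
    if a in ("X", "N") or b in ("X", "N"):
        return False
    return bool(set(IUB_BASES[a]) & set(IUB_BASES[b]))


print(symbols_overlap("A", "R"))  # R stands for A or G
print(symbols_overlap("R", "Y"))  # purines versus pyrimidines
```

Symbols outside the list (for example U) raise a KeyError in this sketch.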

Gap penalties

This penalty is subtracted from the alignment score of two clusters each time a new gap is inserted in one cluster. The penalty is length dependent: it is the "penalty at gap opening" plus the gap length times the "penalty at gap extension". Both values must be non-negative and their maximum value is 255.
The similarity score is equal to the sum of the values of the matches (each match scored with the scoring table) less the gap penalties. The gap penalty is charged for every internal gap. By default, no penalty is charged for terminal gaps.
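
The arithmetic described above can be written out as a short sketch. The function names gap_penalty and similarity_score are hypothetical, and the inputs (a list of match values and a list of internal gap lengths) are a simplification chosen for illustration; MultAlin computes these quantities internally during alignment.

```python
def gap_penalty(length, opening, extension):
    """Penalty for one gap: opening + extension * length."""
    for value in (opening, extension):
        # Both penalties must lie between 0 and 255.
        if not 0 <= value <= 255:
            raise ValueError("gap penalties must be between 0 and 255")
    return opening + extension * length


def similarity_score(match_values, internal_gap_lengths, opening, extension):
    """Sum of the match values less the penalties of the internal gaps."""
    penalties = sum(
        gap_penalty(n, opening, extension) for n in internal_gap_lengths
    )
    return sum(match_values) - penalties


# Three matches worth 4 each and one internal gap of length 3,
# with an opening penalty of 5 and an extension penalty of 2.
print(similarity_score([4, 4, 4], [3], opening=5, extension=2))
```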

An optimal alignment is one with the maximum possible score. It is sensitive to the symbol comparison values and to the gap penalties.

Gap penalty at extremities

By default no penalty is charged for terminal gaps. You can change this for particular alignments where terminal gaps must be treated like internal ones. Choose "beginning" to charge a gap at the beginning of a sequence, "end" to charge one at the end and "both" to charge all terminal gaps.
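
The effect of this option can be illustrated by listing which gap runs of an aligned row would be charged. This is a sketch under assumptions: the function name charged_gap_lengths, the "-" gap character and the "none" value standing for the default are all hypothetical; only "beginning", "end" and "both" come from this page.

```python
import re


def charged_gap_lengths(row, extremities="none", gap_char="-"):
    """Lengths of the gap runs in one aligned row that incur a penalty."""
    if extremities not in ("none", "beginning", "end", "both"):
        raise ValueError("unknown extremities option: " + extremities)
    lengths = []
    for run in re.finditer(re.escape(gap_char) + "+", row):
        at_start = run.start() == 0
        at_end = run.end() == len(row)
        # Terminal gaps are skipped unless the option asks to charge them.
        if at_start and extremities not in ("beginning", "both"):
            continue
        if at_end and extremities not in ("end", "both"):
            continue
        lengths.append(run.end() - run.start())
    return lengths


row = "--AC-GT---"
for option in ("none", "beginning", "end", "both"):
    print(option, charged_gap_lengths(row, option))
```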

One iteration only

With this option, the final alignment can be obtained more quickly, but it may not be the best possible alignment.

Presentation options

Text options

For a coloured image
you can choose the text size, the text colour, the background colour and three colours for the sequence residues (high consensus, low consensus, neutral).
For a coloured html text
you can choose the background colour and three colours for the sequence residues (high consensus, low consensus, neutral). The text colour is automatically set to the neutral colour. The font size can be set with your WWW browser preferences.

Consensus options

You can choose the conservation thresholds for a position to be a high or low consensus position. A residue that is highly conserved appears in high-consensus colour and as an uppercase letter in the consensus line. A residue that is weakly conserved appears in low-consensus colour and as a lowercase letter in the consensus line. Other residues appear in neutral colour. A position with no conserved residue is represented by a dot in the consensus line.
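
The uppercase, lowercase and dot convention can be sketched as below. This is a simplified illustration, not MultAlin's consensus routine: the function name consensus_line is hypothetical, the threshold values 0.9 and 0.5 are placeholders, conservation is measured here as the fraction of rows carrying the most frequent residue, and only identical residues are counted.

```python
from collections import Counter


def consensus_line(rows, high=0.9, low=0.5, gap_char="-"):
    """Build a consensus string from equally long aligned rows."""
    consensus = []
    for column in zip(*rows):
        counts = Counter(c.upper() for c in column if c != gap_char)
        residue, count = counts.most_common(1)[0] if counts else ("", 0)
        # Assumption: conservation = share of rows with the top residue.
        fraction = count / len(column)
        if residue and fraction >= high:
            consensus.append(residue.upper())   # high consensus
        elif residue and fraction >= low:
            consensus.append(residue.lower())   # low consensus
        else:
            consensus.append(".")               # no conserved residue
    return "".join(consensus)


alignment = ["ACGT", "ACGA", "ACCG", "AGTC"]
print(consensus_line(alignment))
```

Raising or lowering the two thresholds changes which positions are reported in uppercase, in lowercase or as a dot, without changing the alignment itself.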
