Bioinformatics Multiple Choice Questions on “Position-Specific Scoring Matrices”.
1. Analysis of MSAs for conserved blocks of sequence leads to production of the position-specific scoring matrix. Answer: A
2. The quality and quantity of information provided by the PSSM also varies for ________ in the motif. Answer: B
3. Two considerations arise in trying to tune the PSSM so that it adequately represents the training sequences. Which of the following is not a valid description of them? Answer: B
4. If a good sampling of sequences is ________, the number of sequences is ________, and the motif structure is ________, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also. Answer: A
5. If the data set is ________, then unless the motif has ________ amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif. Answer: B
6. Even if many pseudocounts are added in comparison to real sequence counts, the pseudocounts will have no effect on the amino acid frequencies. Answer: B
7. Which of the following is not a feature of sequence alignment editors and formatters? Answer: D
8. GDE (Genetic Data Environment) provides a general interface on UNIX machines for sequence analysis, sequence alignment editing, and display. Answer: A
9. MACAW is only a local multiple sequence alignment program. Answer: B
10. Two commonly encountered multiple sequence alignment formats are the Genetics Computer Group’s MSF format and the CLUSTALW ALN format. Answer: A
Question 1
A. True
B. False
Explanation: The analysis of multiple sequence alignments (MSAs) for conserved blocks of sequence leads to production of the position-specific scoring matrix, or PSSM. The PSSM may be used to search a sequence to find the most probable location or locations of the motif it represents. Alternatively, it may be used to search an entire database to identify additional sequences that also contain the motif.
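As a rough illustration of searching a sequence with a PSSM, the Python sketch below slides a motif-length window along a query sequence, sums the column scores of the residues in each window, and ranks the windows by score. The three-column matrix, its score values, the default score for unlisted residues, and the function names are all hypothetical, chosen only to keep the example small; a real PSSM would hold a score for every amino acid in every column.

```python
# Hypothetical toy PSSM: one dict per motif column, mapping residue -> score.
PSSM = [
    {"I": 3.0, "V": 1.5, "L": 1.0},
    {"G": 2.5, "A": 1.0},
    {"D": 3.0, "E": 2.0},
]
DEFAULT = -2.0  # assumed score for any residue not listed in a column


def score_window(window, pssm=PSSM, default=DEFAULT):
    """Sum the column scores of the residues in one motif-length window."""
    return sum(col.get(res, default) for col, res in zip(pssm, window))


def scan(sequence, pssm=PSSM):
    """Score every full-length window; return (start, score) pairs, best first."""
    width = len(pssm)
    hits = [
        (start, score_window(sequence[start:start + width], pssm))
        for start in range(len(sequence) - width + 1)
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


best_start, best_score = scan("MKVGEAIGDLK")[0]
print(best_start, best_score)
```

Here the highest-scoring window, IGD, starts at 0-based position 6, and the runner-up, VGE, starts at position 2. In the simplest case, searching a database amounts to repeating the scan over every sequence and keeping those whose best score passes a chosen threshold.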
Question 2
A. each row
B. each column
C. rows and columns
D. neither the rows nor the columns
Explanation: The quality and quantity of information provided by the PSSM also varies for each column in the motif, and this variation profoundly influences the matches found with sequences. This situation can be accurately described by information theory, and the results can be displayed by a colored graph called a sequence logo.
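Per-column information content can be sketched in Python as follows, assuming the common definition for proteins: the maximum possible value, log2(20) ≈ 4.32 bits, minus the Shannon entropy of the residue frequencies observed in the column. The function name is hypothetical, and the sketch omits the small-sample correction that some logo programs apply.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column: log2(alphabet_size)
    minus the Shannon entropy of the observed residue frequencies.
    No small-sample correction is applied (simplifying assumption)."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


print(round(column_information("IIIIIIII"), 3))  # fully conserved column
print(round(column_information("IVIVIVIV"), 3))  # two residues at equal frequency
```

The fully conserved column scores about 4.322 bits and the two-residue column about 3.322 bits. In a sequence logo, the height of each column's stack corresponds to this value and each letter's height is its frequency times that value, so conserved columns stand tallest.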
Question 3
A. If a given column in 20 sequences has only isoleucine, it is not very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
B. If a given column in 20 sequences has only isoleucine, it is very likely that a different amino acid will be found in other sequences with that motif, because the residue is probably important for function
C. If the number of sequences with the found motif is large and reasonably diverse, the sequences represent a good statistical sampling of all sequences that are ever likely to be found with that same motif
D. Another column in the motif from the 20 sequences may have several amino acids, and some amino acids may not be represented at all
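Explanation: The PSSM is constructed by a simple logarithmic transformation of a matrix giving the frequency of each amino acid in each column of the motif. Option B is not a valid description: a column that contains only isoleucine in all 20 sequences suggests the residue is important for function, so a different amino acid is unlikely to appear there in other sequences with the motif. Conversely, when a column already contains several amino acids, even more variation may be expected at that position in other sequences, although the more abundant amino acids already found in that column would probably be favored.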
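The logarithmic transformation can be sketched as a log-odds calculation: each score is log2 of the observed column frequency divided by a background frequency. The sketch assumes a uniform background of 1/20 per amino acid and an ungapped block of aligned motif sequences; both, and the function name, are illustrative assumptions rather than the method of any particular program.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, background=None):
    """Log-odds PSSM from an ungapped block of equal-length motif sequences.
    score[i][a] = log2(f_ia / p_a), where f_ia is the frequency of residue a
    in column i and p_a is its background frequency (uniform by default).
    Residues absent from a column get -inf, since log2(0) is undefined."""
    if background is None:
        background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    n = len(block)
    pssm = []
    for column in zip(*block):  # iterate over alignment columns
        counts = Counter(column)
        pssm.append({
            a: math.log2((counts[a] / n) / background[a]) if counts[a] else float("-inf")
            for a in AMINO_ACIDS
        })
    return pssm


block = ["IGD", "IGE", "VGD", "IAD"]
pssm = build_pssm(block)
print(round(pssm[0]["I"], 3), pssm[0]["W"])
```

Isoleucine, at frequency 0.75 in the first column, scores log2(15) ≈ 3.907, while tryptophan, never observed there, scores minus infinity and would rule out any match containing it. That is one reason pseudocounts are added in practice.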
Question 4
A. available, sufficiently large, not too complex
B. unavailable, sufficiently large, not too complex
C. unavailable, sufficiently small, not too complex
D. available, sufficiently large, too complex
Explanation: The more abundant amino acids already found in that column would probably be favored. Thus, if a good sampling of sequences is available, the number of sequences is sufficiently large, and the motif structure is not too complex, it should, in principle, be possible to obtain frequencies highly representative of the same motif in other sequences also (Henikoff and Henikoff 1996).
Question 5
A. small, distinct
B. small, almost identical
C. large, almost identical
D. large, distinct
Explanation: The set of sequences used to produce the motif may be small, highly diverse, or complex, which gives rise to a second level of consideration. If the data set is small, then unless the motif has almost identical amino acids in each column, the column frequencies in the motif may not be highly representative of all other occurrences of the motif. In such cases, it is desirable to improve the estimates of the amino acid frequencies by adding extra amino acid counts, called pseudocounts, to obtain a more reasonable distribution of amino acid frequencies in the column.
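One simple way to add pseudocounts, assumed here for illustration, is to share a total of B extra counts among the 20 amino acids in proportion to their background frequencies, giving f_a = (n_a + B·p_a) / (N + B) for a column with N real counts, n_a of them for residue a. The sketch uses a uniform background and a hypothetical function name; the source does not specify a pseudocount scheme, and more elaborate ones (for example, schemes based on substitution matrices) exist.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, total_pseudocounts, background=None):
    """Pseudocount-adjusted residue frequencies for one alignment column.
    Uses f_a = (n_a + B * p_a) / (N + B): B pseudocounts are shared out in
    proportion to the background frequencies p_a (uniform by default)."""
    if background is None:
        background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    counts = Counter(column)
    n = len(column)
    b = total_pseudocounts
    return {a: (counts[a] + b * background[a]) / (n + b) for a in AMINO_ACIDS}


freqs = column_frequencies("IIIV", total_pseudocounts=2)
print(round(freqs["I"], 3), round(freqs["W"], 3))
```

With 4 real counts and B = 2, isoleucine's frequency drops from 0.75 to about 0.517, and tryptophan, absent from the column, now receives about 0.017 instead of zero. The adjusted frequencies still sum to 1.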
Question 6
A. True
B. False
Explanation: The statement is false. Knowing how many pseudocounts to add is a difficult but solvable problem. On the one hand, if too many pseudocounts are added in comparison to real sequence counts, the pseudocounts become the dominant influence on the amino acid frequencies, and searches using the motif will not work. On the other hand, if there are relatively few real counts, many amino acid variations may be missing simply because the sample of sequences is small, so some pseudocounts are still needed.
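The trade-off can be made concrete with a simple background-proportional pseudocount scheme (an assumption; the source does not name one). For a column with N real counts, n_a of them for residue a, a total of B pseudocounts, and background frequency p_a, the two extremes and a worked example are:

```latex
f_a = \frac{n_a + B\,p_a}{N + B}
\qquad
\lim_{B \to \infty} f_a = p_a
\qquad
f_a\big|_{B = 0} = \frac{n_a}{N}

% Worked example: N = 4, n_I = 3, n_W = 0, p_I = p_W = 0.05
B = 0:\quad f_W = \frac{0}{4} = 0
\qquad
B = 400:\quad f_I = \frac{3 + 20}{404} \approx 0.057,\quad f_W = \frac{0 + 20}{404} \approx 0.050
```

With no pseudocounts, an unobserved residue gets zero frequency; with 400 pseudocounts against 4 real counts, the strong isoleucine preference in the data (0.75) is almost erased and every residue drifts toward the background.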
Question 7
A. provision for displaying the sequence on a color monitor, with residues colored to give a clear visual representation of the alignment
B. recognition of the multiple sequence format output by the MSA (multiple sequence alignment) program
C. maintenance of the alignment in a suitable format when the editing is completed
D. disallowing the shading of conserved residues in the alignment
Explanation: Options A to C are features of alignment editors and formatters. They also provide a suitable windowing interface in which the mouse can be used to add, delete, or move sequences, followed by an updated display of the alignment. Other types of editing are commonly performed on MSAs as well, for example shading conserved residues in the alignment, so disallowing such shading is not a feature.
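Shading conserved residues first requires finding them. The sketch below assumes a strict rule, that a column counts as conserved only when every sequence has the same non-gap residue, returns those column indices, and builds a CLUSTAL-style marker line with '*' under each one. The function names and the rule are illustrative; real editors may also use looser, similarity-based rules.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns in which every sequence has the
    same residue; a column of gaps never counts as conserved."""
    return [
        i for i, column in enumerate(zip(*alignment))
        if column[0] != "-" and all(res == column[0] for res in column)
    ]


def conservation_line(alignment):
    """Marker line: '*' under fully conserved columns, a space elsewhere."""
    marked = set(conserved_columns(alignment))
    return "".join("*" if i in marked else " " for i in range(len(alignment[0])))


aln = ["MKV-GD", "MRVAGD", "MKI-GE"]
for row in aln:
    print(row)
print(conservation_line(aln))
```

Only the M and G columns are identical in all three sequences, so they are the ones an editor would shade under this rule.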
Question 8
A. True
B. False
Explanation: GDE is available from several anonymous FTP sites. The interface requires communication with a host UNIX machine running the Genetics Computer Group software. It can also be used from MS-DOS or Macintosh computers equipped with the appropriate X Window System software.
Question 9
A. True
B. False
Explanation: MACAW is both a local multiple sequence alignment program and a sequence editing tool. Given a set of sequences, the program finds ungapped blocks in the sequences and gives their statistical significance. Later versions of the program find blocks by one of three user-chosen methods.
Question 10
A. True
B. False
Explanation: Because these formats follow a precise layout, one can readily be converted to another by computer programs. READSEQ, written by D. G. Gilbert at Indiana University Bloomington, is one such program.
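To show why a precise layout makes conversion easy, here is a minimal Python sketch that reads a simplified CLUSTAL ALN alignment and writes it out as FASTA. It is not READSEQ and does not handle MSF; it assumes a layout of a 'CLUSTAL' header line followed by blocks of 'name sequence [residue count]' lines, with conservation lines indented. The function names are hypothetical.

```python
def parse_clustal_aln(text):
    """Parse a simplified CLUSTAL ALN alignment into {name: aligned_sequence}.
    Assumed layout: a header line starting with 'CLUSTAL', then blocks of
    'name  sequence [optional residue count]' lines separated by blank lines.
    Conservation lines start with whitespace and are skipped."""
    seqs = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]  # ignore any trailing residue count
        seqs[name] = seqs.get(name, "") + chunk  # append this block's segment
    return seqs


def to_fasta(seqs, width=60):
    """Write {name: sequence} as FASTA text, wrapping sequence lines at `width`."""
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


aln = """CLUSTAL W (1.83) multiple sequence alignment

seq1      MKV-GD 5
seq2      MWVAGD 6
          * * **

seq1      LLK 8
seq2      LLK 9
          ***
"""
print(to_fasta(parse_clustal_aln(aln)), end="")
```

Because each block repeats the same names in the same order, reassembling the full rows is just string concatenation per name; a general converter such as READSEQ has to cover many more formats and their variations.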