These experiments aim to annotate the genome of Drosophila melanogaster.
PCR targets are selected based on existing evidence for predicted gene models. The latest ESTs, mRNAs and experimental data (if not already in GenBank) are collapsed into clusters, aligned, processed into multi-exon predicted transcripts, then classified into three main categories: known, partially novel, and novel. These categories are defined as follows:
1. Known: All introns are overlapped by EST or mRNA sequence
2. Partial: Some, but not all, introns are overlapped by EST or mRNA sequence
3. Novel: No introns are overlapped by EST or mRNA sequence
These categories may be further sub-categorized depending on whether the predicted intron/exon structure is supported, whether the overlapping ESTs are spliced, etc. PCR primers are designed to amplify regions of partially and completely novel predictions that contain unverified introns.
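The three-way classification above can be sketched as a small function. This is a minimal illustration, not the pipeline's actual code: the function names and the input format (one boolean per predicted intron, True when the intron is overlapped by EST or mRNA sequence) are assumptions made for the example.

```python
def classify_prediction(intron_supported):
    """Classify a multi-exon predicted transcript by its intron support.

    intron_supported: list of booleans, one per predicted intron
    (hypothetical input format). True means the intron is overlapped
    by EST or mRNA sequence.
    """
    if not intron_supported:
        raise ValueError("a multi-exon prediction must have at least one intron")
    n_supported = sum(intron_supported)
    if n_supported == len(intron_supported):
        return "known"      # all introns overlapped
    if n_supported == 0:
        return "novel"      # no introns overlapped
    return "partial"        # some, but not all, introns overlapped


def is_pcr_candidate(category):
    """Only partially and completely novel predictions are targeted for PCR."""
    return category in ("partial", "novel")
```

For example, a prediction with three introns of which only the first is supported would be classified as "partial" and selected as a PCR candidate.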
In some cases, protein homology with other species is used to prioritize novel predictions for testing. For example, an initial test set of completely novel predictions may have 35-75% homology to proteins of other Drosophila species.
Targets are reverse transcribed using SuperScript III (Invitrogen) from pools of RNA from Cherbas et al., then amplified by standard PCR with Phusion polymerase. Products are cycle sequenced with both the forward and reverse PCR primers, either in-house using the standard ABI BigDye version 3.1 protocol or by an outside provider. Each resulting EST is aligned to the dm3/Release 5 Drosophila genome using GMAP. An EST is considered a hit if it results in a spliced alignment with over 90 percent identity that aligns to the same locus as it was designed for, or if it aligns to a locus in the genome that is not covered by any transcript in the experiment. Positive EST sequences are submitted to GenBank.
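The hit criterion can be expressed as a short predicate. This is a sketch under stated assumptions: the `Alignment` record, its field names, and the locus identifiers are hypothetical and are not GMAP output fields. The sketch also assumes that the spliced and over-90-percent-identity requirements apply to both branches of the rule (the designed-for locus and a previously uncovered locus), which is one reading of the criterion.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    locus: str        # hypothetical identifier of the locus the EST aligned to
    identity: float   # percent identity of the alignment, 0-100
    n_introns: int    # introns in the alignment; at least 1 means spliced


def is_hit(aln, target_locus, covered_loci, min_identity=90.0):
    """Return True if an aligned EST counts as a hit.

    target_locus: locus the primers were designed for.
    covered_loci: set of loci covered by any transcript in the experiment.
    Assumption: the spliced and identity requirements apply to both branches.
    """
    if aln.n_introns < 1:
        return False                  # unspliced alignment
    if aln.identity <= min_identity:
        return False                  # identity must exceed the threshold
    on_target = aln.locus == target_locus
    uncovered = aln.locus not in covered_loci
    return on_target or uncovered
```

Under this reading, an EST that aligns with high identity to a different locus already covered by the experiment is not counted as a hit.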