Publication
For more information on PreTIS, please refer to our publication.
Reuter, K., Biehl, A., Koch, L., Helms, V. (2016) PreTIS: a tool to predict non-canonical 5' UTR translational initiation sites in human and mouse. PLOS Computational Biology, vol. 12, no. e1005170, doi: 10.1371/journal.pcbi.1005170.
Input parameters
We provide an example input to demonstrate the functionality of the web service. All parameters are set to reasonable values, so the job can be submitted directly. Just visit the New job page, click the "Load Example" button and then "Submit" your job.
The 5' UTR of the provided human mRNA sequence is searched for putative start codons. Next, the translation initiation confidence of all detected start sites is predicted by the embedded statistical model.
If a valid Ensembl gene ID is given, the 5' UTR and CDS sequences are retrieved automatically.
Sequences can also be pasted or uploaded. Note that the mRNA sequence files must only contain the mRNA sequence itself and no additional >header information.
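If a sequence is only available as a FASTA record, the header line has to be removed before pasting or uploading. A minimal sketch of that clean-up step is shown here; the function name `strip_fasta_header` is hypothetical and not part of PreTIS.

```python
def strip_fasta_header(text: str) -> str:
    """Return the bare sequence from FASTA-like text.

    Hypothetical helper (not part of PreTIS): drops every line that
    starts with '>' and joins the remaining lines into one string.
    """
    lines = [line.strip() for line in text.splitlines()]
    # Keep only non-empty lines that are not header lines.
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


if __name__ == "__main__":
    record = ">example_5utr\nggcugaau\ngc\n"
    print(strip_fasta_header(record))
```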
So that the 5' UTR can be scanned for putative start codons, the user must provide the 5' UTR and CDS sequences separately.
Homologous mouse mRNA sequences are needed to calculate some of the required sequence features. If "automatical BLASTing" is checked (default), the best matching mouse ortholog is searched via BLAST. Found orthologous sequences can be inspected on the Results page after submitting a job. If "automatical BLASTing" is not checked, the murine 5' UTR and CDS sequences must be pasted or uploaded by the user.
The user can restrict the set of putative start codons that PreTIS searches for. By default, all nine alternative start codons (those differing from AUG by one substitution) and the AUG codon itself are considered.
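To make the default codon set concrete, the following sketch enumerates the nine codons that differ from AUG by exactly one substitution and scans a 5' UTR for them. It is an illustration only, not the PreTIS implementation; the function names and the 0-based position convention are assumptions.

```python
def near_cognate_codons(start: str = "AUG", alphabet: str = "ACGU") -> list[str]:
    """List all codons that differ from `start` by exactly one substitution."""
    codons = []
    for i, ref in enumerate(start):
        for base in alphabet:
            if base != ref:
                codons.append(start[:i] + base + start[i + 1:])
    return codons


def find_start_sites(utr5: str, codons: list[str]) -> list[tuple[int, str]]:
    """Return (0-based position, codon) for every match in the 5' UTR.

    Illustrative scan over all three reading frames; T is treated as U
    so that DNA-style input also works.
    """
    seq = utr5.upper().replace("T", "U")
    wanted = set(codons)
    return [(i, seq[i:i + 3]) for i in range(len(seq) - 2) if seq[i:i + 3] in wanted]


if __name__ == "__main__":
    default_set = near_cognate_codons() + ["AUG"]
    print(default_set)
    print(find_start_sites("GGCUGAAUGC", default_set))
```

Restricting the search corresponds to passing a shorter codon list, for example `find_start_sites(utr, ["CUG", "GUG"])`.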
Feature set
Features are calculated from mRNA sequence information. The sequence window used here ranges from -99 to +99 around a putative start site. In-frame refers to the main reading frame. Feature values for all detected start sites can be downloaded as a .csv file on the Results page after submitting a job.
No. | Feature name | Description | Feature abbreviation (.csv table) |
1. | 5' UTR length | Length of the 5' UTR. | len_5utr |
2. | 5' UTR conservation | Sequence conservation of the 5' UTR of human and mouse orthologous sequences. | perc_conserv_5utr_aa_TO_nts |
3. | PWM positive | PWMscore of a putative start site based on the PWMpositive, which was calculated using true start sites. | PWM_pos |
4. | K-mer: upstream AUG | Found by the k-mer search. Number of upstream AUGs. | upstream_ATG |
5. | 5' UTR: percentage A | Percentage of Adenine in the 5' UTR. | utr5_A_perc |
6. | Kozak sequence context | Kozak sequence context is discretized into strong (A or G at -3 and G at +4), intermediate (A or G at -3 and no G at +4), weak (no A and no G at -3 and G at +4) and no Kozak context. These categories are presented as the values 1 (no), 2 (weak), 3 (intermediate) and 4 (strong). | all_kozak |
7. | Translational efficiency of flanking sequence | Raw translation efficiency values reported by Noderer et al. (2014). | efficiency_flanking_seq_context |
8. | K-mer: position -12 is C | Found by the k-mer search. Binary feature reporting if Cytosine is present at position -12 (relative to the canonical AUG start). | position_minus12_C |
9. | K-mer: upstream Asparagine | Found by the k-mer search. Number of upstream Asparagines. | upstream_AA_N |
10. | K-mer: downstream AUG | Found by the k-mer search. Number of downstream AUGs. | downstream_ATG |
11. | K-mer: upstream Adenine | Found by the k-mer search. Number of upstream Adenines. | upstream_A |
12. | K-mer: in-frame upstream Alanine | Found by the k-mer search. Number of in-frame and upstream Alanines. | inframe_upstream_AA_A |
13. | K-mer: upstream Alanine | Found by the k-mer search. Number of upstream Alanines. | upstream_AA_A |
14. | 5' UTR: percentage G | Percentage of Guanine in the 5' UTR. | utr5_G_perc |
15. | Codon conservation | Start site conservation between human and mouse. | codon_conserv_aa_TO_nts |
16. | K-mer: position -3 is A | Found by the k-mer search. Binary feature reporting if Adenine is present at position -3 (relative to the canonical AUG start). | position_minus3_A |
17. | K-mer: upstream CCG | Found by the k-mer search. Number of upstream CCGs. | upstream_CCG |
18. | K-mer: downstream CCA | Found by the k-mer search. Number of downstream CCAs. | downstream_CCA |
19. | K-mer: position -12 is A | Found by the k-mer search. Binary feature reporting if Adenine is present at position -12 (relative to the canonical AUG start). | position_minus12_A |
20. | K-mer: in-frame upstream Methionine | Found by the k-mer search. Number of in-frame and upstream Methionines. | inframe_upstream_AA_M |
21. | K-mer: upstream Arginine | Found by the k-mer search. Number of upstream Arginines. | upstream_AA_R |
22. | K-mer: upstream Histidine | Found by the k-mer search. Number of upstream Histidines. | upstream_AA_H |
23. | K-mer: GCC | Found by the k-mer search. Number of GCC in the predefined sequence window. | k_gram_GCC |
24. | K-mer: position 4 is G | Found by the k-mer search. Binary feature reporting if Guanine is present at position 4 (relative to the canonical AUG start). | position4_G |
25. | K-mer: upstream Threonine | Found by the k-mer search. Number of upstream Threonines. | upstream_AA_T |
26. | K-mer: upstream CGG | Found by the k-mer search. Number of upstream CGGs. | upstream_CGG |
27. | K-mer: upstream Cytosine | Found by the k-mer search. Number of upstream Cytosines. | upstream_C |
28. | K-mer: position -2 is G | Found by the k-mer search. Binary feature reporting if Guanine is present at position -2 (relative to the canonical AUG start). | position_minus2_G |
29. | K-mer: upstream Stop | Found by the k-mer search. Number of upstream Stop codons. | upstream_AA_X |
30. | K-mer: UAG | Found by the k-mer search. Number of UAG in the predefined sequence window. | k_gram_TAG |
31. | K-mer: upstream CAU | Found by the k-mer search. Number of upstream CAUs. | upstream_CAT |
32. | K-mer: upstream Serine | Found by the k-mer search. Number of upstream Serines. | upstream_AA_S |
33. | K-mer: downstream Glutamine | Found by the k-mer search. Number of downstream Glutamines. | downstream_AA_Q |
34. | K-mer: AGG | Found by the k-mer search. Number of AGGs in the predefined sequence window. | k_gram_AGG |
35. | K-mer: AGC | Found by the k-mer search. Number of AGCs in the predefined sequence window. | k_gram_AGC |
36. | K-mer: downstream ACC | Found by the k-mer search. Number of downstream ACCs. | downstream_ACC |
37. | K-mer: UAA | Found by the k-mer search. Number of UAAs in the predefined sequence window. | k_gram_TAA |
38. | K-mer: downstream Proline | Found by the k-mer search. Number of downstream Prolines. | downstream_AA_P |
39. | K-mer: upstream CAA | Found by the k-mer search. Number of upstream CAAs. | upstream_CAA |
40. | K-mer: in-frame upstream Histidine | Found by the k-mer search. Number of in-frame and upstream Histidines. | inframe_upstream_AA_H |
41. | K-mer: upstream GAU | Found by the k-mer search. Number of upstream GAUs. | upstream_GAT |
42. | K-mer: in-frame upstream GCC | Found by the k-mer search. Number of in-frame and upstream GCCs. | inframe_upstream_GCC |
43. | K-mer: in-frame upstream GCG | Found by the k-mer search. Number of in-frame and upstream GCGs. | inframe_upstream_GCG |
44. | PWM negative | PWMscore of a putative start site based on the PWMnegative, which was calculated using false start sites. | PWM_neg |
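The Kozak discretization described for feature 6 (all_kozak) can be sketched as follows. This is an illustration under stated assumptions, not the PreTIS code: the function name is hypothetical, the first nucleotide of the start codon is taken as position +1 (so -3 lies three nucleotides upstream and +4 directly follows the codon), and `start` is a 0-based index into the mRNA.

```python
def kozak_category(mrna: str, start: int) -> int:
    """Classify the Kozak context of the codon beginning at index `start`.

    Returns 4 (strong), 3 (intermediate), 2 (weak) or 1 (no context),
    following the category definitions in the feature table.
    Hypothetical helper; `start` is the 0-based index of codon position +1.
    """
    seq = mrna.upper().replace("T", "U")
    if start < 3 or start + 3 >= len(seq):
        raise ValueError("positions -3 and +4 must lie inside the sequence")
    minus3 = seq[start - 3]   # position -3
    plus4 = seq[start + 3]    # position +4 (first nt after the codon)
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return 4  # strong
    if purine_at_minus3:
        return 3  # intermediate
    if g_at_plus4:
        return 2  # weak
    return 1      # no Kozak context


if __name__ == "__main__":
    print(kozak_category("ACCAUGG", 3))
```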
Noderer et al. (2014) Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol., 10, 748.