Publication

For more information on PreTIS, please refer to our publication.

Reuter, K., Biehl, A., Koch, L., Helms, V. (2016) PreTIS: a tool to predict non-canonical 5' UTR translational initiation sites in human and mouse. PLOS Computational Biology, 12, e1005170, doi: 10.1371/journal.pcbi.1005170.

Input parameters

We provide an example input that demonstrates the functionality of the web service. All parameters are set to reasonable values, so the job can be submitted directly: visit the New job page, click the "Load Example" button, and then click "Submit".

The 5' UTR of the provided human mRNA sequence is searched for putative start codons. Next, the translation initiation confidence of all detected start sites is predicted by the embedded statistical model.
If a valid Ensembl gene ID is given, the 5' UTR and CDS sequences are retrieved automatically. Sequences can also be pasted or uploaded. Note that mRNA sequence files must contain only the mRNA sequence itself, without any additional header line (a line starting with ">"). So that the 5' UTR can be scanned for putative start codons, the user must provide the 5' UTR and the CDS separately.
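Because uploaded sequence files must not contain a header line, a FASTA file may need to be cleaned before upload. The following is a minimal sketch of such a cleaning step, not part of PreTIS itself. The function name `strip_fasta_header` is hypothetical, and joining the remaining lines into a single string is an assumption about what the upload form expects.

```python
def strip_fasta_header(text: str) -> str:
    """Return the bare sequence from FASTA-like text.

    Lines starting with '>' (header lines) and empty lines are dropped;
    the remaining lines are joined into one string (an assumption, the
    expected layout of the upload file is not specified in the source).
    """
    lines = [line.strip() for line in text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


# Hypothetical example record, for illustration only.
example = ">hypothetical_transcript 5' UTR\nGGCTGACC\nCTGAAG\n"
print(strip_fasta_header(example))
```

The cleaned string can then be pasted into the form or written to a new file for upload.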

Homologous mouse mRNA sequences are needed to calculate some of the required sequence features. If "automatical BLASTing" is checked (the default), the best-matching mouse ortholog is searched for via BLAST. The orthologous sequences found can be inspected on the Results page after submitting a job. If "automatical BLASTing" is not checked, the user must paste or upload the murine 5' UTR and CDS sequences.

The user can restrict the set of putative start codons that PreTIS searches for. By default, PreTIS searches for the AUG codon itself and for all nine alternative start codons (those that differ from AUG by exactly one substitution).
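To make the default codon set concrete, the sketch below enumerates the nine codons that differ from AUG by one substitution and lists where they occur in a 5' UTR. This is an illustration only, not the PreTIS implementation: the function names are hypothetical, and scanning every position in all three frames is an assumption.

```python
BASES = "ACGU"


def near_cognate_codons(reference: str = "AUG") -> list[str]:
    """All codons that differ from `reference` by exactly one substitution."""
    codons = []
    for pos, ref_base in enumerate(reference):
        for base in BASES:
            if base != ref_base:
                codons.append(reference[:pos] + base + reference[pos + 1:])
    return codons


def find_start_sites(utr5: str, codons: set[str]) -> list[tuple[int, str]]:
    """Return (0-based index, codon) for every occurrence of a codon in the 5' UTR.

    DNA input (T) is converted to RNA (U); all three frames are scanned,
    which is an assumption made for this sketch.
    """
    rna = utr5.upper().replace("T", "U")
    return [
        (i, rna[i:i + 3])
        for i in range(len(rna) - 2)
        if rna[i:i + 3] in codons
    ]


alternatives = near_cognate_codons()
print(alternatives)
# ['CUG', 'GUG', 'UUG', 'AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU']

# Restricting the search, e.g. to AUG and CUG only:
print(find_start_sites("GGCTGACCATGA", {"AUG", "CUG"}))
# [(2, 'CUG'), (8, 'AUG')]
```

Passing a smaller codon set to `find_start_sites` mirrors restricting the putative start sites in the input form.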

Feature set

Features are calculated from mRNA sequence information. The sequence window used here ranges from -99 to +99 around a putative start site. In-frame refers to the main reading frame. Feature values for all detected start sites can be downloaded as a .csv file on the Results page after submitting a job.
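The two notions used above, the sequence window and "in-frame", can be sketched as follows. This is an illustration under stated assumptions rather than the PreTIS code: the first nucleotide of the putative start codon is taken as position +1 and the nucleotide before it as -1 (no position 0), the CDS is assumed to begin directly after the 5' UTR, and windows that run past the sequence ends are simply truncated. The function names are hypothetical.

```python
def is_in_frame(site_index: int, utr5_length: int) -> bool:
    """True if a codon starting at 0-based `site_index` in the 5' UTR lies in
    the main reading frame, i.e. its distance to the CDS start is a multiple of 3."""
    return (utr5_length - site_index) % 3 == 0


def sequence_window(utr5: str, cds: str, site_index: int, flank: int = 99) -> tuple[str, str]:
    """Return (upstream, downstream) around a putative start site.

    upstream   covers positions -flank .. -1
    downstream covers positions +1 .. +flank (begins with the start codon)
    Truncation at the sequence ends is an assumption of this sketch.
    """
    mrna = utr5 + cds
    upstream = mrna[max(0, site_index - flank):site_index]
    downstream = mrna[site_index:site_index + flank]
    return upstream, downstream


# Hypothetical toy sequences, for illustration only.
utr5 = "GGCUGACCGCC"   # 11 nt, putative CUG start at index 2
cds = "AUGGCC"

print(is_in_frame(2, len(utr5)))                 # True: 9 nt before the CDS start
print(sequence_window(utr5, cds, 2, flank=3))    # ('GG', 'CUG')
```

With the default `flank=99`, both halves together span the -99 to +99 window whenever enough sequence is available on each side.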

| No. | Feature name | Description | Feature abbreviation (.csv table) |
| --- | --- | --- | --- |
| 1 | 5' UTR length | Length of the 5' UTR. | len_5utr |
| 2 | 5' UTR conservation | Sequence conservation of the 5' UTR between human and mouse orthologous sequences. | perc_conserv_5utr_aa_TO_nts |
| 3 | PWM positive | PWM score of a putative start site based on the positive PWM (PWMpositive), which was calculated using true start sites. | PWM_pos |
| 4 | K-mer: upstream AUG | Found by the k-mer search. Number of upstream AUGs. | upstream_ATG |
| 5 | 5' UTR: percentage A | Percentage of Adenine in the 5' UTR. | utr5_A_perc |
| 6 | Kozak sequence context | The Kozak sequence context is discretized into strong (A or G at -3 and G at +4), intermediate (A or G at -3 and no G at +4), weak (neither A nor G at -3, and G at +4) and no Kozak context. These categories are encoded as the values 1 (no), 2 (weak), 3 (intermediate) and 4 (strong). | all_kozak |
| 7 | Translational efficiency of flanking sequence | Raw translation efficiency values reported by Noderer et al. (2014). | efficiency_flanking_seq_context |
| 8 | K-mer: position -12 is C | Found by the k-mer search. Binary feature reporting whether Cytosine is present at position -12 (relative to the canonical AUG start). | position_minus12_C |
| 9 | K-mer: upstream Asparagine | Found by the k-mer search. Number of upstream Asparagines. | upstream_AA_N |
| 10 | K-mer: downstream AUG | Found by the k-mer search. Number of downstream AUGs. | downstream_ATG |
| 11 | K-mer: upstream Adenine | Found by the k-mer search. Number of upstream Adenines. | upstream_A |
| 12 | K-mer: in-frame upstream Alanine | Found by the k-mer search. Number of in-frame and upstream Alanines. | inframe_upstream_AA_A |
| 13 | K-mer: upstream Alanine | Found by the k-mer search. Number of upstream Alanines. | upstream_AA_A |
| 14 | 5' UTR: percentage G | Percentage of Guanine in the 5' UTR. | utr5_G_perc |
| 15 | Codon conservation | Start site conservation between human and mouse. | codon_conserv_aa_TO_nts |
| 16 | K-mer: position -3 is A | Found by the k-mer search. Binary feature reporting whether Adenine is present at position -3 (relative to the canonical AUG start). | position_minus3_A |
| 17 | K-mer: upstream CCG | Found by the k-mer search. Number of upstream CCGs. | upstream_CCG |
| 18 | K-mer: downstream CCA | Found by the k-mer search. Number of downstream CCAs. | downstream_CCA |
| 19 | K-mer: position -12 is A | Found by the k-mer search. Binary feature reporting whether Adenine is present at position -12 (relative to the canonical AUG start). | position_minus12_A |
| 20 | K-mer: in-frame upstream Methionine | Found by the k-mer search. Number of in-frame and upstream Methionines. | inframe_upstream_AA_M |
| 21 | K-mer: upstream Arginine | Found by the k-mer search. Number of upstream Arginines. | upstream_AA_R |
| 22 | K-mer: upstream Histidine | Found by the k-mer search. Number of upstream Histidines. | upstream_AA_H |
| 23 | K-mer: GCC | Found by the k-mer search. Number of GCCs in the predefined sequence window. | k_gram_GCC |
| 24 | K-mer: position +4 is G | Found by the k-mer search. Binary feature reporting whether Guanine is present at position +4 (relative to the canonical AUG start). | position4_G |
| 25 | K-mer: upstream Threonine | Found by the k-mer search. Number of upstream Threonines. | upstream_AA_T |
| 26 | K-mer: upstream CGG | Found by the k-mer search. Number of upstream CGGs. | upstream_CGG |
| 27 | K-mer: upstream C | Found by the k-mer search. Number of upstream Cytosines. | upstream_C |
| 28 | K-mer: position -2 is G | Found by the k-mer search. Binary feature reporting whether Guanine is present at position -2 (relative to the canonical AUG start). | position_minus2_G |
| 29 | K-mer: upstream Stop | Found by the k-mer search. Number of upstream Stop codons. | upstream_AA_X |
| 30 | K-mer: UAG | Found by the k-mer search. Number of UAGs in the predefined sequence window. | k_gram_TAG |
| 31 | K-mer: upstream CAU | Found by the k-mer search. Number of upstream CAUs. | upstream_CAT |
| 32 | K-mer: upstream Serine | Found by the k-mer search. Number of upstream Serines. | upstream_AA_S |
| 33 | K-mer: downstream Glutamine | Found by the k-mer search. Number of downstream Glutamines. | downstream_AA_Q |
| 34 | K-mer: AGG | Found by the k-mer search. Number of AGGs in the predefined sequence window. | k_gram_AGG |
| 35 | K-mer: AGC | Found by the k-mer search. Number of AGCs in the predefined sequence window. | k_gram_AGC |
| 36 | K-mer: downstream ACC | Found by the k-mer search. Number of downstream ACCs. | downstream_ACC |
| 37 | K-mer: UAA | Found by the k-mer search. Number of UAAs in the predefined sequence window. | k_gram_TAA |
| 38 | K-mer: downstream Proline | Found by the k-mer search. Number of downstream Prolines. | downstream_AA_P |
| 39 | K-mer: upstream CAA | Found by the k-mer search. Number of upstream CAAs. | upstream_CAA |
| 40 | K-mer: in-frame upstream Histidine | Found by the k-mer search. Number of in-frame and upstream Histidines. | inframe_upstream_AA_H |
| 41 | K-mer: upstream GAU | Found by the k-mer search. Number of upstream GAUs. | upstream_GAT |
| 42 | K-mer: in-frame upstream GCC | Found by the k-mer search. Number of in-frame and upstream GCCs. | inframe_upstream_GCC |
| 43 | K-mer: in-frame upstream GCG | Found by the k-mer search. Number of in-frame and upstream GCGs. | inframe_upstream_GCG |
| 44 | PWM negative | PWM score of a putative start site based on the negative PWM (PWMnegative), which was calculated using false start sites. | PWM_neg |
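The discretization of the Kozak sequence context (feature 6, all_kozak) can be written as a short function. The sketch below follows the four categories and their values 1 to 4 as described in the table. It is not the PreTIS source code: the function name is hypothetical, the first nucleotide of the start codon is assumed to be position +1 (so -3 lies three nucleotides before it and +4 directly after the codon), and a position outside the sequence is treated as not matching.

```python
def kozak_category(mrna: str, site_index: int) -> int:
    """Classify the Kozak context of a start codon beginning at 0-based `site_index`.

    Returns 4 (strong), 3 (intermediate), 2 (weak) or 1 (no Kozak context).
    Positions that fall outside the sequence count as "no match",
    which is an assumption of this sketch.
    """
    rna = mrna.upper().replace("T", "U")
    minus3 = rna[site_index - 3] if site_index >= 3 else ""
    plus4 = rna[site_index + 3] if site_index + 3 < len(rna) else ""

    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"

    if purine_at_minus3 and g_at_plus4:
        return 4  # strong
    if purine_at_minus3:
        return 3  # intermediate
    if g_at_plus4:
        return 2  # weak
    return 1      # no Kozak context


# Toy sequences with the start codon at index 6, for illustration only.
print(kozak_category("GCCACCAUGG", 6))  # 4
print(kozak_category("GCCACCAUGC", 6))  # 3
print(kozak_category("GCCUCCAUGG", 6))  # 2
print(kozak_category("GCCUCCAUGC", 6))  # 1
```

Checking the two positions independently keeps the mapping to the four categories explicit and easy to compare against the table entry.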

1. Noderer, W. L., Flockhart, R. J., Bhaduri, A., Diaz de Arce, A. J., Zhang, J., Khavari, P. A., Wang, C. L. (2014) Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol., 10, 748.