Visnyk of the Lviv University. Series Physics
58 (2021) p. 72-84
DOI: https://doi.org/10.30970/vph.58.2021.72
Parametrization of rank-frequency distributions of nucleotide sequences in virus RNAs
M. Husev, A. Rovenchak
The paper analyzes the parametrization of rank–frequency distributions and frequency spectra of specially defined nucleotide sequences in viral RNAs.
To ensure data homogeneity, we analyze only the genomes of 52 single-stranded RNA viruses.
These sequences are obtained by replacing the most frequent nucleotide in the genome with a separator (a “space”). For example, if adenine (a) is the most frequent nucleotide in the genome, the fragment aacctaccactcaccctagcattacttatatgatatgtctccatacccattacaatc becomes x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc. Note that a zero-length sequence, denoted x, is recorded between two consecutive adenines.
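The splitting rule can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the treatment of the sequence ends (an empty segment before the first or after the last separator is simply dropped) is an assumption. Note also that within this short fragment alone c occurs 19 times and a only 18 times, so the separator has to be determined from the whole genome rather than from a fragment.

```python
from collections import Counter


def most_frequent_nucleotide(genome):
    """Return the most frequent symbol (ties go to the first one encountered)."""
    return Counter(genome).most_common(1)[0][0]


def split_on_separator(genome, sep):
    """Split a genome into sequences delimited by the separator nucleotide.

    An empty segment between two consecutive separators is recorded as "x".
    Empty segments at the very start or end are dropped (assumed boundary rule).
    """
    parts = genome.split(sep)
    last = len(parts) - 1
    tokens = []
    for i, part in enumerate(parts):
        if part:
            tokens.append(part)
        elif 0 < i < last:
            tokens.append("x")  # zero-length sequence between two separators
    return tokens


if __name__ == "__main__":
    fragment = "aacctaccactcaccctagcattacttatatgatatgtctccatacccattacaatc"
    # The separator is passed explicitly: in the genome, adenine is assumed
    # to be the most frequent nucleotide.
    print(" ".join(split_on_separator(fragment, "a")))
    # x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc
```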
From the rank–frequency distributions, we calculate the following parameters: entropy, mean sequence length, sequence length variance, and coefficient of variation.
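A minimal sketch of these four parameters follows, applied to a list of sequences such as the example above. The exact definitions are assumptions, not taken from the paper: entropy is the Shannon entropy of the relative frequencies in bits (the log base is a choice), the mean and variance of length are weighted by token frequency, the placeholder x counts as length 0, and the coefficient of variation is the standard deviation divided by the mean. The helper names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Rank-frequency distribution: (sequence, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def seq_length(token):
    """Length of a sequence; the placeholder "x" denotes an empty sequence."""
    return 0 if token == "x" else len(token)


def summary_parameters(tokens):
    """Return (entropy in bits, mean length, length variance, coefficient of variation)."""
    ranked = rank_frequency(tokens)
    total = sum(freq for _, freq in ranked)
    probs = [(seq, freq / total) for seq, freq in ranked]  # relative frequencies
    entropy = -sum(p * math.log2(p) for _, p in probs)
    mean = sum(p * seq_length(seq) for seq, p in probs)
    variance = sum(p * (seq_length(seq) - mean) ** 2 for seq, p in probs)
    cv = math.sqrt(variance) / mean if mean > 0 else float("nan")
    return entropy, mean, variance, cv


if __name__ == "__main__":
    tokens = "x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc".split()
    for rank, (seq, freq) in enumerate(rank_frequency(tokens), start=1):
        print(rank, seq, freq)
    print(summary_parameters(tokens))
```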
The frequency spectrum $N_j$ is defined as the number of distinct sequences in a given genome whose absolute frequency is exactly $j$. The spectra are fitted by a modified Bose distribution in which the ordinary exponential is replaced with the Kaniadakis ϰ-exponential:
$$\exp_\varkappa(x) = \left(\sqrt{1 + \varkappa^2 x^2} + \varkappa x\right)^{1/\varkappa}.$$
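This approach reproduces the observed “tails” of the distributions more accurately, since for large $x$ the ϰ-exponential grows only as a power law, $\exp_\varkappa(x) \approx (2\varkappa x)^{1/\varkappa}$, rather than exponentially.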
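The frequency spectrum and the ϰ-exponential can be sketched as below. `exp_kappa` uses the mathematically equivalent form $\exp(\operatorname{arsinh}(\varkappa x)/\varkappa)$, which is numerically stable and tends to the ordinary exponential as ϰ → 0. The function `bose_kappa` is a HYPOTHETICAL two-parameter form $N_j = 1/(\exp_\varkappa(j/T) - 1)$, chosen only to show the power-law tail; the paper's exact modified Bose distribution may contain further parameters. All function names are illustrative.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """N_j: number of distinct sequences whose absolute frequency is exactly j."""
    freq = Counter(tokens)               # sequence -> absolute frequency
    spectrum = Counter(freq.values())    # j -> N_j
    return dict(sorted(spectrum.items()))


def exp_kappa(x, kappa):
    """Kaniadakis kappa-exponential (sqrt(1 + k^2 x^2) + k x)^(1/k).

    Evaluated as exp(asinh(k x) / k), an equivalent and numerically stable form.
    """
    if kappa == 0:
        return math.exp(x)
    return math.exp(math.asinh(kappa * x) / kappa)


def bose_kappa(j, kappa, T):
    """HYPOTHETICAL kappa-Bose form N_j = 1 / (exp_kappa(j / T) - 1), for j, T > 0.

    Illustration only; not necessarily the parametrization used in the paper.
    """
    return 1.0 / (exp_kappa(j / T, kappa) - 1.0)


if __name__ == "__main__":
    tokens = "x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc".split()
    print(frequency_spectrum(tokens))  # {1: 11, 2: 2, 3: 1}
    # For large j the hypothetical form decays roughly as (2*kappa*j/T)**(-1/kappa):
    print(bose_kappa(2e6, 0.5, 10) / bose_kappa(1e6, 0.5, 10))  # close to 2**-2
```

The power-law decay of the tail is what distinguishes this form from the ordinary Bose function, whose tail falls off exponentially in $j$.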
The resulting values of the fitting parameters ϰ and T show a moderate grouping of related species.
We also observe a strong positive correlation between $T$ and the genome length $N$ (measured as the number of nucleotide sequences defined above). Fitting the function $T = aN^b$ over all 52 genomes yields $b = 0.92 \pm 0.05$, so we propose another classification parameter, $t = TN^{-b}$.
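The power-law fit and the derived parameter $t$ can be sketched as follows, assuming an ordinary least-squares fit in log–log coordinates, $\ln T = \ln a + b \ln N$. The paper's actual fitting procedure (for example, nonlinear least squares on $T$ itself, which weights the points differently) may differ, and the function names are hypothetical. The usage example runs on synthetic values, not on the paper's data.

```python
import math


def fit_power_law(N, T):
    """Fit T = a * N**b by ordinary least squares on ln T = ln a + b ln N."""
    xs = [math.log(n) for n in N]
    ys = [math.log(t) for t in T]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope in log-log coordinates
    a = math.exp(my - b * mx)        # back-transformed intercept
    return a, b


def classification_t(T, N, b):
    """Length-corrected classification parameter t = T * N**(-b)."""
    return T * N ** (-b)


if __name__ == "__main__":
    # Synthetic example: T generated exactly as 2 * N**0.9.
    N = [100, 1000, 10000]
    T = [2 * n ** 0.9 for n in N]
    a, b = fit_power_law(N, T)
    print(a, b)
    print([classification_t(t, n, b) for t, n in zip(T, N)])
```

Once $b$ is fixed from the whole set of genomes, $t$ removes the leading dependence of $T$ on genome length, so genomes of very different sizes can be compared.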
We expect the proposed parameters to be useful in future studies, for example, when correlated with other characteristics of viruses.