Visnyk of the Lviv University. Series Physics 58 (2021) p. 72-84

Parametrization of rank-frequency distributions of nucleotide sequences in virus RNAs

M. Husev, A. Rovenchak

The paper analyzes the parametrization of rank-frequency distributions and frequency spectra of specially defined nucleotide sequences in viral RNA. To keep the data homogeneous, we analyze only genomes of single-stranded RNA viruses, 52 in total. The sequences are obtained by replacing the most frequent nucleotide in the genome with a separator (“space”). For example, the sequence aacctaccactcaccctagcattacttatatgatatgtctccatacccattacaatc, with adenine (a) being the most frequent nucleotide, becomes x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc. Note that a zero-length sequence, denoted x, is inserted between two consecutive adenines. The following parameters are calculated from the rank-frequency distributions: entropy, mean sequence length, sequence length variance, and variance coefficient.
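The splitting procedure and some of the listed parameters can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the entropy is assumed to be the Shannon entropy of relative frequencies (natural logarithm), and the mean and variance of the sequence length are assumed to be taken over all occurrences of the sequences. The variance coefficient is left out because its exact definition is not stated here. The separator is passed explicitly, on the assumption that the most frequent nucleotide is determined over the whole genome rather than over a short excerpt.

```python
from collections import Counter
from math import log

def split_sequences(genome, separator):
    """Split a genome on the separator nucleotide.

    An empty piece between two consecutive separators becomes the
    zero-length sequence 'x'; empty pieces at the very start or end
    of the genome are dropped (assumption based on the example).
    """
    parts = genome.split(separator)
    if parts and parts[0] == "":
        parts = parts[1:]
    if parts and parts[-1] == "":
        parts = parts[:-1]
    return [p if p else "x" for p in parts]

def rank_frequency(tokens):
    """Return (sequence, absolute frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()

def parameters(tokens):
    """Entropy, mean length and length variance (assumed definitions)."""
    total = len(tokens)
    freqs = [f for _, f in rank_frequency(tokens)]
    entropy = -sum((f / total) * log(f / total) for f in freqs)
    # The placeholder 'x' stands for a sequence of length zero.
    lengths = [0 if s == "x" else len(s) for s in tokens]
    mean = sum(lengths) / total
    variance = sum((n - mean) ** 2 for n in lengths) / total
    return {"entropy": entropy, "mean_length": mean, "length_variance": variance}

excerpt = "aacctaccactcaccctagcattacttatatgatatgtctccatacccattacaatc"
tokens = split_sequences(excerpt, "a")
print(" ".join(tokens))
# x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc
print(rank_frequency(tokens)[:3])
print(parameters(tokens))
```

For a full genome, the separator would first be chosen as the most frequent nucleotide, for instance with `Counter(genome).most_common(1)[0][0]`.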
The frequency spectrum $N_j$ is defined as the number of different sequences within a particular genome having absolute frequency exactly equal to $j$. The spectra are fitted by a modified Bose distribution, in which the ordinary exponential is replaced with the Kaniadakis kappa-exponential $\exp_\varkappa(x) = \left(\sqrt{1 + \varkappa^2 x^2} + \varkappa x\right)^{1/\varkappa}$. This approach reproduces the observed “tails” of the distributions more accurately. The calculated values of the fitting parameters, $\varkappa$ and $T$, demonstrate moderate grouping of related species. We also observed a strong positive correlation between $T$ and the genome length $N$ (measured as the number of defined nucleotide sequences). Fitting the function $T = aN^b$ over all 52 genomes yielded $b = 0.92 \pm 0.05$, so another classification parameter, $t = TN^{-b}$, is suggested.
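The quantities named above can be illustrated with a short Python sketch. It computes the frequency spectrum of a list of sequences, evaluates the Kaniadakis kappa-exponential, and rescales T by the genome length. The function names are hypothetical, and the explicit form of the modified Bose distribution is deliberately not coded, since it is not given here. The input is the illustrative sequence list from the example in the text.

```python
from collections import Counter
from math import exp, sqrt

def frequency_spectrum(tokens):
    """Map j -> N_j, the number of different sequences occurring exactly j times."""
    freq_of_seq = Counter(tokens)
    return dict(sorted(Counter(freq_of_seq.values()).items()))

def exp_kappa(x, kappa):
    """Kaniadakis kappa-exponential; reduces to exp(x) as kappa -> 0."""
    if kappa == 0:
        return exp(x)
    return (sqrt(1.0 + kappa**2 * x**2) + kappa * x) ** (1.0 / kappa)

def rescaled_temperature(T, N, b=0.92):
    """Classification parameter t = T * N**(-b); b = 0.92 is the fitted value quoted above."""
    return T * N ** (-b)

tokens = "x cct cc ctc ccct gc tt ctt t tg t tgtctcc t ccc tt c x tc".split()
spectrum = frequency_spectrum(tokens)
print(spectrum)
# {1: 11, 2: 2, 3: 1}
print(exp_kappa(1.0, 0.3), exp(1.0))
```

For large positive x the kappa-exponential grows as a power law rather than exponentially, which is consistent with the remark that it captures the "tails" of the distributions better.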
We expect that the proposed parameters may be useful for future studies, for example, by correlating their values with other characteristics of viruses.

