On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences

All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein...

Bibliographic Details
Main authors: Turjanski, P., Ferreiro, D.U.
Format: JOUR
Subjects:
Online access: http://hdl.handle.net/20.500.12110/paper_15206106_v122_n49_p11295_Turjanski
Contributed by:
id todo:paper_15206106_v122_n49_p11295_Turjanski
record_format dspace
spelling todo:paper_15206106_v122_n49_p11295_Turjanski 2023-10-03T16:20:32Z On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences Turjanski, P. Ferreiro, D.U. Amino acids Bioinformatics Structure (composition) Adjustable parameters Amino acid patterns Natural architecture Natural structures Precise definition Protein sequences Sequence patterns Standard deviation Proteins All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for "repetition", an efficient algorithmic implementation and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well-separated into disjoint classes according to their recurrence in nested structures. The statistics of the occurrences of patterns indicate that short repetitions are sufficient to account for the differences between natural families and randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to account for a robust "familiarity" definition between arbitrary sets of sequences. © 2018 American Chemical Society. JOUR info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_15206106_v122_n49_p11295_Turjanski
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Amino acids
Bioinformatics
Structure (composition)
Adjustable parameters
Amino acid patterns
Natural architecture
Natural structures
Precise definition
Protein sequences
Sequence patterns
Standard deviation
Proteins
description All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for "repetition", an efficient algorithmic implementation and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well-separated into disjoint classes according to their recurrence in nested structures. The statistics of the occurrences of patterns indicate that short repetitions are sufficient to account for the differences between natural families and randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to account for a robust "familiarity" definition between arbitrary sets of sequences. © 2018 American Chemical Society.
format JOUR
author Turjanski, P.
Ferreiro, D.U.
title On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences
url http://hdl.handle.net/20.500.12110/paper_15206106_v122_n49_p11295_Turjanski
work_keys_str_mv AT turjanskip onthenaturalstructureofaminoacidpatternsinfamiliesofproteinsequences
AT ferreirodu onthenaturalstructureofaminoacidpatternsinfamiliesofproteinsequences
_version_ 1782026317394345984
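
The description compares pattern occurrences in natural families against randomized groups of sequences and reports the gap in standard deviations. This record does not give the paper's definition of "repetition" or its scoring system, so the sketch below substitutes a much simpler technique: it counts contiguous substrings of length k that occur at least twice in a group, then compares that count with the same count on residue-shuffled copies. The names `repeated_kmers`, `shuffle_each`, `z_score`, and the sequences in `toy_family` are hypothetical and chosen only for illustration; this is not the authors' algorithm.

```python
import math
import random
import statistics
from collections import Counter


def repeated_kmers(seqs, k):
    """Number of distinct length-k substrings seen at least twice in the group."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    return sum(1 for c in counts.values() if c >= 2)


def shuffle_each(seqs, rng):
    """Shuffle residues within each sequence (length and composition are kept)."""
    shuffled = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        shuffled.append("".join(chars))
    return shuffled


def z_score(seqs, k, n_shuffles=200, seed=0):
    """Distance of the observed count from the shuffled mean, in standard deviations."""
    rng = random.Random(seed)
    observed = repeated_kmers(seqs, k)
    null = [repeated_kmers(shuffle_each(seqs, rng), k) for _ in range(n_shuffles)]
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    if sigma == 0:
        # Degenerate null distribution: report a signed infinity, or 0 if equal.
        diff = observed - mu
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (observed - mu) / sigma


# Hypothetical toy "family": six made-up sequences sharing one 8-residue motif.
toy_family = [
    "MKTAWCHPNDGYLLSEQRVI",
    "AVLKEWCHPNDGYTSQRFIM",
    "QRSTWCHPNDGYAKLVEIFM",
    "LIVEKTWCHPNDGYSAQRMF",
    "TSAKWCHPNDGYQELVIRMF",
    "EKRAWCHPNDGYVLTSIQFM",
]

if __name__ == "__main__":
    print("repeated 4-mers:", repeated_kmers(toy_family, 4))
    print("z-score vs shuffled:", z_score(toy_family, 4))
```

Shuffling within each sequence keeps length and amino acid composition fixed, so any excess of repeated substrings reflects residue order rather than composition. The toy family is engineered around a shared motif, so its score says nothing about real protein families.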