Protein Analysis 2 (Proteiinianalyysi 2) http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2005/proteiinianalyysi.



The folding problem: stability and specificity

How does the amino acid sequence encode the 3D structure?

Stability
• Any possible folding of the mainchain places different residues in contact.
• The interactions of the sidechains and mainchain, with one another and with the solvent, and the restrictions placed on sidechain mobility, determine the relative stabilities of different conformations.
• Second law of thermodynamics: a system at constant temperature and pressure finds an equilibrium state that is a compromise between comfort (low enthalpy, H) and freedom (high entropy, S), giving a minimum Gibbs free energy G = H - TS, where T is the absolute temperature.

Specificity
• Proteins have evolved so that one folding pattern of the mainchain is thermodynamically significantly better than the other conformations.
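A minimal sketch of the enthalpy/entropy compromise in G = H - TS, applied to the unfolding reaction. The values of DELTA_H and DELTA_S are hypothetical and chosen only for illustration, and treating them as temperature independent is a simplifying assumption.

```python
def gibbs_free_energy(enthalpy, entropy, temperature):
    """G = H - T*S (kJ/mol, with S in kJ/(mol K) and T in K)."""
    return enthalpy - temperature * entropy

# Hypothetical unfolding parameters (illustrative only, not from the slides).
DELTA_H = 300.0  # kJ/mol, enthalpy change on unfolding
DELTA_S = 0.9    # kJ/(mol K), entropy change on unfolding

def unfolding_free_energy(temperature):
    """Free energy change of unfolding; positive means the fold is stable."""
    return gibbs_free_energy(DELTA_H, DELTA_S, temperature)

def melting_temperature():
    """Temperature at which the unfolding free energy is zero."""
    return DELTA_H / DELTA_S

for t in (280.0, 298.0, 320.0, 340.0):
    print(f"T = {t:.0f} K  dG(unfold) = {unfolding_free_energy(t):6.1f} kJ/mol")
print(f"Tm = {melting_temperature():.1f} K")
```

With these assumed numbers the entropy term grows with temperature until it cancels the enthalpy term, which is where the fold stops being favoured.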

Some specific interactions
• Hydrogen bonding: hydrogen bound to oxygen, nitrogen or sulfur has a smaller van der Waals radius and a larger partial charge than hydrogen bound to carbon, which makes the interaction between a charge or dipole and such "donor groups" stronger and more orientation dependent.
• Aromatic-aromatic interactions: the aromatic sidechains of phenylalanine, tyrosine and tryptophan prefer interactions with interplane angles of around 90 degrees. The interaction has quadrupole-quadrupole character (distance dependence ~1/r^5) and depends on the interplane angle.
• Aromatic-polar/charged-group interactions: the charge separation in aromatic rings, with the ring hydrogens carrying partial positive charges and the pi-electron system being partially negative, leads to interactions with O, S and N groups in proteins that are relatively strong (distance dependence ~1/r^3 to 1/r^4) and orientation dependent.
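To make the quoted distance dependences concrete, the sketch below compares how quickly interactions that scale as 1/r^3, 1/r^4 and 1/r^5 weaken when the separation doubles. Only the exponents come from the text; the function name and the choice of a doubling are illustrative.

```python
def relative_strength(distance_ratio, exponent):
    """Factor by which a ~1/r**exponent interaction changes when the
    distance is multiplied by distance_ratio."""
    return distance_ratio ** (-exponent)

for exponent in (3, 4, 5):
    factor = relative_strength(2.0, exponent)
    print(f"1/r^{exponent}: doubling the distance leaves {factor:.4f} of the strength")
```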

Packing preferences Phe-Tyr

Hydrophobic effect
• Entropic driving force for the self-association of non-polar groups in an aqueous environment
• Water cage around a hydrophobic group
• Entropic gain on releasing the water to the bulk solvent

Stability and denaturation
• Proteins are only marginally stable: 20-60 kJ/mol, equivalent to one or two water-water hydrogen bonds
• Unique native conformation = loss of entropy
• Hydrophobic effect
• Secondary structure (SS) formation compensates for the water-protein hydrogen bonds of buried polar groups
• Sidechain entropy → SS preferences
• Compactness → van der Waals forces
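A rough sketch of what "marginally stable" means in terms of populations, assuming a simple two-state model (only folded and unfolded states) at 298.15 K. The two-state assumption and the temperature are illustrative choices; only the 20-60 kJ/mol range comes from the text.

```python
from math import exp

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol K)

def fraction_folded(unfolding_free_energy, temperature=298.15):
    """Equilibrium fraction of folded molecules in a two-state model.

    unfolding_free_energy is G(unfolded) - G(folded) in kJ/mol, so a
    positive value means the folded state is favoured.
    """
    boltzmann_ratio = exp(-unfolding_free_energy / (GAS_CONSTANT * temperature))
    return 1.0 / (1.0 + boltzmann_ratio)

for dg in (0.0, 5.0, 20.0, 60.0):
    print(f"dG = {dg:4.1f} kJ/mol  fraction folded = {fraction_folded(dg):.6f}")
```

Under this model a stability of a few tens of kJ/mol already gives an almost fully folded population, yet losing a handful of favourable interactions is enough to shift the balance.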

Folding: summary
• All residues must be in stereochemically allowed conformations
• Buried polar atoms must be hydrogen bonded to other buried polar atoms
• Enough hydrophobic surface must be buried, and the interior must be sufficiently densely packed

Folding pathways
• Initial formation of a "molten globule" containing some native secondary structure but without the tertiary structural interactions that lock the structure into its final conformation
• Hierarchical condensation → low-energy pathway for structure assembly

Topology diagrams
There are a number of ways to represent the folding of a protein and the arrangement of secondary structure elements within the tertiary structure. While these simplifications don't show the sidechain and mainchain interactions that hold the structures together, they do reveal the folding pattern. Examination of such diagrams reveals recurring structural patterns in protein folding.

Architecture

Mainly alpha: up-down bundle, horseshoe, orthogonal bundle, solenoid

Efimov (1997) Proteins 28, 241-260

Mainly beta: ribbon, roll, single sheet, barrel

Clam, distorted sandwich, 3-layer sandwich, sandwich

Prism, trefoil, aligned prism

6-propeller, 5-propeller, 7-propeller, 4-propeller

8-propeller, complex, 2-solenoid, 3-solenoid

Mixed alpha-beta: roll, barrel, super roll, horseshoe

2-layer sandwich, 3-layer bba sandwich, 3-layer aba sandwich, 4-layer sandwich

Alpha-beta prism, box, complex (Immunoglobulin/Lipoprotein), 5-stranded propeller

Complex (mixed alpha-beta), irregular

Summary: architecture
• Architecture = the packing together of secondary structure elements; the way the loops connect them (topology) is ignored
• Popular architectures resemble regular geometric closest packings
• Convergent evolution towards similar folds
• Structure is conserved even when the sequences vary a great deal, which gives evolution room to manoeuvre
• Distributed code: in some proteins 30 % of the amino acids have been changed to alanine without changing the fold

Multiple sequence alignment, evolutionary models

Multiple alignments provide more information than pairwise alignments
• Useful to confirm distant relationships
• Provides a context for interpreting patterns of similarity and difference
• "Speciation" over alignment space helps to connect and confirm widely degenerate motifs

Close relationships: muscarinic receptors
Intermediate relationships: prostaglandin receptors
Distant relationships: fungal pheromone receptors

Database searching
• The first and most common operation in protein informatics, and the only way to access the information in large databases
• Primary tool for inference of homologous structure and function
• Improved algorithms to handle large databases quickly
• Provides an estimate of statistical significance
• Generates alignments
• Definitions of similarity can be tuned using different scoring matrices and algorithm-specific parameters

String comparison
• Hamming distance = number of mutations, no gaps
• Levenshtein distance = edit distance = mutations plus gaps
• Substitution matrix
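The two distances can be sketched as follows. This is a plain textbook implementation with unit costs for substitutions and single-residue gaps. It does not use a substitution matrix, and the function names are illustrative.

```python
def hamming_distance(a, b):
    """Number of mismatching positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

def levenshtein_distance(a, b):
    """Minimum number of substitutions, insertions and deletions that
    turn a into b (dynamic programming, one row kept at a time)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # match or substitute
            ))
        previous = current
    return previous[-1]

print(hamming_distance("DKDGDGTITTKE", "DKDGDGTITTSE"))  # 1
print(levenshtein_distance("DKDGDGTITTKE", "DKDGGTITTKE"))  # 1 (one deletion)
```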

Rules of thumb
• > 45 % identity (over the whole domain): nearly identical structure
• > 25 % identity: similar structure
• Twilight zone (R. F. Doolittle): 18-25 % identity, homology uncertain
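A small sketch of how these thresholds might be applied to a pairwise alignment. Computing identity over the columns where neither sequence has a gap is one common convention and is an assumption here. The label for values below 18 % is also an addition, since the rules above do not cover that range.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def rule_of_thumb(identity):
    """Map a percent identity (over a whole domain) to the rule of thumb."""
    if identity > 45:
        return "nearly identical structure"
    if identity > 25:
        return "similar structure"
    if identity >= 18:
        return "twilight zone: homology uncertain"
    # Not covered by the rules of thumb; label added for completeness.
    return "below the twilight zone: identity alone is not informative"

identity = percent_identity("AC-DE", "ACWDF")
print(identity, rule_of_thumb(identity))
```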

Examples
• Myoglobin / leghemoglobin: 15 % identical, homologous
• N- and C-terminal domains of rhodanese: 11 % identical, gene duplication
• Chymotrypsin / subtilisin: 12 % identical, similar active site, convergent evolution

Log-odds scores
• Preferences for amino acid types at a given column
• P = target distribution
• Q = background distribution
• $\sum_i p_i \log (p_i / q_i)$
• Assumes target distribution is at equilibrium
• Assumes column independence
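A minimal sketch of log-odds scoring for a single alignment column. The uniform background distribution, the base-2 logarithm and the absence of pseudocounts are assumptions made for brevity. Residues that never occur in the column get no finite score and are returned as None.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}  # assumed background Q

def column_distribution(column):
    """Observed target distribution P for one alignment column."""
    counts = Counter(column)
    return {aa: counts[aa] / len(column) for aa in AMINO_ACIDS}

def log_odds(p, q):
    """Per-residue scores log2(p_i / q_i); None where p_i is zero."""
    return {aa: (log2(p[aa] / q[aa]) if p[aa] > 0 else None) for aa in p}

def expected_score(p, q):
    """Sum over i of p_i * log2(p_i / q_i), taking 0 * log 0 as 0."""
    return sum(p[aa] * log2(p[aa] / q[aa]) for aa in p if p[aa] > 0)

p = column_distribution("DDDDDDNNNNNN")  # a column with six D and six N
print(log_odds(p, UNIFORM)["D"])
print(expected_score(p, UNIFORM))
```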

• $q(i,j)$, $i,j = 1,\dots,20$: observed frequency of co-occurrence of amino acids $i$ and $j$ at BLOCKS sites
• $s(i,j) = 2 \log_2 \bigl( q(i,j) / e(i,j) \bigr)$: similarity score
• $e(i,j) = 2\,p(i)\,p(j)$ if $i \neq j$, and $e(i,j) = p(i)\,p(i)$ if $i = j$: expected frequency of occurrence
• $p(i) = q(i,i) + \sum_{j \neq i} q(i,j)/2$: frequency of occurrence of amino acid $i$
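The formulas above can be sketched as follows for a toy block. The function names are illustrative. Unlike a real BLOSUM construction, the sketch does no sequence clustering or rounding to integers, and it scores only residue pairs that were actually observed.

```python
from collections import Counter
from itertools import combinations
from math import log2

def pair_frequencies(columns):
    """q(i,j): frequency of each unordered residue pair over all columns."""
    counts = Counter()
    for column in columns:
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def residue_frequencies(q):
    """p(i) = q(i,i) + sum over j != i of q(i,j) / 2."""
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    return dict(p)

def similarity_scores(columns):
    """s(i,j) = 2 * log2(q(i,j) / e(i,j)) for every observed pair."""
    q = pair_frequencies(columns)
    p = residue_frequencies(q)
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * log2(freq / expected)
    return scores

# One column with nine A and one S gives 36 AA pairs and 9 AS pairs.
print(similarity_scores(["AAAAAAAAAS"]))
```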

Example: calmodulin EF-loop

CALM_HUMAN_1    D K D G D G T I T T K E
CALF_NAEGR_1    D K D G D G T I T T S E
CALM_SCHPO_1    D R D Q D G N I T S N E
CALM_HUMAN_2    D A D G N G T I D F P E
CALF_NAEGR_2    D A D G N G T I D F T E
CALM_SCHPO_2    D A D G N G T I D F T E
CALM_HUMAN_3    D K D G N G Y I S A A E
CALF_NAEGR_3    D K D G N G F I S A Q E
CALM_SCHPO_3    D K D G N G Y I T V E E
CALM_HUMAN_4    D I D G D G Q V N Y E E
CALF_NAEGR_4    D I D G D N Q I N Y T E
CALM_SCHPO_4    D T D G D G V I N Y E E

90 % consensus  D x D G D/N G x I x x x E
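The consensus line can be reproduced from the twelve EF-loop sequences with a short script. The rule used here is an assumption about how the consensus was derived: a column is reported as a single residue if that residue reaches the threshold, as a pair such as D/N if the two most common residues together reach it, and as x otherwise.

```python
from collections import Counter

EF_LOOPS = [
    "DKDGDGTITTKE", "DKDGDGTITTSE", "DRDQDGNITSNE",  # EF-loop 1
    "DADGNGTIDFPE", "DADGNGTIDFTE", "DADGNGTIDFTE",  # EF-loop 2
    "DKDGNGYISAAE", "DKDGNGFISAQE", "DKDGNGYITVEE",  # EF-loop 3
    "DIDGDGQVNYEE", "DIDGDNQINYTE", "DTDGDGVINYEE",  # EF-loop 4
]

def consensus(sequences, threshold=0.9):
    """Return one consensus symbol per alignment column."""
    symbols = []
    for column in zip(*sequences):
        ranked = Counter(column).most_common(2)
        top_fraction = ranked[0][1] / len(column)
        pair_fraction = sum(count for _, count in ranked) / len(column)
        if top_fraction >= threshold:
            symbols.append(ranked[0][0])
        elif len(ranked) == 2 and pair_fraction >= threshold:
            symbols.append("/".join(sorted(aa for aa, _ in ranked)))
        else:
            symbols.append("x")
    return symbols

print(" ".join(consensus(EF_LOOPS)))  # D x D G D/N G x I x x x E
```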

Sequence profiles
• PSSM = position-specific scoring matrix
• PSI-Blast, iterative
• HMM, hidden Markov models

Steps in a PSI-BLAST search
1. Constructs a multiple alignment from a Gapped BLAST search and generates a profile from any significant local alignments found
2. The profile is compared to the protein database, and PSI-BLAST estimates the statistical significance of the local alignments found, using "significant" hits to extend the profile for the next round
3. PSI-BLAST iterates step 2 an arbitrary number of times or until convergence
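The iteration described above can be illustrated with a deliberately simplified toy. It is not PSI-BLAST itself: all sequences are assumed to be ungapped and of equal length, a fixed score threshold stands in for statistical significance, and the uniform background and add-one pseudocounts are arbitrary choices. All names are hypothetical.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
PSEUDOCOUNT = 1.0                    # assumption: add-one smoothing

def build_pssm(sequences):
    """One {residue: log-odds score} dict per alignment column."""
    pssm = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column) + PSEUDOCOUNT * len(AMINO_ACIDS)
        pssm.append({
            aa: log2((counts[aa] + PSEUDOCOUNT) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, sequence):
    """Sum of position-specific scores for an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))

def iterative_search(query, database, threshold, max_rounds=10):
    """Rebuild the profile from the current hits until they stop changing."""
    members = {query}
    for _ in range(max_rounds):
        pssm = build_pssm(sorted(members))
        hits = {seq for seq in database if score(pssm, seq) >= threshold}
        hits.add(query)
        if hits == members:  # converged: no change in the hit set
            break
        members = hits
    return members

database = ["DKDGDGTITTKE", "DADGNGTIDFTE", "AAAAAAAAAAAA"]
print(sorted(iterative_search("DKDGDGTITTSE", database, threshold=5.0)))
```

The real program differs in every component (gapped local alignment, sequence weighting, E-values), so the sketch shows only the control flow of profile, search, extend and repeat.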

Hidden Markov model (HMM)
• Probabilistic model for time series or linear sequences
• Used in speech recognition
• Modelling of protein families
• Describes a probability distribution over sequence space; the probabilities sum to 1
• Generative model
• A sequence can be aligned and "scored" against the model

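Hidden Markov model for a two-state variable
• $t$ = transition probability, $p$ = emission probability
• HMM diagram: states 1 and 2 and an end state; transitions $t(1,2)$, $t(2,\mathrm{end})$; emissions $p_1(a)$, $p_1(b)$, $p_2(a)$, $p_2(b)$
• State path, $\pi$: 1 → 1 → 2 → end
• Observed symbol sequence, $x$: a b a
• $P(x, \pi \mid \mathrm{HMM}) = t(1,1)\, t(1,2)\, t(2,\mathrm{end})\, p_1(a)\, p_1(b)\, p_2(a)$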
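The joint probability of the symbol sequence and the state path can be computed as a product of emission and transition probabilities. In the sketch below all numerical values are hypothetical. The self-transition of state 2 is an assumption, added so that the outgoing probabilities of each state sum to 1, and the model is assumed to start in the first state of the path.

```python
# Hypothetical parameters of a two-state HMM (not taken from the slides).
TRANSITIONS = {
    ("1", "1"): 0.9, ("1", "2"): 0.1,
    ("2", "2"): 0.5, ("2", "end"): 0.5,  # t(2,2) is an assumed extra
}
EMISSIONS = {
    "1": {"a": 0.25, "b": 0.75},
    "2": {"a": 0.5, "b": 0.5},
}

def joint_probability(symbols, path):
    """P(x, pi | HMM) for a path with one state per symbol plus 'end'."""
    if len(path) != len(symbols) + 1 or path[-1] != "end":
        raise ValueError("path must have one state per symbol, then 'end'")
    probability = 1.0
    for k, symbol in enumerate(symbols):
        state, next_state = path[k], path[k + 1]
        probability *= EMISSIONS[state][symbol]
        probability *= TRANSITIONS[(state, next_state)]
    return probability

# x = a b a along the path 1 -> 1 -> 2 -> end:
# p1(a) t(1,1) p1(b) t(1,2) p2(a) t(2,end)
print(joint_probability("aba", ["1", "1", "2", "end"]))
```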

Profile HMM
• Diagram: begin state, positions 1, 2, 3, 4 with match, insert and delete states, end state
• A match state emits one of the 20 amino acids
• An insert state emits one of the 20 amino acids
• Delete, begin and end states are mute

Profile HMM
• Linear model
• Scoring corresponds to a log-odds score
• The gap penalty, too, formally has the form a + b(x - 1)
• Use: multiple sequence alignment
• A better result is obtained with, e.g., PSI-BLAST
• Homologue recognition: with what probability does the HMM generate the test sequence?
• Alignment of a sequence against the model
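Why the gap penalty takes the form a + b(x - 1) can be seen by writing out the transitions used by an insertion of length x: one entry into the insert state, x - 1 self-transitions, and one exit. The sketch assumes that x is the gap length and that insert-state emissions score zero, as they would if they equalled the background distribution. The transition probabilities are hypothetical.

```python
from math import log2

def affine_gap_penalty(length, gap_open, gap_extend):
    """a + b(x - 1): cost a for the first gap position, b for each further one."""
    return gap_open + gap_extend * (length - 1)

def hmm_insertion_cost(length, t_match_insert, t_insert_insert, t_insert_match):
    """Negative log2 probability of the transitions used by an insertion."""
    log_probability = (
        log2(t_match_insert)                    # enter the insert state
        + (length - 1) * log2(t_insert_insert)  # stay for x - 1 more residues
        + log2(t_insert_match)                  # return to a match state
    )
    return -log_probability

# Hypothetical transition probabilities.
T_MI, T_II, T_IM = 0.05, 0.5, 0.5
gap_open = -(log2(T_MI) + log2(T_IM))  # plays the role of a
gap_extend = -log2(T_II)               # plays the role of b

for x in (1, 2, 5):
    print(x, hmm_insertion_cost(x, T_MI, T_II, T_IM),
          affine_gap_penalty(x, gap_open, gap_extend))
```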

Types of alignment
• Sequence-sequence: target distribution = generic substitution matrix
• Sequence-profile: position-specific target distributions
• Profile-profile: observed frequencies from multiple alignment; average both ways
• Pair HMM: probability that two HMMs generate the same sequence