[7.4.2010] The seminar paper describes the concept of data mining, the activities required to apply data mining successfully, and data mining methods. It is the result of teamwork with Jasmin mit and Monika Tukari.
2. I. Data Mining
DM is defined as the process of searching for interesting or valuable information (patterns) within large databases
At first glance, this definition looks more like a new name for statistics
However, DM is performed on data sets that are far larger than statistical methods can accurately analyze
3. Data mining methods
DM includes methods at the intersection of artificial intelligence, machine learning, statistics and database systems
Sometimes these methods support dimensionality reduction, i.e. mapping the data onto a set of maximally informative dimensions
Sometimes they represent specific mathematical models
Often, a combination of methods is used to solve a problem
4. Data mining methods
Basically, patterns are often defined relative to the overall model of the data set from which they were derived
Many data analysis tools help find these structures
Some of the most important tools include:
Clustering - partitioning data sets of many random items into smaller subsets whose members show commonality; by looking at the clusters, analysts are able to derive statistical models from the data
Regression - fitting a curve through a set of points using some goodness-of-fit criterion; by examining predefined goodness-of-fit parameters, analysts can find and describe patterns
Rule extraction - using relationships between variables to establish some kind of rule
Data visualization - techniques that help us explain (understand) trends and complexity in the data much more easily
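The regression item can be illustrated with the simplest case. This is a least-squares straight line, with the coefficient of determination R² used as the goodness-of-fit criterion. It is a minimal Python sketch. The slides do not name a specific criterion, so the choice of R², the function names and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b


def r_squared(xs, ys, a, b):
    """Goodness of fit: share of the variance in y explained by the line (1.0 = perfect fit)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total sum of squares
    return 1.0 - ss_res / ss_tot


# hypothetical measurements
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r_squared(xs, ys, a, b):.3f}")
```

An R² close to 1 means the fitted curve describes the points well. A pattern is then described by the fitted parameters, here the intercept a and the slope b.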
5. Data mining methods most often used in health sciences
Logistic regression (LR)
Support vector machine (SVM)
Apriori and other association rule mining (AR)
Decision tree algorithms (DT)
Clustering and classification algorithms: K-means, SOM (self-organizing map), naive Bayes
Artificial neural networks (ANN)
6. Still, a combination of techniques can elicit a specific mining function
Technique - usefulness:
Apriori & FP-Growth - association rule mining for frequent item sets (e.g. diseases) in medical databases
ANN & genetic algorithm - pattern extraction, trend detection, classification
Decision tree algorithms (ID3, C4, C5, CART) - decision support, classification
Combined use of K-means, SOM and naive Bayes - accurate classification
Combination of SVM, ANN and ID3 - classification
7. Logistic regression (LR)
A popular method for classifying individuals, given the values of a set of independent variables
Will the subject develop diabetes?
Will the subject respond to treatment?
It estimates the probability that an individual belongs to a particular group
LR makes no assumptions of normality, linearity or homogeneity of variance for the independent variables
8. Fig. 1. Logistic regression curve
The value produced by logistic regression is a probability between 0.0 and 1.0
If the probability of membership in the modeled category is above some cut-off point (the default is 0.50), the subject is predicted to be a member of the modeled group
If the probability is below the cut-off point, the subject is predicted to be a member of the other group
[Fig. 1: logistic curve, x-axis from -7.5 to 7.5, probability on the y-axis from 0 to 1]
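The cut-off rule above can be sketched in a few lines of Python. The coefficients b0 and b1 and the single predictor x are hypothetical. In practice they are estimated from the data when the model is fitted.

```python
import math


def predict_probability(x, b0, b1):
    """Logistic model with one predictor: probability of membership in the modeled group."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def assign_group(p, cutoff=0.50):
    """1 = modeled group (probability above the cut-off), 0 = the other group."""
    return 1 if p > cutoff else 0


# hypothetical coefficients
b0, b1 = -4.0, 0.8
for x in (0, 5, 10):
    p = predict_probability(x, b0, b1)
    print(x, round(p, 3), assign_group(p))
```

Raising the cut-off above 0.50 makes assignment to the modeled group stricter. Lowering it makes the assignment more permissive.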
9. Testing LR model performance (fit to the data set)
Testing the model based on the probability p:
ROC curves
C statistic
Gini coefficient
KS test
Testing the model based on the cut-off value:
Sensitivity (true positive rate)
Specificity (true negative rate)
Accuracy
Type I error (misidentification of diabetes)
Type II error (misidentification of healthy subjects)
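The measures listed on this slide can be computed from observed outcomes (1 = diabetes, 0 = healthy) and predicted probabilities. This is a minimal standard-library Python sketch, and the function names and the small data set are hypothetical. The sketch computes three probability-based measures:

- the C statistic, as the area under the ROC curve, i.e. the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one;
- the Gini coefficient, as 2·AUC - 1;
- the KS statistic, as the largest gap between the true and false positive rates over all thresholds.

```python
def cutoff_metrics(y_true, probs, cutoff=0.5):
    """Measures that depend on the cut-off value (1 = diabetes, 0 = healthy)."""
    y_pred = [1 if p > cutoff else 0 for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))   # healthy subject predicted as diabetic
    fn = pairs.count((1, 0))   # diabetic subject predicted as healthy
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
    }


def c_statistic(y_true, probs):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def gini_coefficient(y_true, probs):
    return 2.0 * c_statistic(y_true, probs) - 1.0


def ks_statistic(y_true, probs):
    """Largest separation between positives and negatives over all thresholds."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    best = 0.0
    for c in sorted(set(probs)):
        tpr = sum(p >= c for p in pos) / len(pos)
        fpr = sum(p >= c for p in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best


y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(cutoff_metrics(y_true, probs))
print(c_statistic(y_true, probs), gini_coefficient(y_true, probs), ks_statistic(y_true, probs))
```

The false-positive and false-negative rates are the two error types. Which one is called type I and which type II depends on which class is treated as the null hypothesis.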
10. Linear vs logistic regression models
In linear regression, the outcome (dependent variable) is continuous: it can take any of an infinite number of possible values
In logistic regression, the outcome (dependent variable) has only a limited number of possible values: it is used when the response variable is categorical in nature
The logistic model is unavoidable if it fits the data much better than the linear model
In many situations, however, the linear model fits just as well, or almost as well, as the logistic model
In fact, in many situations the linear and logistic models give results that are practically indistinguishable
11. Fig. 2. Linear vs logistic regression model
The linear model assumes that the probability p is a linear function of the regressors
The logistic model, by contrast, assumes that the log odds, log(p/(1 - p)), is a linear function of the regressors
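The two assumptions can be written out for regressors x_1, ..., x_k with coefficients β_0, ..., β_k. These symbols are introduced here for illustration.

$$
\text{linear model:}\quad p = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

$$
\text{logistic model:}\quad \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
$$

Solving the logistic equation for p gives the S-shaped curve of Fig. 1, which always stays between 0 and 1. The linear model, by contrast, can yield values of p outside that range.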
12. Support vector machine (SVM)
A supervised ML method
Used for classification and regression problems (mostly for classification)
Principle of the algorithm:
Each data item is plotted as a point in n-dimensional space (n = the number of features), with the value of each feature being the value of a particular coordinate
Classification is then performed by finding the hyperplane that best separates the two classes
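The principle above (points in n-dimensional space, separated by a hyperplane) can be sketched as a linear SVM. It is trained here by subgradient descent on the regularized hinge loss. That training method is swapped in for illustration, and production libraries use dedicated solvers instead. This is a minimal Python sketch; the hyperparameters lam, lr and epochs and the two-class data are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM fitted by subgradient descent on the regularized hinge loss.

    points: feature vectors (each a point in n-dimensional space)
    labels: +1 or -1 for the two classes
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # inside the margin or misclassified: the hinge loss pushes w towards y*x
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # correctly classified outside the margin: only the regularizer acts,
                # shrinking w and thereby widening the margin 2/||w||
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, [predict(w, b, p) for p in points])
```

The vector w is normal to the separating hyperplane. New points are classified by the sign of w.x + b.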
13. Supervised vs. unsupervised ML
Most practical ML uses supervised learning
Supervised ML: when there are input variables (X) and an output variable (Y), an algorithm is used to learn the mapping function from the input to the output: Y = f(X)
The goal is to approximate the mapping function so well that, when you have new input data (x), you can predict the output variable (Y) for that data
This is called supervised learning because the process of the algorithm learning from the training dataset can be thought of as a teacher supervising the learning process
We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher
Learning stops when the algorithm achieves an acceptable level of performance
Supervised learning problems can be grouped into regression and classification problems
Classification: when the output variable is a category, such as "disease" and "no disease"
Regression: when the output variable is a real value, such as weight
Common supervised ML methods are:
Linear regression, for regression problems
Random forest, for classification and regression problems
Support vector machines, for classification problems
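The idea of learning Y = f(X) from labelled examples can be shown with the simplest supervised method in the list, linear regression with one input variable. This minimal sketch uses the ordinary least-squares formulas; the function names and example values are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Learn f(x) = b0 + b1 * x from labelled pairs (x, y) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept puts the line through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Apply the learned mapping to new input data."""
    return b0 + b1 * x


if __name__ == "__main__":
    # Hypothetical training set: the "teacher" supplies the correct y for each x
    b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(b0, b1, predict(b0, b1, 10))  # → 1.0 2.0 21.0
```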
Unsupervised ML: when there is only input data (X) and no corresponding output variable
The goal is to model the underlying structure or distribution in the data in order to learn more about the data
This is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher
Algorithms are left to their own devices to discover and present interesting structure in the data
Unsupervised learning problems can be grouped into clustering and association problems
Clustering: when the problem is to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour
Association: when the problem is to discover rules that describe large portions of your data
Common unsupervised ML methods are:
k-means, for clustering problems
Apriori algorithm, for association rule learning problems
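A minimal plain-Python sketch of k-means (Lloyd's iterations) for the clustering case follows. It assumes the caller supplies the k initial centroids, which keeps the example deterministic; library implementations usually pick them randomly or with k-means++. The function names and points are hypothetical.

```python
def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    distances = [squared_distance(p, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, max_iter=100):
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical 2-D data with two obvious groups
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    print(kmeans(pts, [(1, 1), (8, 8)]))
```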
14. Apriori algorithm (AA)
/ Association rule mining (ARM)
ARM: a technique to discover how items are associated with each other
AA: mines association rules between frequent itemsets in large databases (Fig. 3)
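The sketch below shows the two Apriori ideas in plain Python: frequent itemsets are grown one item at a time (join step), and a candidate is dropped as soon as one of its subsets is infrequent (prune step, the Apriori property). The confidence of a rule is then computed from the supports. The function names, the minimum support of 0.6 and the shopping-basket transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count the share of transactions that contain each candidate k-itemset
        level = {}
        for c in candidates:
            support = sum(1 for basket in baskets if c <= basket) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A and B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    freq = apriori(baskets, 0.6)
    print(sorted(tuple(sorted(s)) for s in freq))
    print(confidence(freq, {"diapers"}, {"beer"}))  # support 0.6 / 0.8
```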
15. Decision tree (DT) algorithms
Belong to the supervised learning algorithms
Used for classification and regression problems
A DT algorithm tries to solve the problem using a tree representation (Fig. 4)
A flowchart-like structure (Fig.)
Each internal node represents a test on an attribute
Each branch represents an outcome of the test
Each leaf (terminal node) holds a class label
The topmost node in the tree is the root node
There are many specific decision tree algorithms (e.g. ID3, C4.5, CART)
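Since the slides do not single out one algorithm, the following sketch uses the ID3 approach as an example: each internal node tests the attribute with the highest information gain (entropy reduction), each branch corresponds to one value of that attribute, and each leaf holds a class label. It handles categorical attributes only; the function names and patient records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute."""
    n = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or {attribute: {value: subtree}} (internal node)."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:         # one branch per test outcome
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return {best: branches}


def classify(tree, row):
    """Walk from the root node down to a leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


if __name__ == "__main__":
    # Hypothetical records: does the patient have flu?
    rows = [{"fever": "yes", "cough": "yes"}, {"fever": "yes", "cough": "no"},
            {"fever": "no", "cough": "yes"}, {"fever": "no", "cough": "no"}]
    labels = ["flu", "flu", "healthy", "healthy"]
    tree = build_tree(rows, labels, ["cough", "fever"])
    print(tree)  # fever is chosen as the root node
```

The sketch raises a `KeyError` for an attribute value never seen during training; real implementations add a fallback such as the majority class.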
16. Fig. 4. A DT algorithm simulates branching tree logic
18. Artificial neural networks (ANN)
A method of artificial intelligence inspired by, and structured like, the human brain
It is an ML and DM method: a method that learns from examples
It uses retrospective data
It can be used for prediction, classification and pattern recognition (e.g. association problems)
Prediction: a numerical value is predicted as the output (e.g. blood pressure, age, etc.), and the MSE or RMSE error is used as the measure for evaluating model performance
Classification: subjects are assigned to one of two or more output categories (e.g. presence/absence of disease, treatment outcome, etc.), and the classification rate is used as the measure for evaluating model performance
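The two evaluation measures named above can be written out directly. This small sketch computes MSE and RMSE for a numerical prediction and the classification rate (the share of correctly classified subjects) for a classification task; the function names and the blood-pressure values are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean squared error of numerical predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the predicted quantity."""
    return math.sqrt(mse(actual, predicted))


def classification_rate(actual, predicted):
    """Fraction of subjects assigned to the correct output category."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical systolic blood pressure (mmHg): measured vs predicted
    print(mse([120, 130], [118, 134]))  # → 10.0
    print(classification_rate(["disease", "healthy", "disease", "disease"],
                              ["disease", "healthy", "healthy", "disease"]))  # → 0.75
```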
ANNs have been successful in modelling real-world situations, so they can be used both for research purposes and in practice as a decision-support or simulation tool
19. Biological vs. artificial neural networks
(Fig. 6)
Biological neural network: consists of interconnected biological neurons
Biological neuron: a cell that receives information from other neurons through dendrites, processes it, and sends an impulse through the axon and synapses to other neurons in the network
Learning: takes place by changing the weights of synaptic connections; millions of neurons can process information in parallel
Artificial neural network
Artificial neuron: a processing unit (variable) that receives input from other variables, transforms the input according to a formula and sends the output to other variables
Learning: takes place by changing the values of the weights (the weights w_ji by which the inputs are multiplied)
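A single artificial neuron can be sketched in a few lines: it multiplies its inputs by the weights, sums them with a bias and passes the result through a transfer (activation) function. To show learning as a change of weights, the sketch uses the classic perceptron rule with a step activation, which is only one of many possible learning rules; the function names, the logical AND task and the learning rate are hypothetical.

```python
def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through a step transfer function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, targets, learning_rate=0.25, epochs=20):
    """Perceptron rule: learning means changing the weights after each error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, targets):
            error = target - neuron_output(weights, bias, x)
            if error != 0:
                # Move each weight w_j in proportion to its input x_j
                weights = [w + learning_rate * error * xj for w, xj in zip(weights, x)]
                bias += learning_rate * error
                errors += 1
        if errors == 0:
            break  # every training example is classified correctly
    return weights, bias


if __name__ == "__main__":
    # Hypothetical task: learn logical AND of two binary inputs
    samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    targets = [0, 0, 0, 1]
    weights, bias = train_perceptron(samples, targets)
    print(weights, bias, [neuron_output(weights, bias, x) for x in samples])
```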
21. Fig. 7. The generalization ability of an ANN model must be tested
It does not rely on the result obtained on a single sample: many learning iterations on the training set take place in the middle (hidden) layer, which lies between the input and output layers
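To show where a hidden layer sits, here is a forward pass through a tiny network with two inputs, two hidden neurons and one output neuron. The weights are set by hand (hypothetically) so that the network computes XOR, a task that a single neuron without a hidden layer cannot represent; in a real ANN the weights would be learned from the training set, and generalization would then be checked on data not used for training. All names are illustrative.

```python
def step(z):
    """Step transfer function: the neuron fires (1) only for positive input."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: each weight row and bias belong to one neuron."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-set weights (in practice they would be learned)
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]  # hidden neuron 1 acts like OR, neuron 2 like AND
HIDDEN_BIASES = [-0.5, -1.5]
OUTPUT_WEIGHTS = [[1.0, -1.0]]             # output: "OR but not AND"
OUTPUT_BIASES = [-0.5]


def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, forward(x))
```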
22. Criteria for distinguishing ANN algorithms
Number of layers
Type of learning
Supervised: the real output values are known from the past and are provided in the dataset
Unsupervised: the real output values are not known and are not provided in the dataset; these networks are used to cluster data into groups by their characteristics
Type of connections between neurons
Connection between input and output data
Input and transfer functions
Time characteristics
Learning time
Etc.
23. II. Modern computer-based methods
Graph-based DM
Data visualization and visual analytics
Topological DM
Related techniques that can be used to organize very complex and heterogeneous data
Data can be very powerful if you can actually understand what they are telling you
It is not easy to get clear takeaways by looking at a slew of numbers and statistics: the data need to be presented in a logical, easily understandable way, and that is where some of these techniques come in
24. Graph-based DM
To apply graph-based data mining techniques such as classification and clustering, it is necessary to define proximity measures between the data represented in the graph (Fig. 8 and 9)
There are several in-graph proximity measures:
Hyperlink-Induced Topic Search (HITS)
Neumann kernel (NK)
Shared nearest neighbour (SNN)
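Of the three measures, shared nearest neighbour similarity is the simplest to sketch: two vertices are considered close when they share many neighbours. The version below counts common neighbours in an adjacency list; the Jarvis-Patrick variant applies the same count to k-nearest-neighbour lists. The function names and the graph are hypothetical.

```python
def snn_similarity(neighbours, u, v):
    """Number of neighbours that vertices u and v have in common."""
    return len(set(neighbours[u]) & set(neighbours[v]))


def snn_proximity(neighbours):
    """SNN similarity for every unordered pair of vertices."""
    nodes = sorted(neighbours)
    return {
        (u, v): snn_similarity(neighbours, u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical undirected graph given as adjacency lists
    graph = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["A", "C"]}
    print(snn_proximity(graph))
```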
25. Fig. 8. Defining proximity measures makes the structure visible
The scatter plot shows similarity on a scale from -1 to 1
26. Fig. 9. Source diagram using NK proximity measures
- N1 ... N8: vertices (articles)
- edges indicate citations
A citation matrix C can be formed: if an edge exists between two vertices, the corresponding matrix cell = 1, otherwise 0
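Building on the citation matrix C, a Neumann kernel can be sketched with NumPy. Following the usual definition, it starts from the co-citation matrix K = C^T C and sums K (gamma K)^n over all n >= 0, which has the closed form K (I - gamma K)^(-1) when gamma is smaller than 1 divided by the largest eigenvalue of K; a small gamma stays close to direct co-citation, while a larger gamma gives more weight to indirect links. The function names, the value of gamma and the four-article graph are hypothetical, not the N1 ... N8 graph from the figure.

```python
import numpy as np


def citation_matrix(n, edges):
    """C[i, j] = 1 if article i cites article j, otherwise 0."""
    C = np.zeros((n, n))
    for i, j in edges:
        C[i, j] = 1.0
    return C


def neumann_kernel(C, gamma):
    """Neumann kernel on the co-citation matrix: K (I - gamma K)^-1."""
    K = C.T @ C  # K[j, k] = number of articles citing both j and k
    spectral_radius = np.max(np.abs(np.linalg.eigvalsh(K)))  # K is symmetric
    if gamma * spectral_radius >= 1:
        raise ValueError("gamma must be below 1 / largest eigenvalue of C^T C")
    identity = np.eye(K.shape[0])
    # Closed form of the geometric series K + gamma K^2 + gamma^2 K^3 + ...
    return K @ np.linalg.inv(identity - gamma * K)


if __name__ == "__main__":
    # Hypothetical graph: articles 0 and 1 both cite 2 and 3, article 2 cites 3
    C = citation_matrix(4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
    print(neumann_kernel(C, 0.1).round(3))
```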
27. Fig. 10. How can the pattern of a Dalmatian dog be generalized mathematically?
28. Data visualization
The human brain processes visual information better than it processes text, so by using charts, graphs and design elements, data visualization can help us explain (understand) trends and statistics much more easily (Fig. 10)
Fig. 10. Population structure by age: a data visualization procedure commonly used in the public health domain
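An age pyramid like the one in Fig. 10 can be approximated even without a plotting library. This hypothetical sketch prints a text-mode pyramid with males on the left and females on the right, one "#" per 1,000 people; the function name, age groups and counts are invented for illustration, and a real analysis would use a charting tool such as matplotlib.

```python
def text_pyramid(age_groups, males, females, per_mark=1000):
    """Return a text population pyramid, oldest age group on top."""
    width = max(max(males), max(females)) // per_mark  # longest bar, in marks
    rows = []
    for group, m, f in zip(age_groups, males, females):
        left = "#" * (m // per_mark)   # males, right-aligned towards the centre
        right = "#" * (f // per_mark)  # females, left-aligned from the centre
        rows.append(f"{left:>{width}} {group:^7} {right:<{width}}")
    return "\n".join(reversed(rows))


if __name__ == "__main__":
    # Hypothetical population counts per age group
    print(text_pyramid(["0-19", "20-39", "40-59", "60+"],
                       [5000, 4000, 3000, 1000],
                       [4800, 4200, 3200, 2000]))
```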
29. Data visualization
Data mining samples are so large that scatter plots and histograms often fall short and convey no information about the real values (Fig. 11)
For precisely this reason, analysts working in data mining are constantly looking for better ways to represent data graphically
Regardless of the tools analysts have at their fingertips, the patterns and models that are mined will only be as good as the data from which they are derived
30. Fig. 11. Making the graph simpler and easier to understand
31. Application domains of data visualization and visual analytics techniques
Visualization of large, complex, multivariate biological networks
Visual text analytics and classification of relevant related work on biological entities in publication databases (e.g. PubMed)
Visualization for exploring heterogeneous data and data from multiple sources
Visual analytics that supports understanding of uncertainty and data quality issues
32. Fig. 12. A computer-based visual analytics tool for complex data
(Personal archive)
33. Fig. 13. The first visualization of the human protein-protein interaction structure
34. Topological DM
Applying topological techniques to DM and KDD is a hot and promising area of future research.
Topology has its roots in theoretical mathematics, but within the last decade computational topology has been rapidly gaining interest among computer scientists.
Topology is the study of abstract shapes and spaces and the mappings between them. It arose from the study of geometry and set theory.
Topological methods can be applied to data represented as point clouds, that is, finite subsets of n-dimensional Euclidean space.
The input is regarded as a sample from some unknown space that one wishes to reconstruct and understand.
Distinguishing between the ambient (embedding) dimension n and the true dimension of the data is of primary interest for understanding the intrinsic structure of the data.
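A simple way to see the gap between ambient and intrinsic dimension is a principal component analysis of the point cloud: if nearly all variance lies along a few directions, the data occupy a lower-dimensional (linear) subspace of the n-dimensional space. This is a swapped-in linear technique, not a topological one, and it cannot detect curved structure; the function name, the 95 % variance threshold and the sample points are hypothetical.

```python
import numpy as np


def estimate_linear_dimension(points, variance_threshold=0.95):
    """Smallest number of principal directions explaining the given share of variance."""
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)                    # centre the point cloud
    s = np.linalg.svd(X, compute_uv=False)    # singular values, largest first
    explained = s ** 2 / np.sum(s ** 2)       # variance share per direction
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, variance_threshold) + 1)


if __name__ == "__main__":
    # Hypothetical clouds in 3-D ambient space (n = 3)
    line = [(t, 2 * t, -t) for t in range(10)]                   # lies on a line
    plane = [(a, b, a + b) for a in range(5) for b in range(5)]  # lies on a plane
    print(estimate_linear_dimension(line), estimate_linear_dimension(plane))
```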
35. Topological DM
Geometric and topological methods are tools that make it possible to analyse very complex data quickly
Modern data science uses topological methods to find structural features of datasets before further supervised or unsupervised analysis
The mathematical formalism developed to incorporate geometric and topological techniques deals with point cloud data sets, that is, finite sets of points
Point clouds are finite samples taken from a geometric object
Tools from various branches of geometry and topology are then used to study point cloud data sets
Topology provides a formal language for qualitative mathematics, whereas geometry is mainly quantitative.
Topology deals with relations of nearness or proximity, whereas geometry can be regarded as the study of distance functions
These methods create a summary, or compressed representation, of all the features of the data in order to quickly discover particular patterns and relationships in the data.
The idea of building a summary of the whole attribute space involves understanding the relationship between topological and geometric objects constructed from the data using various features
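One of the simplest summaries in this spirit counts how many connected pieces a point cloud has when every pair of points closer than a scale epsilon is joined by an edge; tracking that count as epsilon grows is the 0-dimensional part of what persistent homology records. The sketch below uses a union-find structure; the function names, points and epsilon values are hypothetical.

```python
import math


def count_components(points, epsilon):
    """Connected components of the graph linking points at distance <= epsilon."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= epsilon:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})


def component_profile(points, epsilons):
    """Number of components at each scale: a crude 0-dimensional summary."""
    return [(eps, count_components(points, eps)) for eps in epsilons]


if __name__ == "__main__":
    # Hypothetical point cloud: two small clusters far apart
    cloud = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5)]
    print(component_profile(cloud, [0.1, 1.0, 10.0]))  # → [(0.1, 6), (1.0, 2), (10.0, 1)]
```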
36. Topological DM
Fig. 14. Construction of a computational structure (bottom) from the shape that one wishes to reconstruct and understand (top)