(1) In-silico experiments, especially in their refinement cycle, lead to the creation of new software tools, algorithms and even computer science challenges. To make such an experiment valuable, this process needs to be controlled and recorded as it reaches milestone stages;
(2) Scientific experiments are performed in cycles, where each cycle is a refinement of hypotheses. Continuing research from any cycle, and branching the process further, requires that each cycle is checkpointed and stored as a scientific procedure step;
(3) Medical research that relies on data analysis, focused on early disease diagnosis or on stopping disease progression, very often yields software tools that support data analysis and are created during the experimentation cycles. To treat the process of knowledge discovery based on data analysis and the development of processing tools as a research method, we need to provide a formal description of the stages of such a process, paired with the hypothesis refinement stages.
Diagnostic hypothesis refinement in reproducible workflows for advanced medical data analysis
1. Diagnostic hypothesis refinement
in reproducible workflows
for advanced medical data analysis
Cezary Mazurek, Raul Palma, Juliusz Pukacki
Poznań Supercomputing and Networking Center
Scientific workshop. Big Data: processing and exploration, 22.04.2016, Poznań
2. Workflows
• The automation of a business process, in whole or part, during which
documents, information or tasks are passed from one participant to
another for action, according to a set of procedural rules.
(From The Workflow Management Coalition Specification)
• Workflows serve a dual function *):
  - first, as detailed documentation of the method (i.e. the input sources
    and processing steps taken for the derivation of a certain data item)
  - second, as re-usable, executable artifacts for data-intensive analysis.
• Workflows stitch together a variety of data manipulation activities such
as data movement, data transformation or data visualization to serve
the goals of the scientific study *).
*) D. Garijo, P. Alper, K. Belhajjame, O. Corcho, Y. Gil, C. Goble, Common motifs in scientific workflows: an empirical analysis,
Future Gener. Comput. Syst. (2014) http://dx.doi.org/10.1016/j.future.2013.09.018.
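The dual function described on this slide can be illustrated with a minimal Python sketch. The helper `run_workflow`, the step names (`move`, `transform`, `visualize`) and the sample readings are all hypothetical, chosen only to mirror the data movement, transformation and visualization activities named above; they do not come from any particular workflow system.

```python
from typing import Any, Callable

# A step is a (name, function) pair; both are illustrative assumptions.
Step = tuple[str, Callable[[Any], Any]]


def run_workflow(steps: list[Step], data: Any) -> tuple[Any, list[dict]]:
    """Run the steps in order and record what each one consumed and produced."""
    trace = []
    for name, func in steps:
        result = func(data)
        # The trace is the "documentation" half of the dual function.
        trace.append({"step": name, "input": repr(data), "output": repr(result)})
        data = result
    return data, trace


def move(readings):
    """Data movement: copy the raw readings into a working list."""
    return list(readings)


def transform(readings):
    """Data transformation: rescale each reading by a factor of ten."""
    return [round(r / 10.0, 1) for r in readings]


def visualize(readings):
    """Data visualization: render a crude text bar per reading."""
    return " ".join("#" * int(r) for r in readings)


steps = [("move", move), ("transform", transform), ("visualize", visualize)]
result, trace = run_workflow(steps, (12, 25, 31))
```

The same `steps` list is both the executable artifact (it can be re-run on other readings) and, through `trace`, a record of how the final item was derived.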
4. Research objects
• Semantic aggregations of related scientific resources,
their annotations and research context.
• Enable referring to a bundle of research artifacts supporting
an investigation.
• Provide mechanisms to associate human- and machine-readable
metadata with these artifacts.
• The RO model makes it possible to capture and describe these
objects, their provenance and lifecycle
  - Ontology network (based on OAI-ORE, OA, PROV-O)
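As a rough illustration of the aggregation idea, the sketch below models a research object as a set of aggregated resources plus annotations that target them. The class `ResearchObject`, its methods, the example URI and the file names are hypothetical. The property names (`ore:aggregates`, `oa:hasTarget`, `oa:hasBody`, `prov:wasDerivedFrom`) are borrowed from the OAI-ORE, OA and PROV-O vocabularies, but the manifest layout is an assumption and not the RO model's actual serialization.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """Hypothetical in-memory research object: resources plus annotations."""
    identifier: str
    resources: list[str] = field(default_factory=list)
    annotations: list[dict] = field(default_factory=list)

    def aggregate(self, uri: str) -> None:
        """Add a resource to the bundle (ignored if already aggregated)."""
        if uri not in self.resources:
            self.resources.append(uri)

    def annotate(self, target: str, body: dict) -> None:
        """Attach machine-readable metadata to an aggregated resource."""
        if target not in self.resources:
            raise ValueError(f"{target} is not aggregated by this RO")
        self.annotations.append({"oa:hasTarget": target, "oa:hasBody": body})

    def manifest(self) -> str:
        """Serialize the bundle description as JSON (layout is illustrative)."""
        return json.dumps(
            {
                "@id": self.identifier,
                "@type": "ore:Aggregation",
                "ore:aggregates": self.resources,
                "annotations": self.annotations,
            },
            indent=2,
        )


ro = ResearchObject("http://example.org/ro/example-study/")
ro.aggregate("data/measurements.csv")
ro.aggregate("workflow/analysis.py")
ro.aggregate("results/summary.csv")
ro.annotate("results/summary.csv", {"prov:wasDerivedFrom": "data/measurements.csv"})
manifest = json.loads(ro.manifest())
```

Refusing to annotate a resource that is not aggregated keeps the metadata tied to the bundle, which is the point of referring to the artifacts as one unit.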
5. ROHub (http://www.rohub.org)
• Enables the sharing of scientific
findings.
• Supports scientists throughout the
research lifecycle to create and
maintain high-quality ROs that can
be interpreted and reproduced in
the future.
• Combination of digital libraries,
long-term preservation and
semantic technologies.
RO storage, lifecycle management and preservation
6. ROHub (http://www.rohub.org)
• Create, manage and share ROs: different methods for creating
ROs and different access modes for sharing them.
• Finding ROs: a faceted search interface and a keyword search
box; other interfaces, such as the collab spheres, can be plugged in.
• Assessing RO quality: a progress bar showing RO quality based
on a set of predefined basic RO requirements, plus detailed quality
information.
• Managing RO evolution: create RO snapshots at any point in
time; release and preserve the RO when the research has
concluded. Visualize the evolution of the RO.
• RO inspection: a navigation panel to traverse the RO content.
• External resources and workflow runs: aggregate any type of
resource, including links to external resources and RO
bundles (ZIP serialization).
• Monitoring ROs: monitoring features, such as fixity checking
and RO quality checks, which generate notifications when changes
are detected. Visualize those notifications and subscribe via
an Atom feed.
RO storage, lifecycle management and preservation
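Fixity checking, mentioned under monitoring above, generally means recording a checksum for each resource and later comparing it against a freshly computed one. The sketch below shows that general technique with SHA-256 from the Python standard library. The function names, the in-memory resources and the notification strings are assumptions for illustration; they do not describe how ROHub implements the feature.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return the SHA-256 digest of a resource's content."""
    return hashlib.sha256(content).hexdigest()


def record_fixity(resources: dict[str, bytes]) -> dict[str, str]:
    """Store one checksum per resource, e.g. when a snapshot is taken."""
    return {name: checksum(content) for name, content in resources.items()}


def check_fixity(resources: dict[str, bytes], recorded: dict[str, str]) -> list[str]:
    """Compare current content with the recorded checksums and report changes."""
    notifications = []
    for name, expected in recorded.items():
        if name not in resources:
            notifications.append(f"{name}: missing")
        elif checksum(resources[name]) != expected:
            notifications.append(f"{name}: content changed")
    return notifications


# Hypothetical resources held in memory to keep the example self-contained.
snapshot = {"data.csv": b"a,b\n1,2\n", "notes.txt": b"first draft"}
recorded = record_fixity(snapshot)

current = {"data.csv": b"a,b\n1,3\n"}  # data.csv edited, notes.txt removed
notifications = check_fixity(current, recorded)
```

An empty list from `check_fixity` means every recorded resource is still present and unchanged.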
7. Reproducibility
Reproducibility for computational experiments is challenging.
It is hard both for authors to derive a compendium that
encapsulates all the components (e.g., data, code, parameter
settings, environment) needed to reproduce a result, and for
reviewers to verify the results.
There are also other barriers, from practical issues, including the use of
proprietary data, software and specialized hardware, to social ones, for example the
lack of incentives for authors to spend the extra time making their experiments
reproducible.
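To make the notion of a compendium more concrete, here is a minimal sketch that gathers the four components listed above (data, code, parameter settings, environment) into one JSON-serializable record. The function `build_compendium`, the field names and the sample inputs are hypothetical; a real compendium would also need to package the files themselves and a fuller environment description.

```python
import hashlib
import json
import platform
import sys


def build_compendium(data: dict[str, bytes], code: str, parameters: dict) -> dict:
    """Describe the data, code, parameters and environment behind one result."""
    return {
        # Digests identify the exact inputs without embedding them.
        "data": {name: hashlib.sha256(blob).hexdigest() for name, blob in data.items()},
        "code": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "parameters": parameters,
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "executable": sys.executable,
        },
    }


# Hypothetical inputs for illustration only.
compendium = build_compendium(
    data={"measurements.csv": b"t,value\n0,1.5\n"},
    code="def analyse(rows):\n    return len(rows)\n",
    parameters={"threshold": 0.8, "seed": 42},
)
serialized = json.dumps(compendium, indent=2, sort_keys=True)
```

A reviewer holding such a record can at least check whether the data, code and settings they received match what the authors ran.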
Challenge
10. Problem/Challenge
• Historically, the scientific method is well known; it was
introduced by Louis Pasteur in the 19th century.
• This method is in fact a cycle of the following steps:
  - Observations -> Questions -> Hypotheses -> Predictions -> Experiment
    (incl. refinement) -> Discussion.
• For many years these steps have allowed scientific
experiments conducted in vivo and in vitro to be reported.
• However, we think that even if the steps remain the same when
performing in-silico experiments, the way of reporting them
needs to change, especially in fields where part of the
experiment is the creation of software tools.
11. What does it mean?
• Smart data processing and experiments, but...
• What does data mean for doctors?
• They need treatment instructions and their expected
results.
• We need a new environment for in-silico disease
hypothesis refinement and for building decision
support systems.
This is a challenge for researchers
in interdisciplinary teams
14. Towards precision (personal) medicine
• Question-driven (smart) data experiments
• If they fail, they lead to other questions and experiments
• So... do not start by transferring existing knowledge and a statistical
approach into the data space
• We need to start thinking as if we lived in the data space and create
experiments to quickly verify hypotheses (diagnostic hypothesis
refinement)
• Precision medicine makes it even more challenging!
  - Data experiments are defined for an individual patient and lead
    to personal treatment
15. Disruptive Innovation in Interdisciplinary Teams
Decision support systems for disease diagnosis
Diagnostic hypothesis refinement
Smart processing
Data
17. Hypothesis refinement
• In-silico experiments, especially in their refinement cycle, lead to the creation of new
software tools, algorithms and even computer science challenges. To make such an
experiment valuable, this process needs to be controlled and recorded as it
reaches milestone stages;
• Scientific experiments are performed in cycles, where each cycle is a refinement of the
hypothesis. Continuing research from any cycle, and branching the process
further, requires that each cycle is checkpointed and stored as a scientific procedure
step;
• Medical research that relies on data analysis, focused on early disease diagnosis or on stopping
disease progression, very often yields software tools that support data
analysis and are created during the experimentation cycles.
• To treat the process of knowledge discovery based on data analysis and the development of
processing tools as a research method, we need to provide a formal description
of the stages of such a process, paired with the hypothesis refinement stages.
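The requirement that each cycle is checkpointed, and that research can continue or branch from any cycle, can be sketched as a small tree of immutable records. Everything here is a hypothetical illustration: the classes `Checkpoint` and `ExperimentLog`, their fields, and the sample hypotheses and tool names are assumptions, not the formal description the slide calls for.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    """One stored refinement cycle: the hypothesis, the tools used, the outcome."""
    cycle_id: int
    parent_id: Optional[int]
    hypothesis: str
    tools: tuple[str, ...]
    outcome: str


@dataclass
class ExperimentLog:
    """Append-only store of checkpoints that supports branching from any cycle."""
    checkpoints: dict[int, Checkpoint] = field(default_factory=dict)

    def record(self, hypothesis: str, tools: list[str], outcome: str,
               parent_id: Optional[int] = None) -> int:
        """Store a new cycle, optionally as a refinement of an earlier one."""
        if parent_id is not None and parent_id not in self.checkpoints:
            raise KeyError(f"unknown parent cycle {parent_id}")
        cycle_id = len(self.checkpoints) + 1
        self.checkpoints[cycle_id] = Checkpoint(
            cycle_id, parent_id, hypothesis, tuple(tools), outcome
        )
        return cycle_id

    def lineage(self, cycle_id: int) -> list[int]:
        """Return the chain of cycles from the first one down to cycle_id."""
        path = []
        current: Optional[int] = cycle_id
        while current is not None:
            path.append(current)
            current = self.checkpoints[current].parent_id
        return list(reversed(path))


log = ExperimentLog()
first = log.record("H1", ["filter.py"], "inconclusive")
second = log.record("H1 refined", ["filter.py", "classifier.py"], "supported",
                    parent_id=first)
branch = log.record("H1 alternative", ["filter.py", "clustering.py"], "rejected",
                    parent_id=first)
```

Because each checkpoint keeps its parent, `lineage` recovers how any hypothesis was reached, and recording a new cycle against an older `parent_id` is what branching amounts to in this sketch.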
Practical cases
18. Domain examples
• Bioinformatics
  - *omics research
• Earth Science (EVEREST)
  - European Virtual Environment for
    Research - Earth Science
Themes: a solution
• Cardiac rehabilitation and early
risk identification of cardiovascular
diseases
  - Personal prevention plan
• Glaucoma diagnosis and early
prevention
19. Glaucoma research experiment
Glaucoma - a group of progressive optic nerve neuropathies related to:
a) accelerated apoptosis of Retinal Ganglion Cells due to neurotrophic deprivation
[Band L.R., 2009; Balaratnasingam C., 2008; Fechtner R.D., Weinreb R.N., 1994;
Garcia-Valenzuela E., 1995; Quigley H.A., 1976, 1995, 2000; Yablonski M., Asamoto A., 1993]
b) lamina cribrosa sclerae pathognomonic phenotype changes
[Ernest J.T. and Potts A.M., 1968; Quigley H.A., 1983; Roberts M.D., 2009].
26. GEN Exclusive
We need new models for
collaboration between the health
research industry and academia.
The only way that will happen is if
we can reduce some of the local
competition and fragmentation and
create super-centers of innovation
for:
• regional consortia for clinical
research,
• experimental therapeutics
centers,
• advanced biomanufacturing
centers,
• centralized repositories for patient
data.
http://leadership.jefferson.edu/blog/
28. Publications
• R. Wasilewicz, P. Wasilewicz, E. Czaplicka, J. Kociecki, J. Blaszczynski, C. Mazurek and R. Slowinski: 24 hour continuous ocular
tonography Triggerfish and biorhythms of the cardiovascular system functional parameters in healthy and glaucoma populations.
Acta Ophthalmologica, 91: 0. doi: 10.1111/j.1755-3768.2013.2721.x
• R. Palma, O. Corcho, P. Hołubowicz, S. Pérez, K. Page, C. Mazurek: Digital libraries for the preservation of research methods and
associated artefacts. Proc. 1st International Workshop on the Digital Preservation of Research Methods and Artefacts (DPRMA
2013) at Joint Conference on Digital Libraries (JCDL 2013), pp. 8-15. Indianapolis, Indiana, USA, July 2013
• C. Mazurek, J. Pukacki, M. Kosiedowski, S. Trocha, H. Darbari, A. Saxena, R. Joshi, P. Brenner, S. Gesing, J. Nabrzyski,
M. Sullivan, D. Dubhashi, S. Thankaswamy and A. Srivastava (2014): Federated Clouds for Biomedical Research: Integrating
OpenStack for ICTBioMed. Cloud Networking (CloudNet), 2014 IEEE 3rd International Conference on, pp. 294-299, 8-10 Oct.
2014, doi: 10.1109/CloudNet.2014.6969011
• R. Palma, O. Corcho, J.M. Gómez-Pérez, C. Mazurek: "ROHub - A Digital Library of Research Objects Supporting Scientists
Towards Reproducible Science". In Semantic Publishing Challenge of Proc. Extended Semantic Web Conference (ESWC), Crete,
Greece, May 25-29, 2014.
• M. Krysinski, M. Krystek, C. Mazurek, J. Pukacki, P. Spychala, M. Stroinski, J. Weglarz: Semantic Data Sharing and Presentation in
Integrated Knowledge System. In: R. Bembenik, Ł. Skonieczny, H. Rybiński, M. Kryszkiewicz, M. Niezgódka (Eds.), Intelligent
Tools for Building a Scientific Information Platform: Advanced Architectures and Solutions, pp. 67-83. Springer International
Publishing 2013
• J. Andersen, P. Shah, K. Korski, M. Ibbs, V. Filas, M. Kosiedowski, J. Pukacki, C. Mazurek, Y. Wu, E. Chang, C. Toniatti, G. Draetta,
M. Wiznerowicz: Applying TCGA data for breast cancer diagnostics and pathway analysis. Cancer Research 10/2014; 74(19
Supplement):4272-4272
• J. Pukacki, H. Świerczyński, C. Mazurek, M. Kosiedowski: "RNA-Seq data analysis pipeline in Poznan Supercomputing and
Networking Center". 1st Congress of the Polish Biochemistry, Cell Biology, Biophysics and Bioinformatics, September 2014,
Warsaw, Poland
• M. Kosiedowski, C. Mazurek, K. Słowiński, M. Stroiński, K. Szymański, J. Węglarz: "Telemedical systems for the support of
regional Healthcare In the area of trauma". Global Telemedicine and Health Updates: Knowledge Resources, vol. 3, pp. 592-596,
2010
29. Poznań Supercomputing and Networking Center
ul. Noskowskiego 12/14, 61-704 Poznań, POLAND,
Office: phone center: (+48 61) 858-20-00, fax: (+48 61) 852-59-54,
e-mail: office@man.poznan.pl, http://www.psnc.pl
affiliated to the Institute of Bioorganic Chemistry of the Polish Academy of Sciences