Most data-centric software must deal with some form of import/export of its internal data model to an external data format. In many cases, this external data format is some sort of standard format, or is otherwise dictated by external sources, and does not map one-to-one to the internal data model. This was also the case for CareConnect, HealthConnect/Corilus' latest Electronic Medical Record software. CareConnect must be able to import/export its data from/to SUMEHR, PMF, and GPSMF documents. On top of this, CareConnect's internal data model consists of some 300 classes, which means there are a lot of mappings to define. To deal with the size and complexity of this scenario, we decided to use a specialised language: the ATL transformation language, in combination with the EMF Transformation Virtual Machine (EMFTVM). EMFTVM is a new runtime for ATL that adds a number of performance-enhancing features, which make it more suitable for use within a Java application. The declarative, rule-based nature of ATL allowed us to write more concise code and to distribute the work of writing the ATL transformation code over multiple developers. This significantly increased our ability to deal with complexity.
Using ATL/EMFTVM for import/export of medical data - #sda2014
8. Why ATL?
(ATL Transformation Language)
Domain-specific language for transformation
More expressive than mapping frameworks embedded in Java, e.g. Dozer
Less verbose for transformations than general-purpose languages
Uses OCL standard for expressions
Uses EMF for data representation
Closely related to plain Java objects
Enriched with additional concepts, e.g. containment and associated properties
9. Why EMFTVM?
(EMF Transformation Virtual Machine)
Enhanced for "online" use (performance)
Reuse pre-loaded transformations for multiple executions
JIT compiler translates to Java bytecode
Adaptive matching algorithm adds performance
Improved modularity
Supports multiple rule inheritance across different modules
Supports module import across different source languages
23. Conclusion
We tackled a complex and common programming scenario such as import/export by breaking it up in three ways:
Use specialised language for translating between domain model and pivot model
Use pivot model for import/export => only support a single import/export format
Use regular Java to handle file I/O and database interaction
#4: Most data-centric software must deal with some form of import/export of its internal data model to an external data format. In many cases, this external data format is some sort of standard format, or is otherwise dictated by external sources, and does not map one-to-one to the internal data model.
#5: This was also the case for CareConnect, HealthConnect/Corilus' latest Electronic Medical Record software. CareConnect must be able to import/export its data from/to SUMEHR, PMF, and GPSMF documents. On top of this, CareConnect's internal data model consists of some 300 classes, which means there are a lot of mappings to define.
#11: EMFTVM performance is roughly 80% better than the default ATL EMF-specific VM. EMFTVM has a JIT compiler that improves the performance of complex code blocks. It also allows for reuse of a pre-loaded VM instance (when invoking from Java), which is useful when invoking the same transformation on different models many times over. Finally, it uses an adaptive rule matching algorithm that configures itself against the metamodels and transformation modules used on the first run of the VM. The EcoreUtil.Copier entry is the standard Java implementation for copying Ecore models, and forms the baseline ("it doesn't get faster than this"). On the following lines one can see the evolution in performance of the various ATL VMs.
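The benefit of reusing a pre-loaded instance can be illustrated with a small plain-Java sketch. This is an analogy only and does not use the EMFTVM API: the class PreloadedTransformation, its loader parameter, and the loadCount field are hypothetical names introduced for illustration. The expensive load step runs once, and every later invocation reuses the loaded transformation.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration: load a transformation once, run it many times.
// This does not use the EMFTVM API; it only mirrors the load-once idea.
class PreloadedTransformation<I, O> {
    private final Supplier<Function<I, O>> loader; // the expensive load step
    private Function<I, O> loaded;                 // null until the first run
    int loadCount = 0;                             // how often the loader ran

    PreloadedTransformation(Supplier<Function<I, O>> loader) {
        this.loader = loader;
    }

    // Loads on first use only; later calls reuse the loaded transformation.
    synchronized O run(I input) {
        if (loaded == null) {
            loaded = loader.get();
            loadCount++;
        }
        return loaded.apply(input);
    }
}
```

In this sketch the caller keeps one PreloadedTransformation instance around for the lifetime of the application and calls run for each model to be transformed.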
#17: Note that the MoDisco-EMiFy-EMF round-trip scenario for generating the Ecore model and EMF reflective methods was performed for each change to our domain model, and takes about 10 minutes each time. Most of this time was taken up by running MoDisco and the EMF code generator. This takes up extra time from the developer who manages the domain model changes, besides the regular Java code changes and SQL migration code. The positive side of this picture is that the Ecore domain model can also be used reflectively to do dry-run transformations outside of the application codebase. This allows the ATL developers to test each change in isolation.
#18: ATL is a mapping language, which applies its mapping rules top-down for each model element it can find, and uses a two-pass compiler approach to "weave" the elements generated by different rules back together. This frees the programmer from dealing with model traversal and rule execution order: this is taken care of by ATL. Instead, the programmer can focus on defining the mapping between specific input and output elements. To give you an idea, here's what the ATL rule for mapping lab results looks like.
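The two-pass idea can be sketched in plain Java. This is a simplified illustration under stated assumptions, not ATL's actual implementation: the element types SrcResult, SrcEntry, TgtResult, and TgtEntry and the class TwoPassSketch are hypothetical. The first pass creates a target element for every source element and records a trace link; the second pass fills in properties and resolves references through those trace links, so the order in which elements are visited does not matter.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical source and target element types, for illustration only.
class SrcEntry {
    final String title;
    SrcEntry(String title) { this.title = title; }
}

class SrcResult {
    final String name;
    final List<SrcEntry> entries = new ArrayList<>();
    SrcResult(String name) { this.name = name; }
}

class TgtEntry {
    String label;
}

class TgtResult {
    String name;
    final List<TgtEntry> entries = new ArrayList<>();
}

class TwoPassSketch {
    // Trace links: source element -> target element created for it.
    private final Map<Object, Object> trace = new IdentityHashMap<>();

    TgtResult transform(SrcResult root) {
        List<Object> sources = new ArrayList<>();
        sources.add(root);
        sources.addAll(root.entries);

        // Pass 1: create one target element per source element.
        for (Object s : sources) {
            if (s instanceof SrcResult) {
                trace.put(s, new TgtResult());
            } else if (s instanceof SrcEntry) {
                trace.put(s, new TgtEntry());
            }
        }

        // Pass 2: set properties; references to source elements are
        // resolved to their target elements through the trace links.
        for (Object s : sources) {
            if (s instanceof SrcResult) {
                SrcResult src = (SrcResult) s;
                TgtResult tgt = (TgtResult) trace.get(src);
                tgt.name = src.name;
                for (SrcEntry e : src.entries) {
                    tgt.entries.add((TgtEntry) trace.get(e));
                }
            } else if (s instanceof SrcEntry) {
                SrcEntry src = (SrcEntry) s;
                ((TgtEntry) trace.get(src)).label = src.title;
            }
        }
        return (TgtResult) trace.get(root);
    }
}
```

Each branch in the sketch plays the role of one mapping rule: it only describes how one kind of source element maps to one kind of target element, while the surrounding loops and the trace map stand in for the work the transformation engine does on the programmer's behalf.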
#19: Helpers can be used to encapsulate complex navigations. Helper attributes also improve performance, because their values are cached. They function as a sort of query index over the input data.
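The caching behaviour can be approximated in plain Java with a memoised lookup. This is a minimal sketch of the general technique, not how ATL or EMFTVM implement helper attributes; the class CachedAttribute and its evaluations counter are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Caches the result of an expensive navigation per context element,
// loosely analogous to the caching described for helper attributes.
class CachedAttribute<C, V> {
    private final Map<C, V> cache = new HashMap<>();
    private final Function<C, V> body; // the navigation to evaluate
    int evaluations = 0;               // how often the body actually ran

    CachedAttribute(Function<C, V> body) {
        this.body = body;
    }

    // Evaluates the body at most once per context element.
    V get(C context) {
        if (!cache.containsKey(context)) {
            evaluations++;
            cache.put(context, body.apply(context));
        }
        return cache.get(context);
    }
}
```

Asking for the same attribute on the same element a second time returns the stored value without re-running the navigation, which is what makes such attributes behave like a query index over the input data.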
#20: The LabResultEntry instances referenced from the LabResult rule are transformed here. This rule takes care of title entries...
#21: ...and this rule takes care of the actual lab result values.
#22: The implicit tracing mechanism has allowed us to distribute the transformation writing over three to four developers. One of these programmers was trained in .NET instead of Java, which turned out not to be a barrier for writing ATL code. Because we also chose to separate our transformation code from our file I/O and database interaction code, another developer could work on the file I/O and database interaction in Java. Finally, the Corilus XML conversion service for SUMEHR, PMF, and GPSMF was taken care of by a separate development team at Corilus HQ, and was based on existing code. The critical path in the development pipeline was therefore made up of the ATL code, which is exactly the kind of workload that can easily be spread over multiple developers.