
The general problem here is how to restructure a document instance (which can be seen as a
piece of data) from one generic structure (Type) to another: from a source Type to a target
one.

The paper presents a formalism for modelling Types and shows how some characteristics of
Types are expressed in it. These characteristics, used in the Type comparison process, help to
determine the transformation operations. A formal analysis of the transformations is also given.

Two kinds of transformations are considered here: static and dynamic.

Static transformations can be Type evolution or Type conversion. Type evolution corresponds
to a change in the generic structure of a given Type and the need to make the data of that
Type conform to the new structure. Type conversion is a generalisation of Type evolution,
since the source and the target Types can be different, whereas in Type evolution the
target Type is just an evolution of the source one.

Dynamic transformations are those that take place when editing data (e.g.,
cut/copy/paste operations). To paste cut or copied data from a given element into
a different one (i.e., the elements don’t have the same structure), the data must be
transformed from the source Type to the target one so that the operation succeeds (*).

I think that both static and dynamic transformations are relevant to the project.

A Type is considered here as a tree whose nodes (components) are other elementary or
constructed Types (each component has a unique name). Elementary Types can be seen as the
counterpart of elementary data types in programming languages (e.g., PCDATA). Constructed
Types are obtained by combining elementary Types with constructors, which are syntactic units
specifying, for a given Type, its components, their number and the ordering relationship
between them (e.g., ordered group, unordered group, list, choice, identity). The semantics of a
constructed Type is then given by its constructor (the relation between its components).
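To make the tree model concrete for myself, here is a minimal Python sketch of a Type as a tree of named components. The class names (`Constructor`, `TypeNode`) and the "article" example are my own assumptions for illustration; the paper's actual notation may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Constructor(Enum):
    """The constructors listed in the summary above."""
    ORDERED_GROUP = "ordered group"
    UNORDERED_GROUP = "unordered group"
    LIST = "list"
    CHOICE = "choice"
    IDENTITY = "identity"


@dataclass
class TypeNode:
    name: str                                   # unique component name
    constructor: Optional[Constructor] = None   # None marks an elementary Type
    children: List["TypeNode"] = field(default_factory=list)

    def is_elementary(self) -> bool:
        return self.constructor is None


# Hypothetical "article" Type: an ordered group made of a title
# and a body, where the body is a list of paragraphs.
title = TypeNode("title")
para = TypeNode("para")
body = TypeNode("body", Constructor.LIST, [para])
article = TypeNode("article", Constructor.ORDERED_GROUP, [title, body])
```

The `constructor` and `children` attributes play the role of the node functions mentioned in the text (the one returning the constructor of a Type and the one returning its children).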

A Type (tree) is characterized by the structural position of each of its components (given by
the tree representation). It is also characterized by some node attributes, represented by
functions. These functions extend the tree model so that Types can be compared on more
than their names, and they are used in the Type comparison process. Examples are a function
that returns the constructor of a Type and a function that returns the children of a Type.

Many changes may occur on a Type (node). One example of Type (node) evolution is the
extension or restriction of the number of components of a Type described by an aggregation
constructor (ordered group constructor, unordered group constructor). Another is an increase
or decrease in the number of components of a list (list constructor), etc. These changes,
detected by the Type comparison process based on the Types’ functions, are then translated
into transformation rules.

The transformation rules are generated by a comparator. The comparator takes as input a
source Type (tree), a target Type (tree) and, optionally, a set of couples (x,y) called
evolution couples, where x and y are names of Types (nodes) in the source and the target
Types (trees) respectively. The evolution couple (x,y) means that the Type x will be compared
with the Type named y. The evolution couples can also be obtained automatically by the
comparator. The output of the comparator is a set of transformation rules that are used to
transform document instances (data) from the source Type to the target one. This
transformation can be done automatically by a component called a convertor.

If the evolution couples are obtained automatically from the source and the target Types by
the comparator, it is assumed that the Type identifiers are unique (in this case, two
identical identifiers constitute an evolution couple: the homonymy principle). This
assumption may be reasonable for Type evolution, but not for Type conversion, where the
Types may have been designed by different people (**). In fact, the user generally has no
deep knowledge of the Type structures (and none at all in the case of a novice user, as in
end-user programming).

Mechanisms based on the homonymy principle are limited when producing transformation
rules: many potential transformations can be missed, e.g., between two Types with the
same structure but different names. This motivates extending the proposed model with
structure and content considerations.
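As a rough illustration of the homonymy principle and of what it misses, the sketch below derives evolution couples from identical names only. The function names and the "letter"/"memo" Types are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class TypeNode:
    name: str
    children: List["TypeNode"] = field(default_factory=list)


def names(t: TypeNode) -> Set[str]:
    """Collect the names of all nodes in a Type tree."""
    result = {t.name}
    for child in t.children:
        result |= names(child)
    return result


def homonymy_couples(source: TypeNode, target: TypeNode) -> Set[Tuple[str, str]]:
    """Evolution couples (x, y) where x and y are the same identifier."""
    return {(n, n) for n in names(source) & names(target)}


# Two hypothetical Types designed by different people.
source = TypeNode("letter", [TypeNode("sender"), TypeNode("body")])
target = TypeNode("memo", [TypeNode("author"), TypeNode("body")])

couples = homonymy_couples(source, target)
print(couples)
```

Only `("body", "body")` is found. The couples `("letter", "memo")` and `("sender", "author")`, which have the same structural position, are missed because the names differ.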

From the structural point of view, given a set T of Types, the point of the structural
extension of the homonymy-based mechanisms is to define an order relation between the
elements of T. A further set of transformation rules is thus produced, besides those
produced with the homonymy principle. Another motivation for comparing Type structures is
to decide whether a given transformation is possible from the Types alone, without
comparing the instances. The authors introduce an equivalence relation (tree isomorphism)
between Types and propose to build equivalence classes from Types with the same
structure (the equivalence classes are based on structural identity). Structural identity
can also be relaxed to structural compatibility, which increases the number of allowed Type
conversions. An order relation (subtyping, i.e., subtree) is also introduced: a Type t is a
subtype of t’ if the content of t can be integrated in t’. There is also the notion of a
cluster, which is a kind of indirect subtree (the Type t is not literally found in t’, but is
scattered in the tree while respecting some hierarchical conditions). This notion also allows
the definition of some transformations that would not be allowed by the homonymy principle.
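One possible way to compute structural equivalence classes is to give every Type a name-independent signature and to group Types with the same signature. This is my own simplification, not the authors' algorithm: it treats all elementary Types as interchangeable, and the constructor labels ("ordered", "unordered", "list") are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeNode:
    name: str
    constructor: str = ""   # "" marks an elementary Type
    children: List["TypeNode"] = field(default_factory=list)


def signature(t: TypeNode) -> str:
    """Canonical form of the tree shape; component names are ignored."""
    if not t.constructor:
        return "#"          # assumption: every elementary Type looks the same
    sigs = [signature(c) for c in t.children]
    if t.constructor == "unordered":
        sigs.sort()         # child order is irrelevant for unordered groups
    return f"{t.constructor}({','.join(sigs)})"


def equivalence_classes(types: List[TypeNode]) -> Dict[str, List[str]]:
    """Group Type names by structural signature."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for t in types:
        classes[signature(t)].append(t.name)
    return dict(classes)


letter = TypeNode("letter", "ordered",
                  [TypeNode("sender"), TypeNode("body", "list", [TypeNode("para")])])
memo = TypeNode("memo", "ordered",
                [TypeNode("author"), TypeNode("text", "list", [TypeNode("line")])])
note = TypeNode("note", "ordered", [TypeNode("text")])

classes = equivalence_classes([letter, memo, note])
```

Here `letter` and `memo` land in the same class although they share no names, so a conversion between them can be proposed where the homonymy principle finds nothing.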

Two Types may satisfy none of the relations mentioned above, i.e., homonymy, equivalence or
order. The mechanisms used to compute the transformation rules fail in that case. For that
case the authors propose a grammar-based solution to locate the subtrees to transform. This
solution makes Type transformations possible when the structure-based transformations
fail. A language is associated with each Type and is used to carry out the transformations.
This proposal is used especially for dynamic transformations (e.g., cut/copy/paste), where
preserving all of the content of the source document is important.

The idea is to consider a data Type as a grammar generating a language. The Type structure
transformation is made by comparing the languages of the source and the target Types.
More precisely, the word of the Type instance (which is a word of the language generated by
the grammar of its Type) is compared with the words of the target Type language. The
existence of one correspondence is enough to conclude that the transformation is feasible
and that the content of the source instance is preserved.
A Type is a grammar G = <T, N, P, A>, where:
T: a set of basic Types (terminal alphabet).
N: a set of constructed Types (non-terminal alphabet), with T ∩ N = ∅.
P: a set of production rules, P ⊆ N × (T ∪ N)*.
A: the axiom of the grammar; it corresponds to the name of the Type.

For every constructed Type, a set of production rules is generated. E.g., for a Type t with a
Choice constructor and n alternatives, n production rules t --> alternative(i) are
generated (i = 1 .. n). Here, the axiom of the grammar is the name of the Type, i.e., t.
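The rule generation can be sketched as follows. Only the Choice case comes from the text above. The single-rule treatment of an ordered group is my assumption, and the other constructors are deliberately left out rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TypeNode:
    name: str
    constructor: str = ""   # "" marks an elementary (terminal) Type
    children: List["TypeNode"] = field(default_factory=list)


# A production rule: (left-hand non-terminal, right-hand sequence of symbols).
Rule = Tuple[str, Tuple[str, ...]]


def productions(t: TypeNode) -> List[Rule]:
    """Generate production rules for t and, recursively, for its components."""
    rules: List[Rule] = []
    if t.constructor == "choice":
        # From the text: one rule t --> alternative(i) per alternative.
        rules += [(t.name, (c.name,)) for c in t.children]
    elif t.constructor == "ordered":
        # Assumption: one rule listing the components in order.
        rules.append((t.name, tuple(c.name for c in t.children)))
    elif t.constructor:
        raise NotImplementedError(f"no rule scheme sketched for {t.constructor!r}")
    for child in t.children:
        rules += productions(child)
    return rules


# Hypothetical Type: an item is a caption followed by a figure,
# and a figure is either an image or a table.
figure = TypeNode("figure", "choice", [TypeNode("image"), TypeNode("table")])
item = TypeNode("item", "ordered", [TypeNode("caption"), figure])

rules = productions(item)
for lhs, rhs in rules:
    print(lhs, "-->", " ".join(rhs))
```

The axiom of the resulting grammar is the name of the root Type, here `item`.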

The language associated with a Type t is denoted LG(t) = { w ∈ T* | t -*-> w }. A word w is
thus a finite concatenation of elements of T obtained from the leaves of the Type tree. The
word of a given Type instance is obtained by traversing the tree in prefix order and writing
the elements of the leaves from left to right.

In order to take into account the structure of the trees representing Types, a set of
meta-characters M is added to the grammar alphabet. These characters represent the
constructors. E.g., aggregation with order: the elements are between { }, aggregation without
order: the elements are between ↑ ↓, choice: the elements are between [ ], list: the elements
are between ( ). All the elements within a constructor are separated by “+”.

The words w built over the new alphabet (T∪N∪M) take into account the structure of a
Type or a Type instance.
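A small sketch of how such a structured word could be produced by a prefix traversal, using the meta-characters listed above. I assume here that only leaf names and meta-characters are written out (non-terminal names are omitted), which may not match the paper exactly. The identity constructor is not covered because no meta-character is given for it.

```python
from dataclasses import dataclass, field
from typing import List

# Meta-characters per constructor, as listed in the text.
BRACKETS = {
    "ordered": ("{", "}"),
    "unordered": ("↑", "↓"),
    "choice": ("[", "]"),
    "list": ("(", ")"),
}


@dataclass
class TypeNode:
    name: str
    constructor: str = ""   # "" marks an elementary Type (a leaf)
    children: List["TypeNode"] = field(default_factory=list)


def imprint(t: TypeNode) -> str:
    """Prefix traversal: leaves are written left to right, wrapped in the
    meta-characters of the enclosing constructors and separated by '+'."""
    if not t.constructor:
        return t.name
    opening, closing = BRACKETS[t.constructor]
    return opening + "+".join(imprint(c) for c in t.children) + closing


# Hypothetical Type: a doc is a title followed by a list of blocks,
# each block being either a para or a figure.
block = TypeNode("block", "choice", [TypeNode("para"), TypeNode("figure")])
body = TypeNode("body", "list", [block])
doc = TypeNode("doc", "ordered", [TypeNode("title"), body])

print(imprint(doc))  # {title+([para+figure])}
```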

In a dynamic transformation, the source instance corresponds to an expression in the language
called the effective imprint, which is the word obtained by a prefix traversal of its tree.
Another imprint, called generic, is also constructed for the destination Type. The generic
imprint of a Type represents the family of all the instances that the Type can generate.
Comparing the effective imprint of the source instance with the generic imprint of the
destination Type allows the detection of the transformations needed for a piece of data, for
example in a cut/copy/paste operation. More generally, comparing the generic imprints of the
source and destination Types allows the identification of some structural relations
between them (equivalence, factor (a word w is a factor of w’ if w’ = uwu’), etc.).
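The factor relation is easy to check once imprints are represented as sequences of symbols. The sketch below works on lists of tokens rather than raw strings, so that a name is never matched against part of another name. This token representation is my choice, not something stated in the paper.

```python
from typing import Sequence


def is_factor(w: Sequence[str], w_prime: Sequence[str]) -> bool:
    """True if w_prime = u + w + u' for some (possibly empty) u and u'."""
    n = len(w)
    return any(
        list(w_prime[i:i + n]) == list(w)
        for i in range(len(w_prime) - n + 1)
    )


# Hypothetical imprints, already split into symbols.
source = ["{", "para", "+", "figure", "}"]
target = ["{", "title", "+", "{", "para", "+", "figure", "}", "}"]

print(is_factor(source, target))                     # True
print(is_factor(["para", "+", "title"], target))     # False
```

In the first case the source imprint appears unchanged inside the target imprint, which suggests that the source content can be placed in the target without restructuring.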

I think that this work is very interesting; the problem tackled here is very close to what we
want to do in the project. One thing that can simplify our work is that all the constructors
used in the document Types can be found in XML DTDs, which are widely used nowadays to
represent Type models.

I thought a little about the rewriting approach (rewriting concepts using a terminology) as a
way to classify Types. I think that this approach can be used as a high-level reasoning
mechanism that helps to query and find Types based on annotations with formal concepts.
Why at the annotation level? Because I think that the tree formalism is very well suited to
capturing many characteristics of DTD Types and then generating transformation rules. (Does
representing a Type as a concept preserve the possibility of representing all these
characteristics?)

Finally, the words of the grammars representing the source and the target Types can also be
compared in order to classify them according to a syntactic (structural) similarity. This
similarity measure can be used to find the closest Type to a new one.

This is just to clarify what I understand; it may help to start the interaction and the
exchange of ideas.

(*) I think that some kind of interaction may be useful here in order to guide the user in the
Type transformation process. I refer to what Boualem named “progressive help”.

(**) The problem can be resolved if we suppose that there is a given domain ontology. We can
also think about semantic annotation of the Types’ names with formal concepts defined in the
domain ontology. These concepts can then be used to reason about the semantics of the
Types’ names, even when the labels themselves differ.
