Professional Documents
Culture Documents
Ming Jiang Tat-Sang Choy Sameep Mehta Matt Coatney Steve Barr
Kaden Hazzard David Richie Srinivasan Parthasarathy Raghu Machiraju
David Thompson John Wilkins Boyd Gatlin
Figure 1: The figure shows the two feature min- This approach first identifies individual points as
ing paradigms being discussed here. The point- belonging to a feature and then aggregates them to form
classification paradigm is depicted on the left branch, contiguous regions. Although the points are the entities
and the aggregate classification paradigm is depicted that are classified, it is the aggregated point sets that
on the right branch. are identified as features. The points are obtained from
a tour of the discrete domain and may be the vertices of
the mesh or sites of atoms used for the simulation. The
ROIs can be represented as a list of volumetric elements
these highly discriminating feature detection algorithms (for CFD) or a point cloud (for MD).
to the particular application far outweigh any loss of
generality. 3.3 Aggregate Classification Paradigm
Since physically-based feature detection algorithms Identification of other types of features may require in-
tend to be application specific, we defer discussion of formation that is less localized than the neighborhood of
these techniques until a later section. We now describe a point. Further, we may need to classify a set of points
two distinct feature mining paradigms. The common based on their collective behavior. We term this sec-
thread is that both are bottom-up feature constructions ond feature mining paradigm aggregate classification.
with underlying physics-based criteria. These two A vortex is an example of a feature in fluid dynamics
paradigms are used depending on the local and global for which this approach is appropriate. Point classifi-
nature of the features. As will become more evident cation techniques can also be used to locate vortices;
from the examples, it is unlikely that non-physics-based however, without the verification step, false positives
techniques can provide the fidelity needed to locate can pervade the analysis [40] and invariably cause er-
complex flow field structures. Below we provide general roneous conclusions to be drawn or meaningless models
methodologies that can locate features across a vareity to be built.
of domains. Figure 1 shows the various stages of the Aggregate classification follows a somewhat differ-
two paradigms. ent sequence of operations than point classification:
j+1
Figure 4: Two spatially separated defect clusters (black
and grey) marked by local binary classification, followed j
k+1
by aggregation code. k
j1 k1
i1 i i+1
We have empirically validated that this method
works well even on noisy data. Figure 3(left) shows E
a persistent structure at 1000 K. The atoms marked Swirl Plane
examine the immediate neighborhood of a grid point for measured with respect to the entire core region, not
the existence of those flow patterns. Figure 5 illustrates just individual points within the core region.
the detection algorithm in three dimensions, where it is As a demonstration of our algorithm, we ap-
necessary to approximate the vortex core tangent vector plied the aggregate classification paradigm to the
first, and then project the neighboring velocity vectors blunt fin/flat plate flow field as shown in Figure 6. In
onto the swirl plane normal to it, before applying Figure 6(a) we show the detected vortex core regions
the above procedure to the projected velocity vectors. and the the verifying streamlines near the fin/plate in-
Techniques for approximating the vortex core tangent tersection. In Figure 6(b) (top left) we show a close-
vector are based on local velocity gradient tensors, such up of the primary vortex located adjacent to the flat
as the vorticity vector or the real eigenvector. plate. Notice that even though the vortex cross sec-
Our approach segments candidate vortex core re- tion is highly elliptical, the tangent vectors projected
gions by aggregating points identified from the detec- into the local swirling plane, gray vectors in Figure 6(b)
tion step. We then classify (or verify) these candidate (bottom), still satisfy the 2 swirling criterion.
core regions based on the existence of swirling stream-
lines surrounding them. For features that lack a for- 4.2.2 MD Example: Defect Evolution
mal definition, such as the vortex, we must choose the The large number of identified structures must be
verification criterion so that it concurs with the intu- sorted into a smaller set of distinct types. Quenching
itive understanding of the feature. In this case, verify- solves this problem but is computationally expensive.
ing whether a candidate core region is an actual vor- In addition, some structures are stable only at high
tex core requires checking for swirling streamlines sur- temperatures. By quenching, these structures are lost.
rounding it. Checking for swirling flow in three dimen- Identifying time averaged structures is a great challenge.
sions is a non-trivial problem since vortices can bend Occasionally with noisy data, too many or too few
and twist in various ways. Our verification algorithm atoms are marked. Figure 7 illustrates this situation.
measures three-dimensional swirling in terms of the dif- These two structures have different numbers of defect
ferential geometry properties of the streamline. The atoms marked, yet when quenched are the same. The
technique we developed essentially checks to see if the application of the aggregate classification paradigm for
local tangent to the streamline, when projected onto mining MD data is still under development.
the swirl plane normal to the candidate core tangent
vector, spans a measure of 2. The aggregate nature of 4.3 Shape Characterization
this classification step is apparent. Checking for swirling Having extracted features either as vortices (CFD) or
streamlines is a global (or aggregate) approach to fea- defects (MD), we describe the next step: characteriz-
ture classification (or verification) because swirling is ing them using shapes attributes. Our objective is to
ordinate graphs, where atoms are nodes and chemical twist is that substructures of interest are matched us-
bonds are edges. Two substructures are considered ing recursive fuzzy hashing, a technique that allows us
equal if, after an arbitrary number of spatial transla- to handle noise-effects in the data [26]. The memory
tions (and/or rotations) on one substructure, both sub- utilization is O(km) where m < k.
structures are described by the same graph. This set of interesting substructures is used to
Defects are represented simply by atom locations, generate a substructural fingerprint for a given de-
(i.e., their (x, y, z) coordinates). In order to match de- fect. A substructure fingerprint for a particular de-
fect structures, one approach is to store complete three- fect is a vector representation of the set of interest-
dimensional information between all pairs of atoms ing substructures; elements representing substructures
(mining bonds) belonging to a defect. The mining bond contained in that defect are marked, either with a 1
M (Ai Aj ) is a 3-tuple of the form (present) or a 0 (absent) [26]. Defect disambiguation
can then be conducted by comparing the corresponding
M (Ai Aj ) = {Ai type, Aj type, distance(Ai Aj )} fingerprint vectors. It turns out that this simple ap-
proach is robust to noise as well as efficient to compute.
A k-atoms et X, which is a substructure containing k Moreover, as we can see from Table 10, the disambigua-
connected atoms, is then defined as a tuple of the form tion is near perfect (non-diagonal elements are 0, diag-
onal elements are 1) for a set of 14 defect types across
X = {SX , A1 , A2 , . . . , Ak },
multiple simulation runs (0 indicating perfect dissimi-
where Ai is the ith atom and SX is the set of mining larity and 1 representing high similarity).
bonds describing the atomset. By defining atom pair This approach clearly lends itself to identifying dis-
combinations with mining bonds, the 3D graph is criminating motifs, (i.e., substructures that can disam-
completely represented in a form, such that two atom biguate between different defects). Moreover these mo-
sets X and Y are considered to be the same chemical tifs are likely to be much smaller than the defect struc-
substructure if SX = SY . To compare two defects we tures enabling one to develop more efficient defect classi-
simply need to compare the corresponding X tuples fiers, based on the presence or absence of these discrim-
of the corresponding atomsets represented by the two inatory motifs d-motifs. Essentially, instead of generat-
defects. The problem with this naive approach is that ing all the motifs for a given defect and then construct-
it is not very robust to noise (missing and/or perturbed ing a fingerprint, we would only need to check for the
defect atoms) and the memory requirements are O(k 2 ) presence or absence of these d-motifs. The robustness
where k represents the number of atoms in the defect of this strategy is currently being evaluated.
structure.
As an alternative to the above approach, we decom- 5 Summary
pose each defect structure into k spherical regions, each The steady increase in computing power perennially
one centered around every atom belonging to the defect. challenges our ability to learn new science from the
The substructures formed by all those atoms within the massive amount of data that are being generated. Our
ROI are represented as using mining bonds. The key science applications, computational fluid dynamics and
molecular dynamics, raise central problems that we plan [5] M. C. Burl, L. Asker, P. Smyth, U. M. Fayyad,
to address using a common framework which we have P. Perona, L. Crumpler, and J. Aubele. Learning to
christened generalized feature mining. Components of Recognize Volcanoes on Venus. Machine Learning,
this approach include temporal event detection, local 30(2-3):165194, 1998.
and global feature mining paradigms, and kinematical [6] M. Coatney, S. Mehta, T.-S. Choy, S. Barr,
S. Parthasarathy, R. Machiraju, and J. W. Wilkins.
and dynamical interactions discovery.
Defect Detection in Silicon and Alloys. In IEEE Work-
Our science applications are very distinct and at-
shop on Visualization in Bioinformatics and Chemin-
tempt to characterize very diverse phenomena. How- formatics, October 2002.
ever, they both have commonalities that are exploited [7] L. Dehaspe, H. Toivonen, and R. D. King. Finding
by the above set of techniques. We reported some ongo- Frequent Substructures in Chemical Compounds. In
ing research in designing feature mining paradigms for 4th ACM SIGKDD Intl. Conference on Knowledge
large-scale simulation datasets. The results obtained Discovery and Data Mining, pages 3036, August 1998.
have been very encouraging. A systematic approach [8] S. Djoko, D. Cook, and L. Holder. Analyzing the Ben-
to feature mining was conceived to locate both local efits of Domain Knowledge in Substructure Discovery.
and global features. However, much more remains to In 1st ACM SIGKDD Intl. Conference on Knowledge
be done to realize the complete unified framework men- Discovery and Data Mining, pages 7580, August 1995.
[9] A. Fitzgibbon, M. Pilu, and R. B. Fisher. Direct Least
tioned above. Our next main task will be tracking the
Square Fitting of Ellipses. IEEE Trans. on Pattern
characterized features by exploiting the similarities in
Analysis and Machine Intelligence, 21(5):476480, May
the extracted geometrical and dynamical attributes to 1999.
address the correspondence problem between features at [10] A. Globus, C. Levit, and T. Lansinski. A Tool
different time steps. Finally, we plan to find associations for Visualizing the Topology of Three-Dimensional
between features and events to gain further scientific in- Vector Fields. In IEEE Visualization 91, pages 33
sights. 39, October 1991.
[11] E.-H. Han, G. Karypis, and V. Kumar. Data Mining
References for Turbulent Flows. In R. L. Grossman, C. Kamath,
P. Kegelmeyer, V. Kumar, and R. R. Namburu, edi-
[1] J. D. Anderson. Fundamentals of Aerodynamics. tors, Data Mining for Scientific and Engineering Ap-
McGraw-Hill Book Company, 1984. plications, pages 239256, 2001.
[2] D. C. Banks and B. A. Singer. A Predictor-Corrector [12] M. Henle. A Combinatorial Introduction to Topology.
Technique for Visualizing Unsteady Flow. IEEE Trans. Dover, 1979.
on Visualization and Computer Graphics, 1(2):151 [13] M. Jiang, R. Machiraju, and D. Thompson. A Novel
163, June 1995. Approach to Vortex Core Region Detection. In Joint
[3] D. Bauer and R. Peikert. Vortex Tracking in Scale- EurographicsIEEE TCVG Symposium on Visualiza-
Space. In Joint EurographicsIEEE TCVG Symposium tion, pages 217225, May 2002.
on Visualization, pages 140147, May 2002. [14] M. Jiang, R. Machiraju, and D. Thompson. Geometric
[4] C. H. Berdahl and D. S. Thompson. Eduction of Verification of Swirling Features in Flow Fields. In
Swirling Structure using the Velocity Gradient Tensor. IEEE Visualization 02, pages 307314, October 2002.
AIAA J., 31(1):97103, January 1993.