
Elysium Technologies Private Limited

Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad | Pondicherry | Trivandrum | Salem | Erode | Tirunelveli http://www.elysiumtechnologies.com, info@elysiumtechnologies.com

13 Years of Experience | Automated Services | 24/7 Help Desk Support | Experienced & Expert Developers | Advanced Technologies & Tools | Legitimate Member of all Journals | More than 1,50,000 successful project records in all languages | More than 12 Branches in Tamilnadu, Kerala & Karnataka | Ticketing & Appointment Systems | Individual Care for every Student | Around 250 Developers & 20 Researchers


227-230 Church Road, Anna Nagar, Madurai 625020. 0452-4390702, 4392702, +91-9944793398. info@elysiumtechnologies.com, elysiumtechnologies@gmail.com

S.P. Towers, No. 81 Valluvar Kottam High Road, Nungambakkam, Chennai 600034. 044-42072702, +91-9600354638. chennai@elysiumtechnologies.com

15, III Floor, SI Towers, Melapudur Main Road, Trichy 620001. 0431-4002234, +91-9790464324. trichy@elysiumtechnologies.com

577/4, DB Road, RS Puram, Opp. to KFC, Coimbatore 641002. 0422-4377758, +91-9677751577. coimbatore@elysiumtechnologies.com


1st Floor, A.R. IT Park, Rasi Color Scan Building, Ramanathapuram 623501. 04567-223225, +91-9677704922. ramnad@elysiumtechnologies.com

Plot No. 4, C Colony, P&T Extension, Perumal Puram, Tirunelveli 627007. 0462-2532104, +91-9677733255. tirunelveli@elysiumtechnologies.com

74, 2nd Floor, K.V.K Complex, upstairs of Krishna Sweets, Mettur Road, opp. bus stand, Erode 638011. 0424-4030055, +91-9677748477. erode@elysiumtechnologies.com

No. 88, First Floor, S.V. Patel Salai, Pondicherry 605001. 0413-4200640, +91-9677704822. pondy@elysiumtechnologies.com

TNHB A-Block, D.No. 10, opp. Hotel Ganesh, near bus stand, Salem 636007. 0427-4042220, +91-9894444716. salem@elysiumtechnologies.com

ETPL DIP-001: Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image Tone Mapping

A novel filter is proposed for edge-preserving decomposition of an image. It is different from previous filters in its locally adaptive property. The filtered image contains local means everywhere and preserves local salient edges. Comparisons are made between our filtered result and the results of three other methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base layer preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective function is also proposed for compressing the detail layers. The reproduced image gives a good visualization. Experimental results on real images demonstrate that our algorithm is especially effective at preserving or enhancing local details.

ETPL DIP-002: Multistructure Large Deformation Diffeomorphic Brain Registration (Biomedical Engineering)

Whole brain MRI registration has many useful applications in group analysis and morphometry, yet accurate registration across different neuropathological groups remains challenging. Structure-specific information, or anatomical guidance, can be used to initialize and constrain registration to improve accuracy and robustness. We describe here a multistructure diffeomorphic registration approach that uses concurrent subcortical and cortical shape matching to guide the overall registration. Validation experiments carried out on openly available datasets demonstrate comparable or improved alignment of subcortical and cortical brain structures over leading brain registration algorithms. We also demonstrate that a group-wise average atlas built with multistructure registration accounts for greater intersubject variability and provides more sensitive tensor-based morphometry measurements.

ETPL DIP-003: Iterative Closest Normal Point for 3D Face Recognition (Pattern Analysis and Machine Intelligence)

The common approach for 3D face recognition is to register a probe face to each of the gallery faces and then calculate the sum of the distances between their points. This approach is computationally expensive and sensitive to facial expression variation. In this paper, we introduce the iterative closest normal point method for finding the corresponding points between a generic reference face and every input face. The proposed correspondence finding method samples a set of points for each face, denoted as the closest normal points. These points are effectively aligned across all faces, enabling effective application of discriminant analysis methods for 3D face recognition. As a result, the expression variation problem is addressed by minimizing the within-class variability of the face samples while maximizing the between-class variability. As an important conclusion, we show that the surface normal vectors of the face at the sampled points contain more discriminatory information than the coordinates of the points. We have performed comprehensive experiments on the Face Recognition Grand Challenge database, which is presently the largest available 3D face database. We have achieved verification rates of 99.6 and 99.2 percent at a false acceptance rate of 0.1 percent for the all versus all and ROC III experiments, respectively; to the best of our knowledge, this corresponds to error rates seven and four times lower, respectively, than those of the best existing methods on this database.
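As a rough illustration of the normals-as-features idea in DIP-003 above (a one-shot sketch, not the paper's iterative ICNP procedure), the snippet below matches reference samples to a probe scan and collects the probe's surface normals at the matched points; ref_pts, probe_pts, and probe_normals are assumed inputs, not names from the paper.

    # Sketch: surface normals at matched points as a recognition feature.
    # ref_pts (M, 3): sampled points on a generic reference face (hypothetical)
    # probe_pts (N, 3), probe_normals (N, 3): input scan and its normals (hypothetical)
    import numpy as np
    from scipy.spatial import cKDTree

    def normal_feature_vector(ref_pts, probe_pts, probe_normals):
        tree = cKDTree(probe_pts)
        _, idx = tree.query(ref_pts)                      # closest-point correspondence
        n = probe_normals[idx]
        n = n / np.linalg.norm(n, axis=1, keepdims=True)  # unit normals at matches
        return n.ravel()                                  # e.g., input to LDA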
ETPL DIP-004: Face Recognition and Verification Using Photometric Stereo (Information Forensics and Security)

This paper presents a new database suitable for both 2-D and 3-D face recognition based on photometric stereo (PS): the Photoface database. The database was collected using a custom-made four-source PS device designed to enable data capture with minimal interaction necessary from the subjects. The device, which automatically detects the presence of a subject using ultrasound, was placed at the entrance to a busy workplace and captured 1839 sessions of face images with natural pose and expression.
This makes the acquired data more realistic for everyday use than existing databases and, therefore, an invaluable test bed for state-of-the-art recognition algorithms. The paper also presents experiments with various face recognition and verification algorithms using the albedo, surface normals, and recovered depth maps. Finally, we have conducted experiments to demonstrate how different methods in the PS pipeline (i.e., normal field computation and depth map reconstruction) affect recognition and verification performance. These experiments help to 1) demonstrate the usefulness of PS, and our device in particular, for minimal-interaction face recognition, and 2) highlight the optimal reconstruction and recognition algorithms for use with natural-expression PS data. The database can be downloaded from http://www.uwe.ac.uk/research/Photoface.

ETPL DIP-005: Objective Quality Assessment of Tone-Mapped Images

Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range (LDR) images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has the best quality. Without an appropriate quality measure, different TMOs cannot be compared, and further improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive and time consuming, and, more importantly, difficult to embed into optimization frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images that combines: 1) a multiscale signal fidelity measure based on a modified structural similarity index and 2) a naturalness measure based on intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking scores and the proposed tone-mapped image quality index (TMQI). Furthermore, we demonstrate extended applications of TMQI using two examples: parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.

ETPL DIP-006: Segmentation and Tracing of Single Neurons from 3D Confocal Microscope Images (Biomedical and Health Informatics)

In order to understand the brain, we need to first understand the morphology of neurons. In the neurobiology community, there have been recent pushes to analyze both neuron connectivity and the influence of structure on function. Currently, a technical roadblock that stands in the way of these studies is the inability to automatically trace neuronal structure from microscopy. On the image processing side, proposed tracing algorithms face difficulties with low contrast, indistinct boundaries, clutter, and complex branching structure. To tackle these difficulties, we develop Tree2Tree, a robust automatic neuron segmentation and morphology generation algorithm. Tree2Tree uses a local medial tree generation strategy in combination with global tree linking to build a maximum likelihood global tree. Recasting the neuron tracing problem in a graph-theoretic context enables Tree2Tree to estimate bifurcations naturally, which remains a challenge for existing neuron tracing algorithms. Tests on cluttered confocal microscopy images of Drosophila neurons give results that correspond to ground truth within a margin of ±2.75% normalized mean absolute error.
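The global tree-linking step in DIP-006 can be pictured with a much simpler stand-in: treat candidate segment endpoints as graph nodes and link them with a minimum spanning tree over pairwise distances. This sketch ignores the paper's maximum likelihood formulation; endpoints is an assumed (N, 3) array of detected segment endpoints.

    # Toy global linking: minimum spanning tree over candidate segment endpoints.
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import minimum_spanning_tree

    def link_segments(endpoints):
        cost = cdist(endpoints, endpoints)       # pairwise Euclidean linking costs
        mst = minimum_spanning_tree(cost)        # one global tree over all pieces
        src, dst = mst.nonzero()
        return list(zip(src.tolist(), dst.tolist()))  # tree edges (i, j)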
ETPL DIP-007: Silhouette Analysis-Based Action Recognition via Exploiting Human Poses (Circuits and Systems for Video Technology)

In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations. We explore human silhouettes for human action representation by taking into account the correlation between sequential poses in an action. A modified bag-of-words model, named bag of correlated poses, is introduced to encode temporally local features of actions.
To utilize the property of visual word ambiguity, we adopt the soft assignment strategy to reduce the dimensionality of our model and circumvent the penalty of computational complexity and quantization error. To compensate for the loss of structural information, we propose an extended motion template, i.e., extensions of the motion history image, to capture the holistic structural features. The proposed scheme takes advantage of local and global features and, therefore, provides a discriminative representation for human actions. Experimental results confirm the complementary properties of the two descriptors, and the proposed approach outperforms the state-of-the-art methods on the IXMAS action recognition dataset.

ETPL DIP-008: Pose-Invariant Face Recognition Using Markov Random Fields

One of the key challenges for current face recognition techniques is how to handle pose variations between the probe and gallery face images. In this paper, we present a method for reconstructing the virtual frontal view from a given nonfrontal face image using Markov random fields (MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input face image is divided into a grid of overlapping patches, and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it does not require manually selected facial landmarks or head pose estimation. In order to improve the performance of our pose normalization method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to demonstrate the effectiveness of the proposed approach.

ETPL DIP-009: Color Video Denoising Based on Combined Interframe and Intercolor Prediction (Circuits and Systems for Video Technology)

An advanced color video denoising scheme, which we call CIFIC, based on combined interframe and intercolor prediction is proposed in this paper. CIFIC performs the denoising filtering in the RGB color space, and exploits both the interframe and intercolor correlation in the color video signal directly by forming multiple predictors for each color component using all three color components in the current frame as well as the motion-compensated neighboring reference frames. The temporal correspondence is established through joint-RGB motion estimation (ME), which acquires a single motion trajectory for the red, green, and blue components. Then the current noisy observation as well as the interframe and intercolor predictors are combined by a linear minimum mean squared error (LMMSE) filter to obtain the denoised estimate for every color component. The ill condition in the weight determination of the LMMSE filter is detected and remedied by gradually removing the least contributing predictor. Furthermore, our previous work on the LMMSE filter applied in the adaptive luminance-chrominance space (LAYUV for short) is revisited.
By reformulating LAYUV and comparing it with CIFIC, we deduce that LAYUV is a restricted version of CIFIC, and thus CIFIC can theoretically achieve lower denoising error. Experimental results verify the improvement brought by the joint-RGB ME and the integration of the intercolor prediction, as well as the superiority of CIFIC over LAYUV. Meanwhile, when compared with other state-of-the-art algorithms, CIFIC provides competitive performance both in terms of the color peak signal-to-noise ratio and in perceptual quality.
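The LMMSE fusion at the core of DIP-009 has a compact linear-algebra form: given several predictors of the same signal, the optimal linear weights solve a normal-equation system built from second-order statistics. In the sketch below the statistics are formed from a clean reference for clarity; a real denoiser would estimate them from the noisy data and a noise model. preds and target are assumed arrays.

    # Minimal LMMSE predictor fusion: w = R^{-1} p, estimate = w @ predictors.
    import numpy as np

    def lmmse_weights(preds, target):
        """preds: (K, N) stack of K predictors; target: (N,) reference signal."""
        K, N = preds.shape
        R = preds @ preds.T / N                  # predictor correlation matrix
        p = preds @ target / N                   # predictor-target correlation
        return np.linalg.solve(R, p)             # LMMSE weights

    # fused estimate for every sample:
    # w = lmmse_weights(preds, target); estimate = w @ preds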

ETPL DIP-010: Wang-Landau Monte Carlo-Based Tracking Methods for Abrupt Motions (Pattern Analysis and Machine Intelligence)

We propose a novel tracking algorithm based on the Wang-Landau Monte Carlo (WLMC) sampling method for dealing with abrupt motions efficiently. Abrupt motions cause conventional tracking methods to fail because they violate the motion smoothness constraint. To address this problem, we introduce the Wang-Landau sampling method and integrate it into a Markov chain Monte Carlo (MCMC)-based tracking framework. By incorporating the density-of-states term estimated by the Wang-Landau sampling method into the acceptance ratio of MCMC, our WLMC-based tracking method alleviates the motion smoothness constraint and robustly tracks abrupt motions. Meanwhile, the marginal likelihood term of the acceptance ratio preserves the accuracy in tracking smooth motions. The method is then extended to obtain good performance in terms of scalability, even on a high-dimensional state space. Hence, it covers drastic changes in not only the position but also the scale of a target. To achieve this, we modify our method by combining it with the N-fold way algorithm and present the N-Fold Wang-Landau (NFWL)-based tracking method. The N-fold way algorithm helps estimate the density-of-states with a smaller number of samples. Experimental results demonstrate that our approach efficiently samples the states of the target, even in a whole state space, without loss of time, and tracks the target accurately and robustly when position and scale are changing severely.

ETPL DIP-011: Multi-View ML Object Tracking With Online Learning on Riemannian Manifolds by Combining Geometric Constraints

This paper addresses issues in object tracking with occlusion scenarios, where multiple uncalibrated cameras with overlapping fields of view are exploited. We propose a novel method where tracking is first done independently in each individual view and then the tracking results are mapped between views to improve the tracking jointly. The proposed tracker uses the assumptions that objects are visible in at least one view and move uprightly on a common planar ground that may induce a homography relation between views. A method for online learning of object appearances on Riemannian manifolds is also introduced. The main novelties of the paper include: 1) defining a similarity measure, based on geodesics between a candidate object and a set of mapped references from multiple views on a Riemannian manifold; 2) proposing multi-view maximum likelihood estimation of object bounding box parameters, based on Gaussian-distributed geodesics on the manifold; 3) introducing online learning of object appearances on the manifold, taking possible occlusions into account; 4) utilizing projective transformations for objects between views, where parameters are estimated from the warped vertical axis by combining planar homography, epipolar geometry, and the vertical vanishing point; and 5) embedding single-view trackers in a three-layer multi-view tracking scheme. Experiments have been conducted on videos from multiple uncalibrated cameras, where objects contain long-term partial/full occlusions or frequent intersections. Comparisons have been made with three existing methods, where the performance is evaluated both qualitatively and quantitatively. Results show the effectiveness of the proposed method in terms of robustness against tracking drift caused by occlusions.
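DIP-011's similarity measure is built on geodesic distances between appearance descriptors on a Riemannian manifold. A common concrete instance (assumed here for illustration; the paper's exact manifold and metric may differ) is the affine-invariant distance between symmetric positive-definite covariance descriptors:

    # Affine-invariant geodesic distance between two SPD covariance descriptors:
    # d(A, B) = sqrt(sum_i log^2 lambda_i), with lambda_i the generalized
    # eigenvalues of the pencil (A, B); the distance is zero iff A == B.
    import numpy as np
    from scipy.linalg import eigh

    def spd_geodesic(A, B):
        lam = eigh(A, B, eigvals_only=True)   # generalized eigenvalues (B SPD)
        return float(np.sqrt(np.sum(np.log(lam) ** 2)))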
ETPL DIP-012: Multi-Atlas Segmentation with Joint Label Fusion (Pattern Analysis and Machine Intelligence)

Multi-atlas segmentation is an effective approach for automatically labeling objects of interest in biomedical images. In this approach, multiple expert-segmented example images, called atlases, are registered to a target image, and the deformed atlas segmentations are combined using label fusion. Among the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity has been particularly successful. However, one limitation of these strategies is that the weights are computed independently for each atlas, without taking into account the fact that different atlases may produce similar label errors. To address this limitation, we propose a new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the total expectation of labeling error and in which pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel.
This probability is approximated using intensity similarity between a pair of atlases and the target image in the neighborhood of each voxel. We validate our method in two medical image segmentation problems: hippocampus segmentation and hippocampus subfield segmentation in magnetic resonance (MR) images. For both problems, we show consistent and significant improvement over label fusion strategies that assign atlas weights independently.

ETPL DIP-013: Spatially Coherent Fuzzy Clustering for Accurate and Noise-Robust Image Segmentation

In this letter, we present a new FCM-based method for spatially coherent and noise-robust image segmentation. Our contribution is twofold: 1) the spatial information of local image features is integrated into both the similarity measure and the membership function to compensate for the effect of noise; and 2) an anisotropic neighborhood, based on phase congruency features, is introduced to allow more accurate segmentation without image smoothing. The segmentation results, for both synthetic and real images, demonstrate that our method efficiently preserves the homogeneity of the regions and is more robust to noise than related FCM-based methods.

ETPL DIP-014: Adaptive Markov Random Fields for Joint Unmixing and Segmentation of Hyperspectral Images

Linear spectral unmixing is a challenging problem in hyperspectral imaging that consists of decomposing an observed pixel into a linear combination of pure spectra (or endmembers) with their corresponding proportions (or abundances). Endmember extraction algorithms can be employed for recovering the spectral signatures while abundances are estimated using an inversion step. Recent works have shown that exploiting spatial dependencies between image pixels can improve spectral unmixing. Markov random fields (MRF) are classically used to model these spatial correlations and partition the image into multiple classes with homogeneous abundances. This paper proposes to define the MRF sites using similarity regions. These regions are built using a self-complementary area filter that stems from morphological theory. This kind of filter divides the original image into flat zones where the underlying pixels have the same spectral values. Once the MRF has been clearly established, a hierarchical Bayesian algorithm is proposed to estimate the abundances, the class labels, the noise variance, and the corresponding hyperparameters. A hybrid Gibbs sampler is constructed to generate samples according to the corresponding posterior distribution of the unknown parameters and hyperparameters. Simulations conducted on synthetic and real AVIRIS data demonstrate the good performance of the algorithm.

ETPL DIP-015: Depth Estimation of Face Images Using the Nonlinear Least-Squares Model

In this paper, we propose an efficient algorithm to reconstruct the 3D structure of a human face from one or more of its 2D images with different poses. In our algorithm, the nonlinear least-squares model is first employed to estimate the depth values of facial feature points and the pose of the 2D face image concerned by means of the similarity transform. Furthermore, different optimization schemes are presented with regard to the accuracy levels and the training time required.
Our algorithm also embeds the symmetrical property of the human face into the optimization procedure, in order to alleviate the sensitivities arising from changes in pose. In addition, the regularization term, based on linear correlation, is added in the objective function to improve the estimation accuracy of the 3D structure. Further, a model-integration method is proposed to improve the depth-estimation accuracy when multiple nonfrontal-view face images are available. Experimental results on the 2D and 3D databases demonstrate the feasibility and efficiency of the proposed methods.
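The optimization in DIP-015 has the classic nonlinear least-squares shape: unknown landmark depths plus a similarity pose, fit by minimizing 2D reprojection error. The sketch below uses an assumed, simplified parameterization (orthographic projection, yaw-only rotation), not the paper's exact model; xy_obs and xy_ref are hypothetical (N, 2) landmark arrays.

    # Depth-plus-pose fit as a nonlinear least-squares problem.
    import numpy as np
    from scipy.optimize import least_squares

    def residuals(params, xy_obs, xy_ref):
        s, yaw, tx, ty = params[:4]              # similarity pose (assumed form)
        z = params[4:]                           # unknown landmark depths
        X = np.column_stack([xy_ref, z])         # lift reference landmarks to 3D
        c, si = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, 0.0, si], [0.0, 1.0, 0.0], [-si, 0.0, c]])
        proj = s * (X @ R.T)[:, :2] + np.array([tx, ty])  # orthographic projection
        return (proj - xy_obs).ravel()           # reprojection error per landmark

    # n = len(xy_ref); x0 = np.r_[1.0, 0.0, 0.0, 0.0, np.zeros(n)]
    # fit = least_squares(residuals, x0, args=(xy_obs, xy_ref))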

ETPL DIP-016: Local Energy Pattern for Texture Classification Using Self-Adaptive Quantization Thresholds

Local energy pattern, a statistical histogram-based representation, is proposed for texture classification. First, we use normalized local-oriented energies to generate local feature vectors, which describe the local structures distinctively and are less sensitive to imaging conditions. Then, each local feature vector is quantized by self-adaptive quantization thresholds determined in the learning stage using histogram specification, and the quantized local feature vector is transformed to a number by N-nary coding, which helps to preserve more structure information during vector quantization. Finally, the frequency histogram is used as the representation feature. The performance is benchmarked by material categorization on the KTH-TIPS and KTH-TIPS2-a databases. Our method is compared with typical statistical approaches, such as basic image features, local binary pattern (LBP), local ternary pattern, completed LBP, the Weber local descriptor, and the VZ algorithms (VZ-MR8 and VZ-Joint). The results show that our method is superior to the other methods on the KTH-TIPS2-a database, and achieves competitive performance on the KTH-TIPS database. Furthermore, we extend the representation from static images to dynamic textures, and achieve favorable recognition results on the University of California at Los Angeles (UCLA) dynamic texture database.

ETPL DIP-017: Perceptual Quality Metric With Internal Generative Mechanism

Objective image quality assessment (IQA) aims to evaluate image quality consistently with human perception. Most of the existing perceptual IQA metrics cannot accurately represent degradations from different types of distortion; e.g., existing structural similarity metrics perform well on content-dependent distortions but not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of the existing IQA metrics guided by the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to decompose an input scene into two portions: the predicted portion with the predicted visual content and the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure its degradation; distortions on the disorderly portion mainly change the uncertain information, and the PSNR is employed for it. Finally, according to the noise energy deployment on the two portions, we combine the two evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics.

ETPL DIP-018: Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as visual saliency.
Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images, or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of the datasets reveals that existing datasets are highly center-biased,
which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often share common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state of the art, helps to organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.

ETPL DIP-019: Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image Tone Mapping

A novel filter is proposed for edge-preserving decomposition of an image. It is different from previous filters in its locally adaptive property. The filtered image contains local means everywhere and preserves local salient edges. Comparisons are made between our filtered result and the results of three other methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base layer preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective function is also proposed for compressing the detail layers. The reproduced image gives a good visualization. Experimental results on real images demonstrate that our algorithm is especially effective at preserving or enhancing local details.

ETPL DIP-020: LLSURE: Local Linear SURE-Based Edge-Preserving Image Filtering

In this paper, we propose a novel approach for performing high-quality edge-preserving image filtering. Based on a local linear model and using the principle of Stein's unbiased risk estimate (SURE) as an estimator for the mean squared error from the noisy image only, we derive a simple explicit image filter that can filter out noise while preserving edges and fine-scale details. Moreover, this filter has a fast and exact linear-time algorithm whose computational complexity is independent of the filtering kernel size; thus, it can be applied to real-time image processing tasks. The experimental results demonstrate the effectiveness of the new filter for various computer vision applications, including noise reduction, detail smoothing and enhancement, high dynamic range compression, and flash/no-flash denoising.
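The local linear model underlying DIP-020 can be sketched in a few lines. This is the generic window-wise q = a*I + b filter also used by guided-filter-style methods; the SURE-based choice of the regularizer eps, which is LLSURE's actual contribution, is omitted here and eps is left as a free parameter.

    # Generic local linear edge-preserving filter (self-guided form).
    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_linear_filter(img, radius=3, eps=1e-2):
        size = 2 * radius + 1
        mean = uniform_filter(img, size)
        var = uniform_filter(img * img, size) - mean ** 2
        a = var / (var + eps)                 # near 1 at edges (high variance)
        b = (1.0 - a) * mean                  # flat regions fall back to the mean
        # smooth the coefficients, then evaluate the model at every pixel
        return uniform_filter(a, size) * img + uniform_filter(b, size)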
ETPL DIP-021: Optimal Inversion of the Generalized Anscombe Transformation for Poisson-Gaussian Noise

Many digital imaging devices operate by successive photon-to-electron, electron-to-voltage, and voltage-to-digit conversions. These processes are subject to various signal-dependent errors, which are typically modeled as Poisson-Gaussian noise. The removal of such noise can be effected indirectly by applying a variance-stabilizing transformation (VST) to the noisy data, denoising the stabilized data with a Gaussian denoising algorithm, and finally applying an inverse VST to the denoised data. The generalized Anscombe transformation (GAT) is often used for variance stabilization, but its unbiased inverse transformation has not been rigorously studied in the past. We introduce the exact unbiased inverse of the GAT and show that it plays an integral part in ensuring accurate denoising results. We demonstrate that this exact inverse leads to state-of-the-art results without any notable increase in the computational complexity compared to the other inverses. We also show that this inverse is optimal in the sense that it can be interpreted as a maximum likelihood inverse. Moreover, we thoroughly analyze the behavior of the proposed inverse, which also enables us to derive a closed-form approximation for it. This paper generalizes our work on the exact unbiased inverse of the Anscombe transformation, which we presented earlier for the removal of pure Poisson noise.
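For reference, the forward GAT and its naive algebraic inverse are short enough to write out; DIP-021's point is precisely that this algebraic inverse is biased at low counts and should be replaced by the exact unbiased inverse (or its closed-form approximation). sigma is the standard deviation of the Gaussian noise component.

    # Forward generalized Anscombe transformation and its (biased) algebraic
    # inverse, shown only as the baseline the paper improves upon.
    import numpy as np

    def gat_forward(x, sigma):
        return 2.0 * np.sqrt(np.maximum(x + 0.375 + sigma ** 2, 0.0))

    def gat_inverse_algebraic(y, sigma):
        return (y / 2.0) ** 2 - 0.375 - sigma ** 2   # biased for low counts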

ETPL DIP-022: Blind Separation of Time/Position Varying Mixtures

We address the challenging open problem of blindly separating time/position varying mixtures, and attempt to separate the sources from such mixtures without prior information about the sources or the mixing system. Unlike studies concerning instantaneous or convolutive mixtures, we assume that the mixing system (medium) is varying in time/position. Attempts to solve this problem have so far mostly utilized online algorithms based on tracking the mixing system with methods previously developed for instantaneous or convolutive mixtures. In contrast with these attempts, we develop a unified approach in the form of staged sparse component analysis (SSCA). Accordingly, we assume that the sources are either sparse or can be sparsified. In the first stage, we estimate the filters of the mixing system, based on the scatter plot of the sparse mixtures' data, using proper clustering and curve/surface fitting. In the second stage, the mixing system is inverted, yielding the estimated sources. We use the SSCA approach for solving three types of mixtures: time/position varying instantaneous mixtures, single-path mixtures, and multipath mixtures. Real-life scenarios and simulated mixtures are used to demonstrate the performance of our approach.

ETPL DIP-023: Nonlocal Transform-Domain Filter for Volumetric Data Denoising and Reconstruction

We present an extension of the BM3D filter to volumetric data. The proposed algorithm, BM4D, implements the grouping and collaborative filtering paradigm, where mutually similar d-dimensional patches are stacked together in a (d+1)-dimensional array and jointly filtered in the transform domain. While in BM3D the basic data patches are blocks of pixels, in BM4D we utilize cubes of voxels, which are stacked into a 4-D group. The 4-D transform applied to the group simultaneously exploits the local correlation present among voxels in each cube and the nonlocal correlation between the corresponding voxels of different cubes. Thus, the spectrum of the group is highly sparse, leading to very effective separation of signal and noise through coefficient shrinkage. After inverse transformation, we obtain estimates of each grouped cube, which are then adaptively aggregated at their original locations. We evaluate the algorithm on denoising of volumetric data corrupted by Gaussian and Rician noise, as well as on reconstruction of volumetric phantom data with non-zero phase from noisy and incomplete Fourier-domain (k-space) measurements. Experimental results demonstrate the state-of-the-art denoising performance of BM4D, and its effectiveness when exploited as a regularizer in volumetric data reconstruction.
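The grouping-plus-collaborative-filtering paradigm behind BM3D/BM4D (DIP-023) reduces, in its simplest form, to: stack mutually similar patches, transform them jointly, shrink small coefficients, invert, and aggregate. Below is a 2-D toy version under those assumptions; the real BM4D groups 3-D cubes, uses tailored transforms, and adds a Wiener refinement stage.

    # Toy collaborative filtering of one group of similar patches.
    import numpy as np
    from scipy.fft import dctn, idctn

    def denoise_group(patches, thr):
        """patches: assumed (K, p, p) stack of mutually similar noisy patches."""
        spec = dctn(patches, norm='ortho')     # joint 3-D transform of the group
        spec[np.abs(spec) < thr] = 0.0         # hard-threshold noisy coefficients
        est = idctn(spec, norm='ortho')        # per-patch estimates
        return est.mean(axis=0)                # aggregate the group estimate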
ETPL DIP-024: Huber Fractal Image Coding Based on a Fitting Plane

Recently, there has been significant interest in robust fractal image coding for the purpose of robustness against outliers. However, the known robust fractal coding methods (HFIC and LAD-FIC, etc.) are not optimal, since, besides the high computational cost, they use the corrupted domain block as the independent variable in the robust regression model, which may adversely affect the robust estimator used to calculate the fractal parameters (depending on the noise level). This paper presents a Huber fitting plane-based fractal image coding (HFPFIC) method. This method builds Huber fitting planes (HFPs) for the domain and range blocks, respectively, ensuring the use of an uncorrupted independent variable in the robust model. On this basis, a new matching error function is introduced to robustly evaluate the best scaling factor. Meanwhile, a median absolute deviation (MAD) about the median decomposition criterion is proposed to achieve fast adaptive quadtree partitioning for images corrupted by salt & pepper noise. In order to reduce computational cost, the no-search method is applied to speed up the encoding process. Experimental results show that the proposed HFPFIC can yield superior performance over conventional robust fractal image coding methods in encoding speed and the quality of the restored image. Furthermore, the no-search method can significantly reduce encoding time and achieve less than 2.0 s for
the HFPFIC with acceptable image quality degradation. In addition, we show that, combined with the MAD decomposition scheme, the HFP technique used as a robust method can further reduce the encoding time while maintaining image quality.

ETPL DIP-025: Demosaicking of Noisy Bayer-Sampled Color Images With Least-Squares Luma-Chroma Demultiplexing and Noise Level Estimation

This paper adapts the least-squares luma-chroma demultiplexing (LSLCD) demosaicking method to noisy Bayer color filter array (CFA) images. A model is presented for the noise in white-balanced gamma-corrected CFA images. A method to estimate the noise level in each of the red, green, and blue color channels is then developed. Based on the estimated noise parameters, one of a finite set of configurations adapted to a particular level of noise is selected to demosaic the noisy data. The noise-adaptive demosaicking scheme is called LSLCD with noise estimation (LSLCD-NE). Experimental results demonstrate state-of-the-art performance over a wide range of noise levels, with low computational complexity. Many results with several algorithms, noise levels, and images are presented on our companion web site, along with software to allow reproduction of our results.

ETPL DIP-026: Multiscale Gradients-Based Color Filter Array Interpolation

Single-sensor digital cameras use color filter arrays to capture a subset of the color data at each pixel coordinate. Demosaicing, or color filter array (CFA) interpolation, is the process of estimating the missing color samples to reconstruct a full color image. In this paper, we propose a demosaicing method that uses multiscale color gradients to adaptively combine color difference estimates from different directions. The proposed solution does not require any thresholds, since it does not make any hard decisions, and it is noniterative. Although most suitable for the Bayer CFA pattern, the method can be extended to other mosaic patterns. To demonstrate this, we describe its application to the Lukac CFA pattern. Experimental results show that it outperforms other available demosaicing methods by a clear margin in terms of CPSNR and S-CIELAB measures for both mosaic patterns.

ETPL DIP-027: Optimal Local Dimming for LC Image Formation With Controllable Backlighting

Light emitting diode (LED)-backlit liquid crystal displays (LCDs) hold the promise of improving image quality while reducing energy consumption with signal-dependent local dimming. However, most existing local dimming algorithms are motivated mainly by simplicity of implementation, and they often lack concern for visual quality. To fully realize the potential of LED-backlit LCDs and reduce the artifacts that often occur in current systems, we propose a novel local dimming technique that can achieve the theoretical highest fidelity of intensity reproduction in either the l1 or l2 metric. Both exact and fast approximate versions of the optimal local dimming algorithm are proposed. Simulation results demonstrate superior performance of the proposed algorithm in terms of visual quality and power consumption.
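In the l2 case, the optimal dimming problem DIP-027 describes is a bounded linear least-squares fit: pick LED levels whose combined light spread best reproduces the target backlight. A sketch, where A is an assumed (n_pixels, n_leds) light-spread matrix and target the desired backlight per pixel:

    # Optimal local dimming as bounded least squares: min ||A d - target||_2,
    # with each LED duty cycle d_i constrained to [0, 1].
    import numpy as np
    from scipy.optimize import lsq_linear

    def optimal_dimming(A, target):
        return lsq_linear(A, target, bounds=(0.0, 1.0)).x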
ETPL DIP-028: Multiscale Bi-Gaussian Filter for Adjacent Curvilinear Structures Detection With Application to Vasculature Images

The intensity or gray-level derivatives have been widely used in image segmentation and enhancement. Conventional derivative filters often suffer from an undesired merging of adjacent objects because of their intrinsic usage of an inappropriately broad Gaussian kernel; as a result, neighboring structures cannot be properly resolved. To avoid this problem, we propose to replace the low-level Gaussian kernel with a bi-Gaussian function, which allows independent selection of scales in the foreground and background. By selecting a narrow neighborhood for the background with regard to the foreground, the proposed method reduces interference from adjacent objects while preserving the ability of intraregion smoothing. Our idea is inspired by a comparative analysis of existing line filters, in which several traditional methods, including the vesselness, gradient flux, and medialness models, are integrated into a uniform framework. The comparison subsequently aids in understanding the principles of different filtering kernels, which is also a contribution of this paper. Based on some axiomatic scale-space assumptions, the full representation of our bi-Gaussian kernel is deduced. The popular γ-normalization scheme for multiscale integration is extended to the bi-Gaussian operators. Finally, combined with a parameter-free shape estimation scheme, a derivative filter is developed for the typical applications of curvilinear structure detection and vasculature image enhancement. It is verified in experiments using synthetic and real data that the proposed method outperforms several conventional filters in separating closely located objects and in being robust to noise.

ETPL DIP-029: Visually Lossless Encoding for JPEG2000

Due to the exponential growth in image sizes, visually lossless coding is increasingly being considered as an alternative to numerically lossless coding, which has limited compression ratios. This paper presents a method of encoding color images in a visually lossless manner using JPEG2000. In order to hide coding artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subband signals in JPEG2000. The VTs are experimentally determined from statistically modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing backgrounds through a visual masking model, and then used to determine the minimum number of coding passes to be included in the final codestream for visually lossless quality under the desired viewing conditions. Codestreams produced by this scheme are fully JPEG2000 Part-I compliant.

ETPL DIP-030: Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization and Its Application, Part I: Fundamental Theory

This paper provides a systematic rate-distortion (R-D) analysis of dead-zone plus uniform threshold scalar quantization (DZ+UTSQ) with nearly uniform reconstruction quantization (NURQ) for the generalized Gaussian distribution (GGD), which consists of two aspects: R-D performance analysis and R-D modeling. In R-D performance analysis, we first derive the preliminary constraint of optimum entropy-constrained DZ+UTSQ/NURQ for GGD, under which the property of the GGD distortion-rate (D-R) function is elucidated. Then, for the GGD source of actual transform coefficients, the refined constraint and precise conditions of optimum DZ+UTSQ/NURQ are rigorously deduced in the real coding bit rate range, and efficient DZ+UTSQ/NURQ design criteria are proposed to reasonably simplify the utilization of effective quantizers in practice. In R-D modeling, inspired by the R-D performance analysis, the D-R function is first developed, followed by novel rate-quantization (R-Q) and distortion-quantization (D-Q) models derived using analytical and heuristic methods. The D-R, R-Q, and D-Q models form the source model describing the relationship between the rate, distortion, and quantization steps. One application of the proposed source model is the effective two-pass VBR coding algorithm designed on an encoder of the H.264/AVC reference software, which achieves constant video quality and desirable rate control accuracy.
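A dead-zone plus uniform threshold scalar quantizer is easy to state concretely. The sketch below uses one assumed textbook parameterization (the exact thresholds and reconstruction offsets analyzed in DIP-030/DIP-031 may differ): magnitudes inside the dead zone map to level 0, the rest to uniform bins of width step, and reconstruction places each value at a fractional offset delta inside its bin.

    # Dead-zone plus uniform threshold scalar quantization (assumed form).
    import numpy as np

    def dzutsq_quantize(x, step, deadzone):
        mag = np.maximum(np.abs(x) - deadzone, 0.0)
        return np.sign(x) * np.ceil(mag / step)           # quantization index

    def dzutsq_reconstruct(k, step, deadzone, delta=0.5):
        mag = np.where(k == 0, 0.0, deadzone + (np.abs(k) - 1 + delta) * step)
        return np.sign(k) * mag                           # dequantized value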
ETPL DIP-031: Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization and Its Application, Part II: Two-Pass VBR Coding for H.264/AVC

In the first part of this paper, we derive a source model describing the relationship between the rate, distortion, and quantization steps of dead-zone plus uniform threshold scalar quantizers with nearly uniform reconstruction quantizers for the generalized Gaussian distribution. This source model consists of rate-quantization, distortion-quantization (D-Q), and distortion-rate (D-R) models. In this part, we first rigorously confirm the accuracy of the proposed source model by comparing the calculated results with the coding data of JM 16.0. Efficient parameter estimation strategies are then developed to better employ this source model in our two-pass rate control method for H.264 variable bit rate coding. Based on our D-Q and D-R models, the proposed method is of high stability and low complexity, and is easy to implement.
Extensive experiments demonstrate that the proposed method achieves: 1) an average peak signal-to-noise ratio variance of only 0.0658 dB, compared to 1.8758 dB for JM 16.0's method, with an average rate control error of 1.95%; and 2) significant improvement in smoothing the video quality compared with the latest two-pass rate control method.

ETPL DIP-032: Nonrigid Image Registration With Crystal Dislocation Energy

The goal of nonrigid image registration is to find a suitable transformation such that the transformed moving image becomes similar to the reference image. The image registration problem can also be treated as an optimization problem that tries to minimize an objective energy function measuring the differences between the two involved images. In this paper, we consider image matching as the process of aligning object boundaries in two different images. The registration energy function can be defined based on the total energy associated with the object boundaries. The optimal transformation is obtained by finding the equilibrium state when the total energy is minimized, which indicates that the object boundaries have found their correspondences and stopped deforming. We make an analogy between the above process and the dislocation system in physics. The object boundaries are viewed as dislocations (line defects) in a crystal. Then the well-developed dislocation energy is used to derive the energy assigned to object boundaries in images. The newly derived registration energy function takes the global gradient information of the entire image into consideration, and produces an orientation-dependent and long-range interaction between the two images to drive the registration process. This property of the interaction endows the new registration framework with both a fast convergence rate and high registration accuracy. Moreover, the new energy function can be adapted to realize symmetric diffeomorphic transformation so as to ensure one-to-one matching between subjects. In this paper, the superiority of the new method is theoretically proven, experimentally tested, and compared with the state-of-the-art SyN method. Experimental results with 3-D magnetic resonance brain images demonstrate that the proposed method outperforms the compared methods in terms of both registration accuracy and computation time.

ETPL DIP-033: Double Shrinking Sparse Dimension Reduction

Learning tasks such as classification and clustering usually perform better and cost less (time and space) on compressed representations than on the original data. Previous works mainly compress data via dimension reduction. In this paper, we propose double shrinking to compress image data on both dimensionality and cardinality via building either sparse low-dimensional representations or a sparse projection matrix for dimension reduction. We formulate the double shrinking model (DSM) as l1-regularized variance maximization with the constraint ||x||2=1, and develop a double shrinking algorithm (DSA) to optimize the DSM. DSA is a path-following algorithm that can build the whole solution path of locally optimal solutions at different sparse levels. Each solution on the path is a warm start for searching the next, sparser one. In each iteration of DSA, the direction, the step size, and the Lagrangian multiplier are deduced from the Karush-Kuhn-Tucker conditions.
The magnitudes of trivial variables are shrunk and the importance of critical variables is simultaneously augmented along the selected direction with the determined step length. Double shrinking can be applied to manifold learning and feature selection for better interpretation of features, and can be combined with classification and clustering to boost their performance. The experimental results suggest that double shrinking produces efficient and effective data compression.
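The flavor of objective DIP-033 optimizes (l1-regularized variance maximization under a unit-norm constraint) can be illustrated with a plain shrink-and-normalize power iteration. This is not the paper's DSA path-following algorithm, just the core step; the data matrix and lam are assumptions.

    # Illustrative sparse projection direction via truncated power iteration.
    import numpy as np

    def soft_threshold(v, lam):
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def sparse_direction(X, lam, iters=100):
        """X: (n_samples, n_features) centered data; returns a sparse unit vector."""
        S = X.T @ X / len(X)                  # covariance matrix
        w = np.linalg.svd(S)[0][:, 0]         # dense warm start
        for _ in range(iters):
            w = soft_threshold(S @ w, lam)    # variance step + l1 shrinkage
            n = np.linalg.norm(w)
            if n == 0:                        # lam too large: everything shrunk away
                break
            w /= n                            # enforce ||w||_2 = 1
        return w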

ETPL DIP-034: Reinitialization-Free Level Set Evolution via Reaction Diffusion

This paper presents a novel reaction-diffusion (RD) method for implicit active contours that is completely free of the costly reinitialization procedure in level set evolution (LSE). A diffusion term is introduced into LSE, resulting in an RD-LSE equation, from which a piecewise constant solution can be derived. In order to obtain a stable numerical solution from the RD-based LSE, we propose a two-step splitting method to iteratively solve the RD-LSE equation, where we first iterate the LSE equation and then solve the diffusion equation. The second step regularizes the level set function obtained in the first step to ensure stability, and thus the complex and costly reinitialization procedure is completely eliminated from LSE. By successfully applying diffusion to LSE, the RD-LSE model is stable by means of the simple finite difference method, which is very easy to implement. The proposed RD method can be generalized to solve the LSE for both the variational level set method and the partial differential equation-based level set method. The RD-LSE method shows very good performance on boundary antileakage. The extensive and promising experimental results on synthetic and real images validate the effectiveness of the proposed RD-LSE approach.
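The two-step splitting scheme DIP-034 describes is compact enough to sketch: advance the level set by its evolution force, then regularize with one diffusion step instead of reinitializing. Here force stands in for the application-specific LSE term (e.g., a curve-evolution speed field), and the step sizes are assumed placeholders.

    # Schematic two-step reaction-diffusion level set update.
    import numpy as np
    from scipy.ndimage import laplace

    def rd_lse_step(phi, force, dt=0.1, nu=0.2):
        phi = phi + dt * force(phi)           # step 1: level set evolution
        phi = phi + nu * laplace(phi)         # step 2: diffusion regularization
        return phi                            # no reinitialization needed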
ETPL DIP-035: Track Creation and Deletion Framework for Long-Term Online Multiface Tracking

To improve visual tracking, a large number of papers study more powerful features or better cue fusion mechanisms, such as adaptation or contextual models. A complementary approach consists of improving the track management, that is, deciding when to add a target or stop its tracking, for example, in case of failure. This is an essential component for effective multiobject tracking applications, and is often not trivial. Deciding whether or not to stop a track is a compromise between avoiding erroneous early stopping while tracking is fine and erroneous continuation of tracking when there is an actual failure. This decision process, very rarely addressed in the literature, is difficult due to object detector deficiencies or observation models that are insufficient to describe the full variability of tracked objects and deliver reliable likelihood (tracking) information. This paper addresses the track management issue and presents a real-time online multiface tracking algorithm that effectively deals with the above difficulties. The tracking itself is formulated in a multiobject state-space Bayesian filtering framework solved with Markov chain Monte Carlo. Within this framework, an explicit probabilistic filtering step decides when to add or remove a target from the tracker, where decisions rely on multiple cues such as face detections, likelihood measures, long-term observations, and track state characteristics. The method has been applied to three challenging datasets of more than 9 h in total, and demonstrates a significant performance increase compared to more traditional approaches (Markov chain Monte Carlo, reversible-jump Markov chain Monte Carlo) relying only on head detection and likelihood for track management.

ETPL DIP-036: Wavelet Domain Multifractal Analysis for Static and Dynamic Texture Classification

In this paper, we propose a new texture descriptor for both static and dynamic textures. The new descriptor is built on wavelet-based spatial-frequency analysis of two complementary wavelet pyramids: standard multiscale and wavelet leader. These wavelet pyramids essentially capture the local texture responses in multiple high-pass channels in a multiscale and multiorientation fashion, in which there exists a strong power-law relationship for natural images. Such a power-law relationship is characterized by so-called multifractal analysis. In addition, two more techniques, scale normalization and multiorientation image averaging, are introduced to further improve the robustness of the proposed descriptor. Combining these techniques, the proposed descriptor enjoys both high discriminative power and robustness against many environmental changes. We apply the descriptor to classifying both static and dynamic textures. Our method has demonstrated excellent performance in comparison with the state-of-the-art approaches on several public benchmark datasets.
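The power-law behavior DIP-036 exploits can be probed with a very crude stand-in for its wavelet-leader machinery: measure detail energy of a simple Haar pyramid at each scale and fit the log-log slope. Everything here is an illustrative simplification, not the paper's descriptor; img is assumed to have dimensions divisible by 2**levels.

    # Crude multiscale power-law statistic: log2(detail energy) vs. scale slope.
    import numpy as np

    def scale_energy_slope(img, levels=4):
        energies = []
        x = img.astype(float)
        for _ in range(levels):
            a = 0.5 * (x[:, ::2] + x[:, 1::2])    # Haar approximation (columns)
            d = 0.5 * (x[:, ::2] - x[:, 1::2])    # Haar detail (columns)
            energies.append(np.mean(d * d))
            x = 0.5 * (a[::2] + a[1::2])          # downsample rows for next scale
        scales = np.arange(1, levels + 1)
        logs = np.log2(np.maximum(energies, 1e-12))
        return np.polyfit(scales, logs, 1)[0]     # power-law exponent estimate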

ETPL DIP-037: Video Object Tracking in the Compressed Domain Using Spatio-Temporal Markov Random Fields

Despite the recent progress in both pixel-domain and compressed-domain video object tracking, the need for a tracking framework with both reasonable accuracy and reasonable complexity still exists. This paper presents a method for tracking moving objects in H.264/AVC-compressed video sequences using a spatio-temporal Markov random field (ST-MRF) model. An ST-MRF model naturally integrates the spatial and temporal aspects of the object's motion. Built upon such a model, the proposed method works in the compressed domain and uses only the motion vectors (MVs) and block coding modes from the compressed bitstream to perform tracking. First, the MVs are preprocessed through intra-coded block motion approximation and global motion compensation. At each frame, the decision of whether a particular block belongs to the object being tracked is made with the help of the ST-MRF model, which is updated from frame to frame in order to follow the changes in the object's motion. The proposed method is tested on a number of standard sequences, and the results demonstrate its advantages over some of the recent state-of-the-art methods.

ETPL DIP-038: Online Object Tracking With Sparse Prototypes

Online object tracking is a challenging problem, as it entails learning an effective model to account for appearance changes caused by intrinsic and extrinsic factors. In this paper, we propose a novel online object tracking algorithm with sparse prototypes, which combines classic principal component analysis (PCA) with recent sparse representation schemes for learning effective appearance models. We introduce l1 regularization into the PCA reconstruction, and develop a novel algorithm to represent an object by sparse prototypes that account explicitly for data and noise. For tracking, objects are represented by the sparse prototypes learned and updated online. In order to reduce tracking drift, we present a method that takes occlusion and motion blur into account, rather than simply including image observations for model update. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

ETPL DIP-039: Automatic Dynamic Texture Segmentation Using Local Descriptors and Optical Flow

A dynamic texture (DT) is an extension of texture to the temporal domain. How to segment a DT is a challenging problem. In this paper, we address the problem of segmenting a DT into disjoint regions. DTs can differ in their spatial mode (i.e., appearance) and/or temporal mode (i.e., motion field). To this end, we develop a framework based on the appearance and motion modes. For the appearance mode, we use a new local spatial texture descriptor to describe the spatial mode of the DT; for the motion mode, we use the optical flow and a local temporal texture descriptor to represent the temporal variations of the DT. In addition, we organize the optical flow using the histogram of oriented optical flow (HOOF). To compute the distance between two HOOFs, we develop a simple, effective, and efficient distance measure based on Weber's law. Furthermore, we also address the problem of threshold selection by proposing a method for determining the segmentation thresholds via offline supervised statistical learning. The experimental results show that our method provides very good segmentation results compared to the state-of-the-art methods in segmenting regions that differ in their dynamics.

ETPL DIP-040: Efficient Image Classification via Multiple Rank Regression

The problem of image classification has aroused considerable research interest in the field of image processing.
ETPL DIP-040 Efficient Image Classification via Multiple Rank Regression

Abstract: The problem of image classification has aroused considerable research interest in the field of image processing. Traditional methods often convert an image to a vector and then use a vector-based classifier. In this paper, a novel multiple rank regression model (MRR) for matrix data classification is proposed. Unlike traditional vector-based methods, we employ multiple-rank left projecting vectors and right projecting vectors to regress each matrix data set to its label for each category. The convergence behavior, initialization, computational complexity, and parameter determination are also analyzed. Compared with vector-based regression methods, MRR achieves higher accuracy and has lower computational complexity. Compared with traditional supervised tensor-based methods, MRR performs
better for matrix data classification. Promising experimental results on face, object, and hand-written digit image classification tasks are provided to show the effectiveness of our method.

ETPL DIP-041 Regularized Discriminative Spectral Regression Method for Heterogeneous Face Matching

Abstract: Face recognition is confronted with situations in which face images are captured in various modalities, such as the visual modality, the near infrared modality, and the sketch modality. This is known as heterogeneous face recognition. To solve this problem, we propose a new method called discriminative spectral regression (DSR). The DSR maps heterogeneous face images into a common discriminative subspace in which robust classification can be achieved. In the proposed method, the subspace learning problem is transformed into a least squares problem. Different mappings should map heterogeneous images from the same class close to each other, while images from different classes should be separated as far as possible. To realize this, we introduce two novel regularization terms, which reflect the category relationships among data, into the least squares approach. Experiments conducted on two heterogeneous face databases validate the superiority of the proposed method over previous methods.

ETPL DIP-042 Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Abstract: Due to the popularity of social media websites, extensive research efforts have been dedicated to tag-based social image search. Both visual information and tags have been investigated in this research field. However, most existing methods use tags and visual characteristics either separately or sequentially in order to estimate the relevance of images. In this paper, we propose an approach that simultaneously utilizes both visual and textual information to estimate the relevance of user-tagged images. The relevance estimation is determined with a hypergraph learning approach. In this method, a social image hypergraph is constructed, where vertices represent images and hyperedges represent visual or textual terms. Learning is achieved with the use of a set of pseudo-positive images, where the weights of hyperedges are updated throughout the learning process. In this way, the impact of different tags and visual words can be automatically modulated. Comparative results of experiments conducted on a dataset including 370+ images are presented, which demonstrate the effectiveness of the proposed approach.
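The hypergraph learning step in the DIP-042 entry can be grounded with a small sketch. The construction below uses the standard normalized hypergraph Laplacian of Zhou et al., which this line of work commonly builds on; the toy incidence matrix, hyperedge weights, and labels are made up, and the paper's actual weight updating and pseudo-positive selection are not reproduced.

import numpy as np

# Toy incidence matrix H: H[v, e] = 1 if image v belongs to hyperedge e (a visual/textual term)
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)   # 4 images, 3 term hyperedges
w = np.array([1.0, 0.5, 2.0])             # hyperedge weights (learned in the paper)

Dv = H @ w                                # vertex degrees d(v) = sum_e w(e) h(v, e)
De = H.sum(axis=0)                        # hyperedge degrees
Dv_is = np.diag(1.0 / np.sqrt(Dv))
Theta = Dv_is @ H @ np.diag(w / De) @ H.T @ Dv_is
L = np.eye(H.shape[0]) - Theta            # normalized hypergraph Laplacian

# Relevance learning then smooths pseudo-positive labels y over the hypergraph
mu = 1.0
y = np.array([1.0, 0.0, 0.0, 1.0])
f = np.linalg.solve(np.eye(4) + mu * L, y)
print(f)                                  # relevance scores regularized by the hypergraph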
ETPL DIP-043 Action Search by Example Using Randomized Visual Vocabularies

Abstract: Because actions can be small video objects, it is a challenging problem to search for similar actions in crowded and dynamic scenes when a single query example is provided. We propose a fast action search method that can efficiently locate similar actions spatio-temporally. Both the query action and the video datasets are characterized by spatio-temporal interest points. Instead of using a unified visual vocabulary to index all interest points in the database, we propose randomized visual vocabularies to enable fast and robust interest point matching. To accelerate action localization, we have developed a coarse-to-fine video subvolume search scheme, which is several orders of magnitude faster than the existing spatio-temporal branch and bound search. Our experiments on cross-dataset action search show promising results when compared with the state of the art. Additional experiments on a 5-h versatile video dataset validate the efficiency of our method, where an action search can be finished in just 37.6 s on a regular desktop machine.

ETPL DIP-044 Robust Albedo Estimation From a Facial Image With Cast Shadow Under General Unknown Lighting

Abstract: Albedo estimation from a facial image is crucial for various computer vision tasks, such as 3-D morphable-model fitting, shape recovery, and illumination-invariant face recognition, but the currently available methods do not give good estimation results. Most methods ignore the influence of cast shadows and require a statistical model to obtain facial albedo. This paper describes a method for albedo estimation that makes combined use of image intensity and facial depth information for an image with
cast shadows and general unknown lighting. In order to estimate the albedo map of a face, we formulate the albedo estimation problem as a linear programming problem that minimizes the intensity error under the assumption that the surface of the face has constant albedo. Since the solution thus obtained has significant errors in certain parts of the facial image, the albedo estimate needs to be compensated. We minimize the mean square error of the albedo under the assumption that the surface normals, which are calculated from the facial depth information, are corrupted with noise. The proposed method is simple, and the experimental results show that it gives better estimates than other methods.

ETPL DIP-045 Separable Markov Random Field Model and Its Applications in Low Level Vision

Abstract: This brief proposes a continuous-valued Markov random field (MRF) model with a separable filter bank, denoted as MRFSepa, which significantly reduces the computational complexity of MRF modeling. In this framework, we design a novel gradient-based discriminative learning method to learn the potential functions and separable filter banks. We learn MRFSepa models with 2-D and 3-D separable filter banks for the applications of gray-scale/color image denoising and color image demosaicing. By implementing the MRFSepa model on a graphics processing unit, we achieve real-time image denoising and fast image demosaicing with high-quality results.
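The computational benefit of a separable filter bank, as exploited by MRFSepa above, is that a rank-1 2-D kernel factors into two 1-D passes. A minimal illustration with arbitrary (not learned) factors:

import numpy as np
from scipy.ndimage import convolve, convolve1d

rng = np.random.default_rng(1)
img = rng.random((256, 256))
col = np.array([1.0, 2.0, 1.0]) / 4.0   # vertical factor
row = np.array([-1.0, 0.0, 1.0]) / 2.0  # horizontal factor

# Separable filtering: two 1-D passes instead of one k-by-k 2-D convolution
tmp = convolve1d(img, col, axis=0, mode='reflect')
sep = convolve1d(tmp, row, axis=1, mode='reflect')

# The equivalent full 2-D kernel is the outer product of the two factors
full = convolve(img, np.outer(col, row), mode='reflect')
print(np.allclose(sep, full))           # True: same result, O(k) vs O(k^2) work per pixel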

ETPL DIP-046 Two-Direction Nonlocal Model for Image Denoising

Abstract: Similarities inherent in natural images have been widely exploited for image denoising and other applications. In fact, if a cluster of similar image patches is rearranged into a matrix, similarities exist both between columns and between rows. Using these similarities, we present a two-directional nonlocal (TDNL) variational model for image denoising. The solution of our model consists of three components: one component is a scaled version of the original observed image, and the other two components are obtained by utilizing the similarities. Specifically, by using the similarity between columns, we get a nonlocal-means-like estimation of the patch with consideration of all similar patches, while the weights are not the pairwise similarities but a set of clusterwise coefficients. Moreover, by using the similarity between rows, we also get nonlocal-autoregression-like estimations for the center pixels of the similar patches. The TDNL model leads to an alternating minimization algorithm. Experiments indicate that the model can perform on par with or better than the state-of-the-art denoising methods.

ETPL DIP-047 Optimizing the Error Diffusion Filter for Blue Noise Halftoning With Multiscale Error Diffusion

Abstract: A good halftoning output should bear a blue noise characteristic contributed by isotropically distributed isolated dots. Multiscale error diffusion (MED) algorithms try to achieve this by exploiting radially symmetric and noncausal error diffusion filters to guarantee spatial homogeneity. In this brief, an optimized diffusion filter is suggested to make the diffusion close to isotropic. When it is used with MED, the resulting output has a nearly ideal blue noise characteristic.
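For contrast with the radially symmetric, noncausal MED filters described in the DIP-047 entry, here is the classic causal error-diffusion loop with Floyd-Steinberg weights; an optimized isotropic filter of the kind the entry proposes would replace these hand-set weights.

import numpy as np

def error_diffusion(gray):
    """Raster-scan error diffusion (Floyd-Steinberg weights); gray values in [0, 1]."""
    img = gray.astype(float).copy()
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - out[y, x]      # push the quantization error to unvisited neighbors
            if x + 1 < w:               img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out

ramp = np.tile(np.linspace(0, 1, 64), (16, 1))
print(error_diffusion(ramp).mean())          # the halftone preserves the mean gray level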
ETPL DIP-049 Sparse Representation With Kernels

Abstract: Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which helps in finding a sparse representation of nonlinear features, we propose kernel sparse representation (KSR). Essentially, KSR is a sparse coding technique in a high-dimensional feature space mapped by an implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into spatial pyramid matching (SPM), we develop KSRSPM, which achieves good performance for image classification. Moreover, KSR-based feature coding can be shown to be a generalization of the efficient match kernel and an extension of Sc-based SPM. We further show that our proposed KSR using a histogram intersection kernel (HIK) can be considered a soft assignment extension of HIK-based feature quantization in the feature coding process. Beyond feature coding, compared with sparse coding, KSR can learn more discriminative sparse codes and achieve higher accuracy for face recognition. Moreover, KSR can also be applied to kernel matrix approximation in large-scale learning tasks, where it demonstrates robustness, especially when only a small fraction of the data is used. Extensive experimental results demonstrate promising results of KSR in image classification, face recognition, and kernel matrix approximation. All these applications prove the effectiveness of KSR in computer vision and machine learning tasks.

ETPL DIP-050 Image-Difference Prediction: From Grayscale to Color

Abstract: Existing image-difference measures show excellent accuracy in predicting distortions such as lossy compression, noise, and blur. Their performance on certain other distortions could be improved; one example of this is gamut mapping. This is partly because they either do not interpret chromatic information correctly or ignore it entirely. We present an image-difference framework that comprises image normalization, feature extraction, and feature combination. Based on this framework, we create image-difference measures by selecting specific implementations for each of the steps. Particular emphasis is placed on using color information to improve the assessment of gamut-mapped images. Our best image-difference measure shows significantly higher prediction accuracy on a gamut-mapping dataset than all other evaluated measures.

ETPL DIP-051 When Does Computational Imaging Improve Performance?

Abstract: A number of computational imaging techniques have been introduced to improve image quality by increasing light throughput. These techniques use optical coding to measure a stronger signal level. However, their performance is limited by the decoding step, which amplifies noise. Although it is well understood that optical coding can increase performance at low light levels, little is known about the quantitative performance advantage of computational imaging in general settings. In this paper, we derive performance bounds for various computational imaging techniques. We then discuss the implications of these bounds for several real-world scenarios (e.g., illumination conditions, scene properties, and sensor noise characteristics). Our results show that computational imaging techniques do not provide a significant performance advantage when imaging with illumination that is brighter than typical daylight. These results can be readily used by practitioners to design the most suitable imaging systems given the application at hand.

ETPL DIP-052 Anisotropic Interpolation of Sparse Generalized Image Samples

Abstract: Practical image-acquisition systems are often modeled as a continuous-domain prefilter followed by an ideal sampler, where generalized samples are obtained after convolution with the impulse response of the device. In this paper, our goal is to interpolate images from a given subset of such samples. We express our solution in the continuous domain, considering consistent resampling as a data-fidelity constraint.
To make the problem well posed and ensure edge-preserving solutions, we develop an efficient anisotropic regularization approach that is based on an improved version of the edge-enhancing anisotropic diffusion equation. Following variational principles, our reconstruction algorithm minimizes successive quadratic cost functionals. To ensure fast convergence, we solve the corresponding sequence of linear problems by using multigrid iterations that are specifically tailored to their sparse structure. We conduct illustrative experiments and discuss the potential of our approach both in terms of algorithmic
design and reconstruction quality. In particular, we present results that use as little as 2% of the image samples.

ETPL DIP-053 Clustered-Dot Halftoning With Direct Binary Search

Abstract: In this paper, we present a new algorithm for aperiodic clustered-dot halftoning based on direct binary search (DBS). The DBS optimization framework has been modified for designing clustered-dot texture by using filters with different sizes in the initialization and update steps of the algorithm. Following an intuitive explanation of how the clustered-dot texture results from this modified framework, we derive a closed-form cost metric which, when minimized, equivalently generates stochastic clustered-dot texture. An analysis of the cost metric and its influence on the texture quality is presented, followed by a modification to the cost metric that reduces computational cost and makes it more suitable for screen design.

ETPL DIP-054 Task-Specific Image Partitioning

Abstract: Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used for performing high-level computer vision tasks. Typically, partitioning is conducted without regard to the task at hand. We propose a task-specific image partitioning framework that produces a region-based image representation leading to higher task performance than that reached using any task-oblivious partitioning framework, as well as the existing supervised partitioning frameworks, albeit few in number. The proposed method partitions the image by means of correlation clustering, maximizing a linear discriminant function defined over a superpixel graph. The parameters of the discriminant function that define task-specific similarity/dissimilarity among superpixels are estimated with a structured support vector machine (S-SVM) using task-specific training data. The S-SVM learning leads to better generalization ability, while the construction of the superpixel graph used to define the discriminant function allows a rich set of features to be incorporated to improve discriminability and robustness. We evaluate the learned task-aware partitioning algorithms on three benchmark datasets. Results show that task-aware partitioning leads to better labeling performance than the partitioning computed by state-of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.

ETPL DIP-055 Generalized Inverse-Approach Model for Spectral-Signal Recovery

Abstract: We have studied the transformation of a spectral signal to the response of the system as a linear mapping from a higher- to a lower-dimensional space in order to look more closely at inverse-approach models. The problem of spectral-signal recovery from the response of a transformation system is generally stated on the basis of the generalized inverse-approach theorem, which provides a modular model for generating a spectral signal from a given response value. The controlling criteria, including the robustness of the inverse model to perturbations of the response caused by noise and the condition number for matrix inversion, are proposed, together with the mean square error, so as to create an efficient model for spectral-signal recovery.
The spectral-reflectance recovery and color correction of natural surface colors are numerically investigated to appraise different illuminant-observer transformation matrices based on the proposed controlling criteria, both in the absence and in the presence of noise.
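The generalized inverse-approach recovery in the DIP-055 entry amounts to applying a pseudoinverse of the transformation matrix to the response. A minimal sketch follows; the transformation matrix and reflectance here are synthetic, and no noise regularization is included.

import numpy as np

rng = np.random.default_rng(2)
M = rng.random((3, 31))                  # e.g., a 31-band spectrum mapped to 3 sensor responses
r_true = np.abs(np.sin(np.linspace(0, np.pi, 31)))   # synthetic reflectance
response = M @ r_true

# Generalized (Moore-Penrose) inverse recovery: the minimum-norm consistent solution
r_hat = np.linalg.pinv(M) @ response

# The condition number of M governs robustness to perturbed responses (one of the
# controlling criteria named in the entry)
print(np.linalg.cond(M), np.linalg.norm(M @ r_hat - response))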
ETPL DIP-056 Spatio-Temporal Auxiliary Particle Filtering With l1-Norm-Based Appearance Model Learning for Robust Visual Tracking

Abstract: In this paper, we propose an efficient and accurate visual tracker equipped with a new particle filtering algorithm and a robust subspace learning-based appearance model. The proposed visual tracker avoids the drifting problems caused by abrupt motion changes and severe appearance variations that are well-known difficulties in visual tracking. The proposed algorithm is based on a type of auxiliary particle filtering that uses a spatio-temporal sliding window. Compared to conventional particle filtering algorithms, spatio-temporal auxiliary particle filtering is computationally efficient and is successfully applied to visual tracking. In addition, real-time robust principal component pursuit (RRPCP) equipped with l1-norm optimization is utilized to obtain a new appearance model learning block for reliable visual tracking, especially under occlusions in object appearance. The overall tracking framework based on these dual ideas is robust against occlusions and out-of-plane motions because of the proposed spatio-temporal filtering and the recursive form of RRPCP. The designed tracker has been evaluated on challenging video sequences, and the results confirm the advantage of using this tracker.

ETPL DIP-057 Manifold Regularized Multitask Learning for Semi-Supervised Multilabel Image Classification

Abstract: It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features; in that case, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

ETPL DIP-058 Linear Distance Coding for Image Classification

Abstract: The feature coding-pooling framework is shown to perform well in image classification tasks because it can generate discriminative and robust image representations. However, the unavoidable information loss incurred by feature quantization in the coding process and the undesired dependence of pooling on the image spatial layout may severely limit the classification performance. In this paper, we propose a linear distance coding (LDC) method to capture the discriminative information lost in traditional coding methods while simultaneously alleviating the dependence of pooling on the image spatial layout. The core of the LDC lies in transforming local features of an image into more discriminative distance vectors, where the robust image-to-class distance is employed. These distance vectors are further encoded into sparse codes to capture the salient features of the image. The LDC is theoretically and experimentally shown to be complementary to traditional coding methods, and thus their combination can achieve higher classification accuracy.
We demonstrate the effectiveness of LDC on six data sets, two of each of three types (specific object, scene, and general object), i.e., Flower 102 and PFID 61, Scene 15 and Indoor 67, Caltech 101 and Caltech 256. The results show that our method generally outperforms the traditional coding methods, and achieves or is comparable to the state-of-the-art performance on these data sets. ETPL What Are We Tracking: A Unified Approach of Tracking and Recognition DIP-059 Abstract: Tracking is essentially a matching problem. While traditional tracking methods mostly focus on low-level image correspondences between frames, we argue that high-level semantic correspondences are indispensable to make tracking more reliable. Based on that, a unified approach of low-level object tracking and high-level recognition is proposed for single object tracking, in which the target category is
actively recognized during tracking. High-level offline models corresponding to the recognized category are then adaptively selected and combined with low-level online tracking models so as to achieve better tracking performance. Extensive experimental results show that our approach outperforms state-of-the-art online models in many challenging tracking scenarios, such as drastic view change, scale change, background clutter, and morphable objects.

ETPL DIP-060 Unsupervised Amplitude and Texture Classification of SAR Images With Multinomial Latent Model

Abstract: In this paper, we combine amplitude and texture statistics of synthetic aperture radar images for the purpose of model-based classification. In a finite mixture model, we bring together Nakagami densities to model the class amplitudes and a 2-D auto-regressive texture model with t-distributed regression error to model the textures of the classes. A non-stationary multinomial logistic latent class label model is used as a mixture density to obtain spatially smooth class segments. The classification expectation-maximization algorithm is performed to estimate the class parameters and to classify the pixels. We resort to the integrated classification likelihood criterion to determine the number of classes in the model. We present our results on the classification of land covers obtained in both supervised and unsupervised cases, processing TerraSAR-X as well as COSMO-SkyMed data.

ETPL DIP-061 Fuzzy C-Means Clustering With Local Information and Kernel Metric for Image Segmentation

Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends simultaneously on the spatial distance of all neighboring pixels and their gray-level difference. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure into its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively insensitive to the type of noise.

ETPL DIP-062 Rate-Distortion Optimized Rate Control for Depth Map-Based 3-D Video Coding

Abstract: In this paper, a novel rate control scheme with optimized bit allocation for 3-D video coding is proposed. First, we investigate the R-D characteristics of the texture and depth map of the coded view, as well as the quality dependency between the virtual view and the coded view. Second, an optimal bit allocation scheme is developed to allocate target bits for both the texture and depth maps of different views. Meanwhile, a simplified model parameter estimation scheme is adopted to speed up the coding process. Finally, experimental results on various 3-D video sequences demonstrate that the proposed algorithm achieves excellent R-D efficiency and bit rate accuracy compared to benchmark algorithms.
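Optimized bit allocation of the kind described in the DIP-062 entry is commonly posed as minimizing total distortion subject to a rate budget over operational R-D points. The toy below searches jointly over hypothetical texture/depth quantizer choices; the R-D numbers are invented, and the paper's R-D models and virtual-view dependency are not reproduced.

import numpy as np
from itertools import product

# Hypothetical operational R-D points (rate in kbps, distortion in MSE) per quantizer choice
texture = [(800, 20.0), (500, 35.0), (300, 60.0)]
depth   = [(400, 10.0), (250, 18.0), (120, 40.0)]
budget  = 800  # total kbps available for this view

best = None
for (rt, dt), (rd, dd) in product(texture, depth):
    # keep the feasible combination with the lowest total distortion
    if rt + rd <= budget and (best is None or dt + dd < best[0]):
        best = (dt + dd, rt, rd)
print(best)    # (total distortion, texture rate, depth rate) meeting the budget

Real encoders typically avoid the exhaustive sweep by minimizing a Lagrangian cost J = D + lambda*R per decision, which this toy's constrained search approximates.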
ETPL DIP-063 Performance Evaluation Methodology for Historical Document Image Binarization

Abstract: Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms,
background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.

ETPL DIP-064 Video Quality Pooling Adaptive to Perceptual Distortion Severity

Abstract: It is generally recognized that severe video distortions that are transient in space and/or time have a large effect on overall perceived video quality. In order to understand this phenomenon, we study the distribution of spatio-temporally local quality scores obtained from several video quality assessment (VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content-adaptive spatial and temporal pooling strategy based on the observed distribution. Our method adaptively emphasizes the worst scores along both the spatial and temporal dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow, such as egomotion. We demonstrate the efficacy of the method by testing it with three different VQA algorithms on the LIVE Video Quality database and the EPFL-PoliMI video quality database.

ETPL DIP-065 Modified Gradient Search for Level Set Based Image Segmentation

Abstract: Level set methods are a popular way to solve the image segmentation problem. The solution contour is found by solving an optimization problem in which a cost functional is minimized. Gradient descent methods are often used to solve this optimization problem since they are very easy to implement and applicable to general nonconvex functionals. They are, however, sensitive to local minima and often display slow convergence. Traditionally, cost functionals have been modified to avoid these problems. In this paper, we instead propose using two modified gradient descent methods, one using a momentum term and one based on resilient propagation. These methods are commonly used in the machine learning community. In a series of 2-D/3-D experiments using real and synthetic data with ground truth, the modifications are shown to reduce the sensitivity to local optima and to increase the convergence rate. The parameter sensitivity is also investigated. The proposed methods are very simple modifications of the basic method and are directly compatible with any type of level set implementation. Downloadable reference code with examples is available online.

ETPL DIP-066 Maximum Margin Correlation Filter: A New Approach for Localization and Classification

Abstract: Support vector machine (SVM) classifiers are popular in many computer vision tasks. In most of them, the SVM classifier assumes that the object to be classified is centered in the query image, which might not always be valid, e.g., when locating and classifying a particular class of vehicles in a large scene. In this paper, we introduce a new classifier called the Maximum Margin Correlation Filter (MMCF), which, while exhibiting the good generalization capabilities of SVM classifiers, is also capable of localizing objects of interest, thereby avoiding the need for image centering as is usually required in SVM classifiers. In other words, MMCF can simultaneously localize and classify objects of interest. We test the efficacy of the proposed classifier on three different tasks: vehicle recognition, eye localization, and face classification. We demonstrate that MMCF outperforms SVM classifiers as well as well-known correlation filters.
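The localization half of a correlation filter such as MMCF rests on finding the peak of a cross-correlation, which is cheap in the frequency domain. The sketch below uses a plain image patch as the template rather than a trained MMCF filter:

import numpy as np

def correlate_and_locate(scene, template):
    """Locate a template via frequency-domain cross-correlation (peak of the correlation map)."""
    H, W = scene.shape
    S = np.fft.fft2(scene)
    T = np.fft.fft2(template, s=(H, W))          # zero-pad the template to scene size
    corr = np.real(np.fft.ifft2(S * np.conj(T))) # cross-correlation via the FFT
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(3)
scene = rng.normal(0, 0.1, size=(128, 128))
patch = rng.random((16, 16))
scene[40:56, 70:86] += patch                     # embed the object at row 40, column 70
print(correlate_and_locate(scene, patch))        # correlation peak near (40, 70)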

ETPL DIP-067 Adaptive Fingerprint Image Enhancement With Emphasis on Preprocessing of Data

Abstract: This article proposes several improvements to an adaptive fingerprint enhancement method that is based on contextual filtering. The term adaptive implies that the parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the adaptive fingerprint enhancement method, four of which are updated in our proposed
system. Hence, the proposed overall system is novel. The four updated processing blocks are: 1) preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order statistical filters are applied. These processing blocks yield an improved and new adaptive fingerprint image processing method. The performance of the updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on the FVC databases.

ETPL DIP-068 Objective Quality Assessment of Tone-Mapped Images

Abstract: Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range (LDR) images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has the best quality. Without an appropriate quality measure, different TMOs cannot be compared, and further improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive and time consuming, and more importantly, is difficult to embed into optimization frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by combining: 1) a multiscale signal fidelity measure based on a modified structural similarity index and 2) a naturalness measure based on the intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking scores and the proposed tone-mapped image quality index (TMQI). Furthermore, we demonstrate extended applications of TMQI using two examples: parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.

ETPL DIP-069 Catching a Rat by Its Edglets

Abstract: Computer vision is a noninvasive method for monitoring laboratory animals. In this article, we propose a robust tracking method that is capable of extracting a rodent from a frame under uncontrolled normal laboratory conditions. The method consists of two steps. First, a sliding window combines three features to coarsely track the animal. Then, it uses the edglets of the rodent to adjust the tracked region to the animal's boundary. The method achieves an average tracking error that is smaller than that of a representative state-of-the-art method.

ETPL DIP-070 Juxtaposed Color Halftoning Relying on Discrete Lines

Abstract: Most halftoning techniques allow screen dots to overlap. They rely on the assumption that the inks are transparent, i.e., that the inks do not scatter a significant portion of the light back to the air. However, many special effect inks, such as metallic inks, iridescent inks, or pigmented inks, are not transparent. In order to create halftone images, halftone dots formed by such inks should be juxtaposed, i.e., printed side by side. We propose an efficient juxtaposed color halftoning technique for placing any desired number of colorant layers side by side without overlapping. The method uses a monochrome library of screen elements made of discrete lines with rational thicknesses. Discrete-line juxtaposed color halftoning is performed efficiently by multiple accesses to the screen element library.
ETPL DIP-071 Image Noise Level Estimation by Principal Component Analysis

Abstract: The problem of blind noise level estimation arises in many image processing applications, such as denoising, compression, and segmentation. In this paper, we propose a new noise level estimation method on the basis of principal component analysis of image blocks. We show that the noise variance can be estimated as the smallest eigenvalue of the image block covariance matrix. Compared with 13 existing methods, the proposed approach shows a good compromise between speed and accuracy. It is at
least 15 times faster than methods with similar accuracy, and it is at least two times more accurate than other methods. Our method does not assume the existence of homogeneous areas in the input image and, hence, can successfully process images containing only textures.

ETPL DIP-072 Nonlocal Image Restoration With Bilateral Variance Estimation: A Low-Rank Approach

Abstract: Simultaneous sparse coding (SSC) or nonlocal image representation has shown great potential in various low-level vision tasks, leading to several state-of-the-art image restoration techniques, including BM3D and LSSC. However, it still lacks a physically plausible explanation of why SSC is a better model than conventional sparse coding for the class of natural images. Meanwhile, the problem of sparsity optimization, especially when tangled with dictionary learning, is computationally difficult to solve. In this paper, we take a low-rank approach toward SSC and provide a conceptually simple interpretation from a bilateral variance estimation perspective, namely that singular-value decomposition of similar packed patches can be viewed as pooling both local and nonlocal information for estimating signal variances. This perspective inspires us to develop a new class of image restoration algorithms called spatially adaptive iterative singular-value thresholding (SAIST). For noisy data, SAIST generalizes the celebrated BayesShrink from local to nonlocal models; for incomplete data, SAIST extends the previous deterministic annealing-based solution to sparsity optimization by incorporating the idea of dictionary learning. In addition to conceptual simplicity and computational efficiency, SAIST has achieved highly competitive (often better) objective performance compared to several state-of-the-art methods in image denoising and completion experiments. Our subjective quality results compare favorably with those obtained by existing techniques, especially at high noise levels and with a large amount of missing data.

ETPL DIP-073 Variational Approach for the Fusion of Exposure Bracketed Pairs

Abstract: When taking pictures of a dark scene with artificial lighting, the ambient light is not sufficient for most cameras to obtain both accurate color and detail information. The exposure bracketing feature usually available in many camera models enables the user to obtain a series of pictures taken in rapid succession with different exposure times; the implicit idea is that the user picks the best image from this set. But in many cases, none of these images is good enough; in general, good brightness and color information are retained from longer-exposure settings, whereas sharp details are obtained from shorter ones. In this paper, we propose a variational method for automatically combining an exposure-bracketed pair of images into a single picture that reflects the desired properties of each one. We introduce an energy functional consisting of two terms, one measuring the difference in edge information with the short-exposure image and the other measuring the local color difference with a warped version of the long-exposure image. This method is able to handle camera and subject motion as well as noise, and the results compare favorably with the state of the art.
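A two-term energy of the kind described in the DIP-073 entry can be minimized by plain gradient descent. The sketch below uses a simplified stand-in energy E(u) = |grad u - grad s|^2 + lam*(u - l)^2, with s the short-exposure image and l standing in for the warped long-exposure image; the paper's actual functional and warping step are not reproduced.

import numpy as np
from scipy.ndimage import laplace

def fuse(short_img, long_img, lam=0.1, tau=0.2, iters=500):
    """Gradient descent on E(u) = |grad u - grad s|^2 + lam * (u - l)^2."""
    u = long_img.copy()
    for _ in range(iters):
        # dE/du = -laplace(u - s) + lam * (u - l), up to a constant factor
        grad = -laplace(u - short_img) + lam * (u - long_img)
        u -= tau * grad
    return u

rng = np.random.default_rng(4)
s = rng.random((64, 64))                 # stand-ins for the exposure-bracketed pair
l = np.clip(s + 0.2, 0, 1)
# With l = s + 0.2 the minimizer is u = s + 0.2: the result keeps the gradients
# (details) of s while adopting the brightness of l
print(np.abs(fuse(s, l) - s).mean())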
ETPL DIP-074 Image Denoising With Dominant Sets by a Coalitional Game Approach

Abstract: Dominant sets are a new graph partition method for pairwise data clustering proposed by Pavan and Pelillo. We address the problem of dominant sets with a coalitional game model, in which each data point is treated as a player and similar data points are encouraged to group together for cooperation. We propose betrayal and hermit rules to describe the cooperative behaviors among the players. After applying the betrayal and hermit rules, an optimal and stable graph partition emerges, and none of the players in the partition will change groups. For computational feasibility, we design an approximate algorithm for finding a dominant set of mutually similar players and then apply the algorithm to applications such as image denoising. In image denoising, every pixel is treated as a player who seeks similar partners according to its patch appearance in its local neighborhood. By averaging the noisy effects with the
similar pixels in the dominant sets, we improve nonlocal means image denoising, restoring the intrinsic structure of the original images and achieving denoising results competitive with the state-of-the-art methods in both visual and quantitative quality.

ETPL DIP-075 High-Order Local Spatial Context Modeling by Spatialized Random Forest

Abstract: In this paper, we propose a novel method for spatial context modeling toward boosting visual discriminating power. We are particularly interested in how to model high-order local spatial contexts instead of the intensively studied second-order spatial contexts, i.e., co-occurrence relations. Motivated by the recent success of random forests in learning discriminative visual codebooks, we present a spatialized random forest (SRF) approach, which can encode an unlimited length of high-order local spatial contexts. Through spatially random neighbor selection and random histogram-bin partition during the tree construction, the SRF can explore much more complicated and informative local spatial patterns in a randomized manner. Owing to the discriminative capability test for the random partition in each tree node's split process, a set of informative high-order local spatial patterns is derived, and new images are then encoded by counting the occurrences of such discriminative local spatial patterns. Extensive comparison experiments on face recognition and object/scene classification clearly demonstrate the superiority of the proposed spatial context modeling method over other state-of-the-art approaches for this purpose.

ETPL DIP-076 Adaptive Inpainting Algorithm Based on DCT Induced Wavelet Regularization

Abstract: In this paper, we propose an image inpainting optimization model whose objective function is a smoothed l1 norm of the weighted nondecimated discrete cosine transform (DCT) coefficients of the underlying image. By identifying the objective function of the proposed model as a sum of a differentiable term and a nondifferentiable term, we present a basic algorithm inspired by Beck and Teboulle's recent work on the model. Based on this basic algorithm, we propose an automatic way to determine the weights involved in the model and update them in each iteration. The DCT, as an orthogonal transform, is used in various applications. We view the rows of a DCT matrix as the filters associated with a multiresolution analysis. Nondecimated wavelet transforms with these filters are explored in order to analyze the images to be inpainted. Our numerical experiments verify that, under the proposed framework, the filters from a DCT matrix demonstrate promise for the task of image inpainting.

ETPL DIP-077 Extended Coding and Pooling in the HMAX Model

Abstract: This paper presents an extension of the HMAX model, a neural network model for image classification. The HMAX model can be described as a four-level architecture, with the first level consisting of multiscale and multiorientation local filters. We introduce two main contributions to this model. First, we improve the way the local filters at the first level are integrated into more complex filters at the last level, providing a flexible description of object regions and combining local information of multiple scales and orientations. These new filters are discriminative and yet invariant, two key aspects of visual classification. We evaluate their discriminative power and their level of invariance to geometrical transformations on a synthetic image set. Second, we introduce a multiresolution spatial pooling.
This pooling encodes both local and global spatial information to produce discriminative image signatures. Classification results are reported on three image data sets: Caltech101, Caltech256, and fifteen scenes. We show significant improvements over previous architectures using a similar framework.
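A generic multiresolution (spatial pyramid) pooling in the spirit of the step described above: pool a response map over successively finer grids and concatenate the results. The grid levels and the max operator below are assumptions, not the paper's exact choices.

import numpy as np

def pyramid_max_pool(resp, levels=(1, 2, 4)):
    """Max-pool a response map over 1x1, 2x2, and 4x4 grids and concatenate."""
    H, W = resp.shape
    feats = []
    for g in levels:
        ys = np.linspace(0, H, g + 1, dtype=int)   # cell boundaries for a g-by-g grid
        xs = np.linspace(0, W, g + 1, dtype=int)
        for i in range(g):
            for j in range(g):
                feats.append(resp[ys[i]:ys[i+1], xs[j]:xs[j+1]].max())
    return np.array(feats)    # 1 + 4 + 16 = 21 values per response map

rng = np.random.default_rng(5)
print(pyramid_max_pool(rng.random((60, 80))).shape)   # (21,)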
ETPL DIP-078 Human Detection in Images via Piecewise Linear Support Vector Machines

Abstract: Human detection in images is challenged by the view and posture variation problem. In this paper, we propose a piecewise linear support vector machine (PL-SVM) method to tackle this problem. The motivation is to exploit a piecewise discriminative function to construct a nonlinear classification boundary that can discriminate multiview and multiposture human bodies from the backgrounds in a high-dimensional feature space. PL-SVM training is designed as an iterative procedure of feature space division and linear SVM training, aiming at the margin maximization of local linear SVMs. Each piecewise SVM model is responsible for a subspace, corresponding to a human cluster of a particular view or posture. Using the PL-SVM, a cascaded detector is proposed with block orientation features and histogram of oriented gradient features. Extensive experiments show that, compared with several recent SVM methods, our method reaches the state of the art in both detection accuracy and computational efficiency, and it performs best when dealing with low-resolution human regions in cluttered backgrounds.

ETPL DIP-079 Short Distance Intra Coding Scheme for High Efficiency Video Coding

Abstract: This paper proposes a new intra coding scheme, known as short distance intra prediction (SDIP), for the high efficiency video coding (HEVC) standardization work. The proposed method is based on the quadtree unit structure of HEVC. By splitting a coding unit into nonsquare units for coding and reconstruction, and therefore shortening the distances between the predicted and the reference samples, the accuracy of intra prediction can be improved when applying the directional prediction method. SDIP improves the intra prediction accuracy, especially for highly detailed regions. The approach is applied to both luma and chroma components. When integrated into the HEVC reference software, it shows up to a 12.8% bit rate reduction for sequences with rich textures.

ETPL DIP-080 Probabilistic Graphlet Transfer for Photo Cropping

Abstract: As one of the most basic photo manipulation processes, photo cropping is widely used in the printing, graphic design, and photography industries. In this paper, we introduce graphlets (i.e., small connected subgraphs) to represent a photo's aesthetic features, and propose a probabilistic model to transfer aesthetic features from the training photo onto the cropped photo. In particular, by segmenting each photo into a set of regions, we construct a region adjacency graph (RAG) to represent the global aesthetic feature of each photo. Graphlets are then extracted from the RAGs, and these graphlets capture the local aesthetic features of the photos. Finally, we cast photo cropping as a candidate-searching procedure on the basis of a probabilistic model, and infer the parameters of the cropped photos using Gibbs sampling. The proposed method is fully automatic. Subjective evaluations have shown that it is preferred over a number of existing approaches.

ETPL DIP-081 On Removing Interpolation and Resampling Artifacts in Rigid Image Registration

Abstract: We show that image registration using conventional interpolation and summation approximations of continuous integrals can generally fail because of resampling artifacts. These artifacts negatively affect the accuracy of registration by producing local optima, altering the gradient, shifting the global optimum, and making rigid registration asymmetric.
In this paper, after an extensive literature review, we demonstrate the causes of the artifacts by comparing inclusion and avoidance of resampling analytically. We show the sum-of-squared-differences cost function formulated as an integral to be more accurate compared with its traditional sum form in a simple case of image registration. We then discuss aliasing that occurs in rotation, which is due to the fact that an image represented in the Cartesian grid is sampled with different rates in different directions, and propose the use of oscillatory isotropic interpolation kernels, which allow better recovery of true global optima by overcoming this type of
aliasing. Through our experiments on brain, fingerprint, and white noise images, we illustrate the superior performance of the integral registration cost function in both the Cartesian and spherical coordinates, and we also validate the introduced radial interpolation kernel by demonstrating the improvement in registration.

ETPL DIP-082 Fast Positive Deconvolution of Hyperspectral Images

Abstract: In this brief, we provide an efficient scheme for performing deconvolution of large hyperspectral images under a positivity constraint, while accounting for spatial and spectral smoothness of the data.

ETPL DIP-083 Segmentation of Intracranial Vessels and Aneurysms in Phase Contrast Magnetic Resonance Angiography Using Multirange Filters and Local Variances

Abstract: Segmentation of intensity-varying and low-contrast structures is an extremely challenging and rewarding task. In computer-aided diagnosis of intracranial aneurysms, segmenting the high-intensity major vessels along with the attached low-contrast aneurysms is essential to the recognition of this lethal vascular disease. It is particularly helpful in performing early and noninvasive diagnosis of intracranial aneurysms using phase contrast magnetic resonance angiographic (PC-MRA) images. The major challenges in developing a PC-MRA-based segmentation method are the significantly varying voxel intensity inside vessels with different flow velocities and the signal loss in the aneurysmal regions where turbulent flows occur. This paper proposes a novel intensity-based algorithm to segment intracranial vessels and the attached aneurysms. The proposed method can handle intensity-varying vasculatures and also the low-contrast aneurysmal regions affected by turbulent flows. It is grounded on the use of multirange filters and local variances to extract intensity-based image features for identifying contrast-varying vasculatures. The extremely low-intensity regions affected by turbulent flows are detected according to the topology of the structure detected by the multirange filters and local variances. The proposed method is evaluated using a phantom image volume with an aneurysm and four clinical cases. It achieves a Dice score of 0.80 in the phantom case. In addition, different components of the proposed method (the multirange filters, local variances, and topology-based detection) are evaluated in a comparison between the proposed method and its lower-complexity variants. Owing to the analogy between these variants and existing vascular segmentation methods, this comparison also exemplifies the advantage of the proposed method over the existing approaches: it analyzes the weaknesses of these existing approaches and justifies the use of every component involved in the proposed method. It is shown that the proposed method is capable of segmenting blood vessels and the attached aneurysms in PC-MRA images.
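The local variance features used in the DIP-083 entry have a closed form, E[x^2] - E[x]^2, computable with box filters; "multirange" then amounts to repeating the computation at several window sizes. A sketch with assumed window sizes:

import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(vol, size):
    """Sliding-window variance E[x^2] - E[x]^2 over a cubic window."""
    mean = uniform_filter(vol, size=size)
    mean_sq = uniform_filter(vol * vol, size=size)
    return np.maximum(mean_sq - mean * mean, 0.0)   # clamp tiny negatives from roundoff

rng = np.random.default_rng(6)
vol = rng.random((32, 32, 32))                      # stand-in for a PC-MRA volume
multirange = [local_variance(vol, s) for s in (3, 5, 9)]   # assumed window sizes
print([v.mean() for v in multirange])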
ETPL DIP-084 Robust Image Analysis With Sparse Representation on Quantized Visual Features

Abstract: Recent techniques based on sparse representation (SR) have demonstrated promising performance in high-level visual recognition, exemplified by the highly accurate face recognition under occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms using raw image pixels, and very few have been proposed to utilize quantized visual features, such as the popular bag-of-words feature abstraction. In such cases, besides the inherent quantization errors, ambiguity associated with visual word assignment and misdetection of feature points, due to factors such as visual occlusions and noise, constitutes the major cause of dense corruptions of the quantized representation. The dense corruptions can jeopardize the decision process by distorting the patterns of the sparse reconstruction coefficients. In this paper, we aim to eliminate the corruptions and achieve robust image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and misdetection transfer) to account for the two major sources of corruption as discussed. By reasonably assuming the rarity of the two kinds of distortion processes, we augment the original SR-based reconstruction objective with l0-norm regularization on the transfer terms to encourage sparsity and, hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex l0-norm
optimization into a convex l1-norm optimization problem, and employ the accelerated proximal gradient method in a provably convergent updating procedure. Extensive experiments on four benchmark datasets (Caltech-101, Caltech-256, Corel-5k, and CMU pose, illumination, and expression) manifest the necessity of removing the quantization corruptions and the various advantages of the proposed framework.

ETPL DIP-085 Additive White Gaussian Noise Level Estimation in SVD Domain for Images

Abstract: Accurate estimation of the Gaussian noise level is of fundamental interest in a wide variety of vision and image processing applications, as it is critical to the processing techniques that follow. In this paper, a new and effective noise level estimation method is proposed on the basis of the study of the singular values of noise-corrupted images. Two novel aspects of this paper address the major challenges in noise estimation: 1) the use of the tail of the singular values for noise estimation, to alleviate the influence of the signal on the data basis for the noise estimation process, and 2) the addition of known noise to estimate the content-dependent parameter, so that the proposed scheme is adaptive to visual signals, thereby enabling a wider application scope. The analysis and experimental results demonstrate that the proposed algorithm can reliably infer noise levels and shows robust behavior over a wide range of visual content and noise conditions, and that it outperforms relevant existing methods. (A toy numerical sketch of the tail-of-singular-values idea follows below.)

ETPL DIP-086 Nonedge-Specific Adaptive Scheme for Highly Robust Blind Motion Deblurring of Natural Images

Abstract: Blind motion deblurring estimates a sharp image from a motion-blurred image without knowledge of the blur kernel. Although significant progress has been made on tackling this problem, existing methods, when applied to highly diverse natural images, are still far from stable. This paper focuses on the robustness of blind motion deblurring methods toward image diversity, a critical problem that has been previously neglected for years. We classify the existing methods into two schemes and analyze their robustness using an image set consisting of 1.2 million natural images. The first scheme is edge-specific, as it relies on the detection and prediction of large-scale step edges. This scheme is sensitive to the diversity of the image edges in natural images. The second scheme is nonedge-specific and explores various image statistics, such as the prior distributions. This scheme is sensitive to statistical variation over different images. Based on the analysis, we address the robustness by proposing a novel nonedge-specific adaptive scheme (NEAS), which features a new prior that is adaptive to the variety of textures in natural images. By comparing the performance of NEAS against the existing methods on a very large image set, we demonstrate its advance beyond the state of the art.
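A toy numerical reading of the two ideas in the DIP-085 entry: the tail of the singular values is dominated by noise, and adding noise of known strength calibrates the content-dependent scale. The calibration below assumes the tail average t satisfies t^2 ~ alpha^2 * sigma^2, so that adding known noise sigma_k gives t_k^2 ~ alpha^2 * (sigma^2 + sigma_k^2) and sigma^2 = t^2 * sigma_k^2 / (t_k^2 - t^2). This is an illustrative model only, not the paper's estimator.

import numpy as np

rng = np.random.default_rng(7)
clean = np.outer(np.sin(np.linspace(0, 6, 128)), np.cos(np.linspace(0, 6, 128)))
sigma = 0.08
noisy = clean + rng.normal(0, sigma, clean.shape)

def tail_avg(img, k=64):
    s = np.linalg.svd(img, compute_uv=False)
    return s[-k:].mean()              # tail singular values, dominated by the noise

sigma_known = 0.1
t = tail_avg(noisy)
tk = tail_avg(noisy + rng.normal(0, sigma_known, clean.shape))  # variances of independent noise add

# Solve the assumed calibration model for the unknown noise level
sigma_est = np.sqrt(t**2 * sigma_known**2 / (tk**2 - t**2))
print(sigma, sigma_est)               # the estimate tracks the true sigma on this toy example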
The prediction result then determines the proportion of each filter used to obtain the final processed output. In this way, the HSF serves as a framework for combining the outputs of a number of different user selected filters, each best suited for a different region of an image. We formulate our scheme in a probabilistic framework where the HSF output is obtained as the Bayesian minimum mean square error estimate of the original image. Maximum likelihood estimates of the model parameters are determined from an offline fully unsupervised training procedure that is derived from the expectation-maximization algorithm. To illustrate how to apply the HSF and to demonstrate its potential, we apply our scheme as a post-processing step to improve the decoding quality of JPEG-encoded document images. The scheme consistently improves the quality of the decoded image
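
To make the tail-of-singular-values idea in the DIP-085 entry above concrete, here is a minimal Python sketch. The linear model for the tail mean, the pure-noise calibration of its slope, and the probe noise level are all assumptions on my part; the published method derives these quantities more carefully.

import numpy as np

def estimate_noise_sigma(noisy, tail_frac=0.75, sigma_probe=10.0, seed=0):
    # Hedged sketch: model the mean of the smallest singular values as
    # tail_mean ~ alpha * sigma + beta, where beta depends on image content.
    noisy = noisy.astype(float)
    k = int(min(noisy.shape) * (1 - tail_frac))  # drop leading singular values

    def tail_mean(img):
        s = np.linalg.svd(img, compute_uv=False)  # sorted descending
        return s[k:].mean()

    rng = np.random.default_rng(seed)
    # Calibrate the slope alpha on pure noise of the same size
    # (for pure Gaussian noise the tail mean is ~ alpha * sigma, beta = 0).
    alpha = tail_mean(rng.standard_normal(noisy.shape))

    p1 = tail_mean(noisy)
    p2 = tail_mean(noisy + sigma_probe * rng.standard_normal(noisy.shape))
    # p1 ~ alpha*sigma + beta and p2 ~ alpha*sqrt(sigma^2 + sigma_probe^2) + beta,
    # so subtracting cancels the content term beta and allows a closed-form solve.
    d = (p2 - p1) / alpha  # = sqrt(sigma^2 + sigma_probe^2) - sigma
    return max((sigma_probe ** 2 - d ** 2) / (2 * d), 0.0)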
ETPL DIP-087: Image Enhancement Using the Hypothesis Selection Filter: Theory and Application to JPEG Decoding
Abstract: We introduce the hypothesis selection filter (HSF) as a new approach for image quality enhancement. We assume that a set of filters has been selected a priori to improve the quality of a distorted image containing regions with different characteristics. At each pixel, HSF uses a locally computed feature vector to predict the relative performance of the filters in estimating the corresponding pixel intensity in the original undistorted image. The prediction result then determines the proportion of each filter used to obtain the final processed output. In this way, the HSF serves as a framework for combining the outputs of a number of different user-selected filters, each best suited for a different region of an image. We formulate our scheme in a probabilistic framework where the HSF output is obtained as the Bayesian minimum mean square error estimate of the original image. Maximum likelihood estimates of the model parameters are determined from an offline, fully unsupervised training procedure derived from the expectation-maximization algorithm. To illustrate how to apply the HSF and to demonstrate its potential, we apply our scheme as a post-processing step to improve the decoding quality of JPEG-encoded document images. The scheme consistently improves the quality of the decoded image over a variety of image content with different characteristics. We show that our scheme results in quantitative improvements over several other state-of-the-art JPEG decoding methods.

ETPL DIP-088: Learning the Spherical Harmonic Features for 3-D Face Recognition
Abstract: In this paper, a competitive method for 3-D face recognition (FR) using spherical harmonic features (SHF) is proposed. With this solution, 3-D face models are characterized by the energies contained in spherical harmonics of different frequencies, thereby enabling the capture of both the gross shape and fine surface details of a 3-D facial surface. This is in clear contrast to most 3-D FR techniques, which are either holistic or feature based, using local features extracted from distinctive points. First, 3-D face models are represented in a canonical form, namely a spherical depth map, from which the SHF can be calculated. Then, considering the predictive contribution of each SHF feature, especially in the presence of facial expression and occlusion, feature selection methods are used to improve predictive performance and provide faster and more cost-effective predictors. Experiments have been carried out on three public 3-D face datasets, SHREC2007, FRGC v2.0, and Bosphorus, with increasing difficulty in terms of facial expression, pose, and occlusion; the results demonstrate the effectiveness of the proposed method.

ETPL DIP-089: Video Deblurring Algorithm Using Accurate Blur Kernel Estimation and Residual Deconvolution Based on a Blurred-Unblurred Frame Pair
Abstract: Blurred frames may occur sparsely in a video sequence acquired by consumer devices such as digital camcorders and digital cameras. In order to avoid visually annoying artifacts due to those blurred frames, this paper presents a novel motion deblurring algorithm in which a blurred frame is reconstructed utilizing the high-resolution information of adjacent unblurred frames. First, a motion-compensated predictor for the blurred frame is derived from a neighboring unblurred frame via specific motion estimation. Then, an accurate blur kernel, which is difficult to obtain directly from the blurred frame itself, is computed using both the predictor and the blurred frame. Next, residual deconvolution is applied to both of those frames in order to reduce the ringing artifacts inherently caused by conventional deconvolution. The blur kernel estimation and deconvolution processes are performed iteratively for the deblurred frame. Simulation results show that the proposed algorithm provides superior deblurring results over conventional deblurring algorithms while preserving details and reducing ringing artifacts.

ETPL DIP-090: Rank Minimization Code Aperture Design for Spectrally Selective Compressive Imaging
Abstract: A new code aperture design framework for the multiframe code aperture snapshot spectral imaging (CASSI) system is presented. It aims at the optimization of code aperture sets such that a group of compressive spectral measurements is constructed, each with information from a specific subset of bands. A matrix representation of CASSI is introduced that permits the optimization of spectrally selective code aperture sets. Furthermore, each code aperture set forms a matrix such that rank minimization can be used to reduce the number of CASSI shots needed. Conditions for the code apertures are identified such that a restricted isometry property in the CASSI compressive measurements is satisfied with higher probability. Simulations show higher quality of spectral image reconstruction than that attained by systems using Hadamard or random code aperture sets.
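
As a rough illustration of the spherical harmonic features in the DIP-088 entry above, the sketch below fits a spherical harmonic basis to a spherical depth map by least squares and accumulates the per-frequency energies. The grid convention, band limit, and plain least-squares fit are my assumptions, not the paper's exact pipeline.

import numpy as np
from scipy.special import sph_harm

def shf_band_energies(depth_map, lmax=8):
    # depth_map: (H, W) spherical depth map on an equiangular grid,
    # polar angle theta in (0, pi), azimuth phi in [0, 2*pi).
    H, W = depth_map.shape
    theta = np.linspace(0, np.pi, H, endpoint=False) + np.pi / (2 * H)
    phi = np.linspace(0, 2 * np.pi, W, endpoint=False)
    PHI, THETA = np.meshgrid(phi, theta)
    basis = []
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            # scipy convention: sph_harm(m, l, azimuth, polar)
            basis.append(sph_harm(m, l, PHI, THETA).ravel())
    B = np.stack(basis, axis=1)                    # (H*W, (lmax+1)^2)
    coeffs, *_ = np.linalg.lstsq(B, depth_map.ravel(), rcond=None)
    # Energy per frequency l: a rotation-invariant signature of the shape.
    energies, i = [], 0
    for l in range(lmax + 1):
        n = 2 * l + 1
        energies.append(np.sum(np.abs(coeffs[i:i + n]) ** 2))
        i += n
    return np.asarray(energies)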
ETPL DIP-091: Coaching the Exploration and Exploitation in Active Learning for Interactive Video Retrieval
Abstract: Conventional active learning approaches for interactive video/image retrieval usually assume the query distribution is unknown, as it is difficult to estimate with only a limited number of labeled instances available. Thus, the system easily faces a dilemma: whether to explore the feature space in uncertain areas for a better understanding of the query distribution, or to harvest in certain areas for more relevant instances. In this paper, we propose a novel approach called coached active learning that makes the query distribution predictable through training and, therefore, avoids the risk of searching a completely unknown space. The estimated distribution, which provides a more global view of the feature space, can be used to schedule not only the timing but also the step sizes of the exploration and the exploitation in a principled way. The results of the experiments on a large-scale data set from TRECVID 2005-2009 validate the efficiency and effectiveness of our approach, which demonstrates encouraging performance when facing domain shift, outperforms eight conventional active learning methods, and shows superiority to six state-of-the-art interactive video retrieval systems.

ETPL DIP-092: Nonnegative Local Coordinate Factorization for Image Representation
Abstract: Recently, nonnegative matrix factorization (NMF) has become increasingly popular for feature extraction in computer vision and pattern recognition. NMF seeks two nonnegative matrices whose product best approximates the original matrix. The nonnegativity constraints lead to sparse, parts-based representations that can be more robust than nonsparse global features. To obtain more accurate control over the sparseness, in this paper we propose a novel method called nonnegative local coordinate factorization (NLCF) for feature extraction. NLCF adds a local coordinate constraint to the standard NMF objective function. Specifically, we require that the learned basis vectors be as close to the original data points as possible, so that each data point can be represented by a linear combination of only a few nearby basis vectors, which naturally leads to a sparse representation. Extensive experimental results suggest that the proposed approach provides a better representation and achieves higher accuracy in image clustering.

ETPL DIP-093: Flip-Invariant SIFT for Copy and Object Detection
Abstract: The scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoints, or symmetric patterns of objects. This paper proposes a new descriptor, named flip-invariant SIFT (F-SIFT), that preserves the original properties of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and object detection. In copy detection, a framework that smartly indexes the flip properties of F-SIFT for rapid filtering and weak geometric checking is proposed. F-SIFT not only significantly improves the detection accuracy of SIFT, but also leads to more than 50% savings in computational cost. In object recognition, we demonstrate the superiority of F-SIFT in dealing with flip transformations by comparing it to seven other descriptors. In object detection, we further show the ability of F-SIFT to describe symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the original SIFT.
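
The DIP-092 entry above is specific enough to sketch. The snippet below minimizes a locality-penalized NMF objective of the form ||X - UV||_F^2 + mu * sum over (r, j) of V[r, j] * ||u_r - x_j||^2 with plain projected gradient steps. The paper derives dedicated multiplicative updates, so this generic optimizer is only an assumption-laden stand-in.

import numpy as np

def nlcf(X, k, mu=0.1, lr=1e-3, iters=500, seed=0):
    # X: (m, n) nonnegative data, columns are samples; k: number of bases.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(iters):
        R = U @ V - X                          # reconstruction residual
        # D[r, j] = ||u_r - x_j||^2 (the local coordinate penalty is linear in V)
        D = ((U[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
        gV = 2 * U.T @ R + mu * D              # gradient w.r.t. V
        W = V.sum(axis=1)                      # row sums of V
        gU = 2 * R @ V.T + 2 * mu * (U * W[None, :] - X @ V.T)  # gradient w.r.t. U
        # Small fixed step plus projection onto the nonnegative orthant.
        U = np.maximum(U - lr * gU, 0)
        V = np.maximum(V - lr * gV, 0)
    return U, V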
ETPL DIP-094: Multiscale Image Fusion Using the Undecimated Wavelet Transform With Spectral Factorization and Nonorthogonal Filter Banks
Abstract: Multiscale transforms are among the most popular techniques in the field of pixel-level image fusion. However, the fusion performance of these methods often deteriorates for images derived from different sensor modalities. In this paper, we demonstrate that for such images, results can be improved using a novel undecimated wavelet transform (UWT)-based fusion scheme, which splits the image decomposition process into two successive filtering operations using spectral factorization of the analysis filters. The actual fusion takes place after convolution with the first filter pair. Its significantly smaller support size minimizes the unwanted spreading of coefficient values around overlapping image singularities, which usually complicates the feature selection process and may introduce reconstruction errors in the fused image. Moreover, we show that the nonsubsampled nature of the UWT allows the design of nonorthogonal filter banks, which are more robust to artifacts introduced during fusion, additionally improving the obtained results. The combination of these techniques leads to a fusion framework which provides clear advantages over traditional multiscale fusion approaches, independent of the underlying fusion rule, and reduces unwanted side effects such as ringing artifacts in the fused reconstruction.

ETPL DIP-095: Context-Dependent Logo Matching and Recognition
Abstract: We contribute, through this paper, to the design of a novel variational framework able to match and recognize multiple instances of multiple reference logos in image archives. Reference logos and test images are seen as constellations of local features (interest points, regions, etc.) and matched by minimizing an energy function mixing: 1) a fidelity term that measures the quality of feature matching, 2) a neighborhood criterion that captures feature co-occurrence/geometry, and 3) a regularization term that controls the smoothness of the matching solution. We also introduce a detection/recognition procedure and study its theoretical consistency. Finally, we show the validity of our method through extensive experiments on the challenging MICC-Logos dataset. Our method outperforms baseline as well as state-of-the-art matching/recognition procedures by 20%.

ETPL DIP-096: Efficient Contrast Enhancement Using Adaptive Gamma Correction With Weighting Distribution
Abstract: This paper proposes an efficient method to modify histograms and enhance contrast in digital images. Enhancement plays a significant role in digital image processing, computer vision, and pattern recognition. We present an automatic transformation technique that improves the brightness of dimmed images via gamma correction and the probability distribution of luminance pixels. To enhance video, the proposed image-enhancement method uses temporal information regarding the differences between frames to reduce computational complexity. Experimental results demonstrate that the proposed method produces enhanced images of comparable or higher quality than those produced using previous state-of-the-art methods.

ETPL DIP-097: Binary Compressed Imaging
Abstract: Compressed sensing can substantially reduce the number of samples required for conventional signal acquisition at the expense of an additional reconstruction procedure. It also provides robust reconstruction when using quantized measurements, including in the one-bit setting. In this paper, our goal is to design a framework for binary compressed sensing that is adapted to images. Accordingly, we propose an acquisition and reconstruction approach that copes with the high dimensionality of image data and provides reconstructions of satisfactory visual quality. Our forward model describes data acquisition and follows physical principles. It entails a series of random convolutions performed optically, followed by sampling and binary thresholding. The binary samples that are obtained can be either measured or ignored according to predefined functions. Based on these measurements, we then express our reconstruction problem as the minimization of a compound convex cost that enforces the consistency of the solution with the available binary data under total-variation regularization. Finally, we derive an efficient reconstruction algorithm relying on convex-optimization principles. We conduct several experiments on standard images and demonstrate the practical interest of our approach.
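
For the DIP-096 entry above, the usual formulation smooths the luminance histogram with a weighting distribution and derives a per-intensity gamma from its cumulative distribution. The sketch below follows that recipe as I understand it; the exact normalization details are assumptions.

import numpy as np

def agcwd(img, alpha=0.5):
    # img: 8-bit grayscale array; alpha controls histogram smoothing strength.
    l = img.astype(float) / 255.0
    hist, _ = np.histogram(l, bins=256, range=(0, 1))
    pdf = hist / hist.sum()
    lo, hi = pdf.min(), pdf.max()
    w = hi * ((pdf - lo) / (hi - lo + 1e-12)) ** alpha  # weighting distribution
    cdf = np.cumsum(w) / w.sum()
    gamma = np.maximum(1.0 - cdf, 1e-3)                  # per-intensity gamma
    lut = np.linspace(0, 1, 256) ** gamma                # l' = l ** gamma(l)
    return (255 * lut[(l * 255).astype(np.uint8)]).astype(np.uint8)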
ETPL DIP-098: MIMO Nonlinear Ultrasonic Tomography by Propagation and Backpropagation Method
Abstract: This paper develops a fast ultrasonic tomographic imaging method in a multiple-input multiple-output (MIMO) configuration using the propagation and backpropagation (PBP) method. In this method, ultrasonic excitation signals from multiple sources are transmitted simultaneously to probe the objects immersed in the medium, and the scattering signals are recorded by multiple receivers. Utilizing the nonlinear ultrasonic wave propagation equation and the received time-domain scattered signals, the objects are reconstructed iteratively in three steps. First, the propagation step calculates the predicted acoustic potential data at the receivers using an initial guess. Second, the difference signal between the predicted value and the measured data is calculated. Third, the backpropagation step computes updated acoustical potential data by computationally backpropagating the difference signal into the same medium. Unlike the conventional PBP method for tomographic imaging, where each source takes turns exciting the acoustical field until all the sources are used, the developed MIMO-PBP method achieves faster image reconstruction by exciting multiple sources simultaneously. Furthermore, we develop an orthogonal waveform signaling method using a waveform delay scheme to reduce the impact of speckle patterns in the reconstructed images. Through numerical experiments we demonstrate that the proposed MIMO-PBP tomographic imaging method results in faster convergence and superior imaging quality.

ETPL DIP-099: Vector Extension of Monogenic Wavelets for Geometric Representation of Color Images
Abstract: Monogenic wavelets offer a geometric representation of grayscale images through an AM-FM model allowing invariance of coefficients to translations and rotations. The underlying concept of local phase includes a fine contour analysis in a coherent unified framework. Starting from a link with structure tensors, we propose a nontrivial extension of the monogenic framework to vector-valued signals in order to carry out a nonmarginal color monogenic wavelet transform. We also give a practical study of this new wavelet transform in the contexts of sparse representations and invariant analysis, which helps to understand the physical interpretation of the coefficients and validates the interest of our theoretical construction.

ETPL DIP-100: Myocardial Motion Estimation From Medical Images Using the Monogenic Signal
Abstract: We present a method for the analysis of heart motion from medical images. The algorithm exploits monogenic signal theory, recently introduced as an N-dimensional generalization of the analytic signal. The displacement is computed locally by assuming the conservation of the monogenic phase over time. A local affine displacement model is considered to account for typical heart motions such as contraction/expansion and shear. A coarse-to-fine B-spline scheme allows a robust and effective computation of the model's parameters, and a pyramidal refinement scheme helps to handle large motions. Robustness against noise is increased by replacing the standard point-wise computation of the monogenic orientation with a robust least-squares orientation estimate. Given its general formulation, the algorithm is well suited to images from different modalities, in particular for those cases where time-variant changes of local intensity invalidate the standard brightness constancy assumption. This paper evaluates the method's feasibility on two emblematic cases: cardiac tagged magnetic resonance and cardiac ultrasound. In order to quantify the performance of the proposed method, we made use of realistic synthetic sequences from both modalities for which the benchmark motion is known. A comparison is presented with state-of-the-art methods for cardiac motion analysis. On the data considered, these conventional approaches are outperformed by the proposed algorithm. A recent global optical-flow estimation algorithm based on the monogenic curvature tensor is also considered in the comparison. With respect to the latter, the proposed framework provides, along with higher accuracy, superior robustness to noise and a considerably shorter computation time.
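
Both entries above build on the monogenic signal, which is straightforward to compute with the Riesz transform in the frequency domain. A minimal sketch follows; the frequency-grid convention and DC handling are my choices.

import numpy as np

def monogenic(f):
    # Riesz-transform construction of the monogenic signal:
    # frequency responses -i*u/|w| and -i*v/|w| applied to the image spectrum.
    rows, cols = f.shape
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0                      # avoid division by zero at DC
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    amplitude = np.sqrt(f ** 2 + r1 ** 2 + r2 ** 2)
    phase = np.arctan2(np.hypot(r1, r2), f)   # local phase
    orientation = np.arctan2(r2, r1)          # local orientation
    return amplitude, phase, orientation

The phase-conservation assumption in DIP-100 then amounts to tracking this local phase over time instead of the raw intensities.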
ETPL DIP-101: Revisiting the Relationship Between Adaptive Smoothing and Anisotropic Diffusion With Modified Filters
Abstract: Anisotropic diffusion has long been known to be closely related to adaptive smoothing and is discretized in a similar manner. This paper revisits the fundamental relationship between the two approaches. It is shown that adaptive smoothing and anisotropic diffusion have different theoretical backgrounds by exploring their characteristics from the perspective of normalization, evolution step size, and energy flow. Based on this principle, adaptive smoothing is derived from a second-order partial differential equation (PDE), not the conventional anisotropic diffusion, via the coupling of Fick's law with a generalized continuity equation in which a source or sink exists, a connection that has not been extensively exploited. We show that the source or sink is closely related to the asymmetry of energy flow as well as to the normalization term of adaptive smoothing. This enables us to analyze behaviors of adaptive smoothing, such as the maximum principle and stability, from the perspective of a PDE. Ultimately, this relationship provides new insights into application-specific filtering algorithm design. By modeling the source or sink in the PDE, we introduce two specific diffusion filters, robust anisotropic diffusion and robust coherence-enhancing diffusion, as novel instantiations that are more robust against outliers than the conventional filters.

ETPL DIP-102: A Weighted Dictionary Learning Model for Denoising Images Corrupted by Mixed Noise
Abstract: This paper proposes a general weighted l2-l0 norm energy minimization model to remove mixed noise, such as a Gaussian-Gaussian mixture, impulse noise, and Gaussian-impulse noise, from images. The approach is built upon a maximum likelihood estimation framework and sparse representations over a trained dictionary. Rather than optimizing the likelihood functional derived from a mixture distribution, we present a new weighted data fidelity function, which has the same minimizer as the original likelihood functional but is much easier to optimize. The weighting function in the model can be determined by the algorithm itself, and it plays the role of noise detection in terms of the different estimated noise parameters. By incorporating the sparse regularization of small image patches, the proposed method can efficiently remove a variety of mixed or single noise types while preserving the image textures well. In addition, a modified K-SVD algorithm is designed to address the weighted rank-one approximation. The experimental results demonstrate its better performance compared with some existing methods.

ETPL DIP-103: Comparative Study of Fixation Density Maps
Abstract: Fixation density maps (FDM) created from eye tracking experiments are widely used in image processing applications. The FDM are assumed to be reliable ground truths of human visual attention and, as such, one expects a high similarity between FDM created in different laboratories. So far, no studies have analyzed the degree of similarity between FDM from independent laboratories and the related impact on applications. In this paper, we perform a thorough comparison of FDM from three independently conducted eye tracking experiments. We focus on the effect of presentation time and image content, and evaluate the impact of the FDM differences on three applications: visual saliency modeling, image quality assessment, and image retargeting. It is shown that the FDM are very similar and that their impact on the applications is low. The individual experiment comparisons, however, are found to be significantly different, showing that inter-laboratory differences strongly depend on the experimental conditions of the laboratories. The FDM are publicly available to the research community.
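
For readers unfamiliar with the diffusion filters discussed in the DIP-101 entry above, here is a minimal Perona-Malik anisotropic diffusion sketch (the classic scheme, not the paper's modified filters); the periodic boundary handling via np.roll is a simplification.

import numpy as np

def perona_malik(img, niter=20, kappa=20.0, dt=0.2):
    # Explicit scheme: u <- u + dt * div(g(|grad u|) * grad u),
    # with conduction g(d) = exp(-(d/kappa)^2) favoring smoothing away from edges.
    u = img.astype(float)
    for _ in range(niter):
        # one-sided differences to the four neighbours (periodic borders)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        u = u + dt * sum(np.exp(-(d / kappa) ** 2) * d for d in (dn, ds, de, dw))
    return u

The step size dt <= 0.25 keeps this explicit four-neighbour scheme stable, which is exactly the kind of evolution-step consideration the DIP-101 entry analyzes.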
ETPL DIP-104: Efficient Method for Content Reconstruction With Self-Embedding
Abstract: This paper presents a new model of the content reconstruction problem in self-embedding systems, based on an erasure communication channel. We explain why such a model is a good fit for this problem and how it can be practically implemented with the use of digital fountain codes. The proposed method is based on an alternative approach to spreading the reference information over the whole image, which has recently been shown to be of critical importance in the application at hand. Our paper presents a theoretical analysis of the inherent restoration trade-offs. We analytically derive formulas for the reconstruction success bounds and validate them experimentally with Monte Carlo simulations and a reference image authentication system. We perform an exhaustive reconstruction quality assessment, in which the presented reference scheme is compared to five state-of-the-art alternatives in a common evaluation scenario. Our paper leads to important insights on how self-embedding schemes should be constructed to achieve optimal performance. The reference authentication system designed according to the presented principles allows for high-quality reconstruction, regardless of the amount of tampered content. The average reconstruction quality, measured on 10000 natural images, is 37 dB, and is achievable even when 50% of the image area becomes tampered.

ETPL DIP-105: Modeling IrisCode and Its Variants as Convex Polyhedral Cones and Its Security Implications
Abstract: IrisCode, developed by Daugman in 1993, is the most influential iris recognition algorithm. A thorough understanding of IrisCode is essential, because over 100 million persons have been enrolled by this algorithm, and many biometric personal identification and template protection methods have been developed based on it. This paper shows that a template produced by IrisCode or its variants is a convex polyhedral cone in a hyperspace. Its central ray, being a rough representation of the original biometric signal, can be computed by a simple algorithm, which can often be implemented in one Matlab command line. The central ray is an expected ray and also an optimal ray of an objective function on a group of distributions. This algorithm is derived from geometric properties of a convex polyhedral cone but does not rely on any prior knowledge (e.g., iris images). The experimental results show that biometric templates, including iris and palmprint templates, produced by different recognition methods can be matched through the central rays in their convex polyhedral cones, and that templates protected by a method extended from IrisCode can be broken into. These results indicate that, without a thorough security analysis, convex polyhedral cone templates cannot be assumed secure. Additionally, the simplicity of the algorithm implies that even junior hackers without knowledge of advanced image processing and biometric databases can still break into protected templates and reveal relationships among templates produced by different recognition methods.

ETPL DIP-106: Correspondence Map-Aided Neighbor Embedding for Image Intra Prediction
Abstract: This paper describes new image prediction methods based on neighbor embedding (NE) techniques. Neighbor embedding methods are used here to approximate an input block (the block to be predicted) in the image as a linear combination of its K nearest neighbors. However, in order for the decoder to proceed similarly, the K nearest neighbors are found by computing distances between the known pixels in a causal neighborhood (called the template) of the input block and the co-located pixels in candidate patches taken from a causal window. Similarly, the weights used for the linear approximation are computed to best approximate the template pixels. Although efficient, these methods suffer from limitations when the template and the block to be predicted are not correlated, e.g., in non-homogeneous texture areas. To cope with these limitations, this paper introduces new image prediction methods based on NE techniques in which the K-NN search is done in two steps and aided, at the decoder, by a block correspondence map, hence the name map-aided neighbor embedding (MANE) method. An optimized variant of this approach, called the oMANE method, is also studied. In these methods, several alternatives are also proposed for the K-NN search. The resulting prediction methods are shown to bring significant rate-distortion performance improvements over H.264 Intra prediction modes (up to 44.75% rate savings at low bit rates).
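
The template-based neighbor embedding step in the DIP-106 entry above can be sketched compactly: match templates, fit least-squares weights on the template pixels, and reuse those weights on the co-located blocks. The data layout below (one row per candidate) is my assumption, and the map-aided two-step search is omitted.

import numpy as np

def ne_predict(template_vec, cand_templates, cand_blocks, K=8):
    # template_vec: (t,) known causal pixels of the block to predict
    # cand_templates: (n, t) templates of candidate patches in the causal window
    # cand_blocks: (n, b) pixels of the corresponding candidate blocks
    d = np.sum((cand_templates - template_vec) ** 2, axis=1)
    knn = np.argsort(d)[:K]                      # K nearest templates
    A = cand_templates[knn].T                    # (t, K)
    w, *_ = np.linalg.lstsq(A, template_vec, rcond=None)
    return cand_blocks[knn].T @ w                # (b,) predicted block pixels

Because the weights are fit on causal pixels only, a decoder can repeat the same computation without side information, which is what makes this usable for intra prediction.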
ETPL DIP-107: Estimation-Theoretic Approach to Delayed Decoding of Predictively Encoded Video Sequences
Abstract: Current video coders employ predictive coding with motion compensation to exploit temporal redundancies in the signal. In particular, blocks along a motion trajectory are modeled as an autoregressive (AR) process, and it is generally assumed that the prediction errors are temporally independent and approximate the innovations of this process; thus, zero-delay encoding and decoding is considered efficient. This paper is premised on the largely ignored fact that these prediction errors are, in fact, temporally dependent due to quantization effects in the prediction loop. It presents an estimation-theoretic delayed decoding scheme, which exploits information from future frames to improve the reconstruction quality of the current frame. In contrast to the standard decoder, which reproduces every block instantaneously once the corresponding quantization indices of the residues are available, the proposed delayed decoder efficiently combines all accessible (including any future) information in an appropriately derived probability density function to obtain the optimal delayed reconstruction per transform coefficient. Experiments demonstrate significant gains over the standard decoder. Requisite information about the source AR model is estimated in a spatio-temporally adaptive manner from a bit-stream conforming to the H.264/AVC standard, i.e., no side information needs to be sent to the decoder in order to employ the proposed approach, thereby retaining compatibility with the standard syntax and existing encoders.

ETPL DIP-108: Correction of Axial and Lateral Chromatic Aberration With False Color Filtering
Abstract: In this paper, we propose a chromatic aberration (CA) correction algorithm based on a false color filtering technique. In general, CA produces color distortions called color fringes near contrasting edges of captured images, and these distortions cause false color artifacts. In the proposed method, a false color filtering technique is used to filter out the false color components from the chroma signals of the input image. The filtering process is performed with adaptive weights obtained from both gradient and color differences, and the weights are designed to reduce the various types of color fringes regardless of the colors of the artifacts. Moreover, as a preprocessor for the filtering process, a transient improvement (TI) technique is applied to enhance the slow transitions of the red and blue channels that are blurred by the CA. The TI process improves the filtering performance by narrowing the false color regions before the filtering process when severe color fringes (typically purple fringes) occur over wide areas. Finally, the CA-corrected chroma signal is combined with the TI chroma signal to avoid incorrect color adjustment. The experimental results show that the proposed method substantially reduces CA artifacts and provides natural-looking replacement colors while avoiding incorrect color adjustment.

ETPL DIP-109: New Class Tiling Design for Dot-Diffused Halftoning
Abstract: In this paper, a new class tiling design for dot diffusion, along with an optimized class matrix and diffusion matrix, is proposed. This method produces a halftone nearly free of periodic artifacts compared with former schemes. Formerly, the class matrix of dot diffusion was duplicated and orthogonally tiled to cover the entire image for subsequent thresholding and quantized-error diffusion, which introduced periodic artifacts. In our observation, this artifact can be removed by manipulating the class tiling through rotation, transposition, and alternate shifting of the class matrices. As documented in the experimental results, the proposed dot diffusion has been compared with former halftoning methods offering parallelism in terms of image quality, processing efficiency, periodicity, and memory consumption; the comparison shows the proposed dot diffusion to be a very competitive candidate for the printing/display market.
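
As background for the DIP-109 entry above, a baseline dot-diffusion halftoner with plain orthogonal class tiling (i.e., exactly the scheme the paper improves on) can be sketched as follows. The 2/1 diffusion weights and the 128 threshold are common defaults, not the paper's optimized matrices.

import numpy as np

def dot_diffusion(img, class_matrix):
    # img: 8-bit grayscale; class_matrix: small matrix (e.g., 8x8) whose
    # entries 0..size-1 give the pixel processing order within each tile.
    u = img.astype(float).copy()
    H, W = u.shape
    ch, cw = class_matrix.shape
    order = np.tile(class_matrix, (H // ch + 1, W // cw + 1))[:H, :W]
    out = np.zeros((H, W))
    # 8-neighbour offsets with weights: 2 for orthogonal, 1 for diagonal
    offs = [(-1, -1, 1), (-1, 0, 2), (-1, 1, 1), (0, -1, 2),
            (0, 1, 2), (1, -1, 1), (1, 0, 2), (1, 1, 1)]
    for o in range(class_matrix.size):
        for y, x in zip(*np.nonzero(order == o)):
            out[y, x] = 255.0 if u[y, x] >= 128 else 0.0
            err = u[y, x] - out[y, x]
            # diffuse the error only to neighbours not yet processed
            nbrs = [(y + dy, x + dx, w) for dy, dx, w in offs
                    if 0 <= y + dy < H and 0 <= x + dx < W
                    and order[y + dy, x + dx] > o]
            total = sum(w for _, _, w in nbrs)
            if total:
                for ny, nx, w in nbrs:
                    u[ny, nx] += err * w / total
    return out

Because the same class matrix repeats at a fixed period, this baseline exhibits the periodic texture that the paper's rotated, transposed, and shifted tilings are designed to break up.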
ETPL DIP-110: W-Tree Indexing for Fast Visual Word Generation
Abstract: The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a KD-tree or a K-means tree), in order to redirect the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the fast library for approximate nearest neighbors (FLANN) and randomized KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.

ETPL DIP-111: Efficient Improvements on the BDND Filtering Algorithm for the Removal of High-Density Impulse Noise
Abstract: Switching median filters are known to outperform standard median filters in the removal of impulse noise due to their capability of filtering candidate noisy pixels while leaving other pixels intact. Boundary discriminative noise detection (BDND) is one powerful example in this class of filters. However, there are some issues related to the filtering step of the BDND algorithm that may degrade its performance. In this paper, we propose two modifications to the filtering step of the BDND algorithm to address these issues. Experimental evaluation shows the effectiveness of the proposed modifications in producing sharper images than the BDND algorithm.

ETPL DIP-112: Novel Approaches to the Parametric Cubic-Spline Interpolation
Abstract: The cubic-spline interpolation (CSI) scheme can be utilized to obtain a better quality reconstructed image. It is based on the least-squares method with a cubic convolution interpolation (CCI) function. Within the parametric CSI scheme, it is difficult to determine the optimal parameter for various target images. In this paper, a novel method involving the concept of opportunity costs is proposed to identify the most suitable parameter for the CCI function needed in the CSI scheme. It is shown that such an optimal four-point CCI function, in conjunction with the least-squares method, can achieve better performance with the same arithmetic operations compared with the existing CSI algorithm. In addition, experimental results show that the optimal six-point CSI scheme together with a cross-zonal filter is superior in performance to the optimal four-point CSI scheme without increasing the computational complexity.
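
The parametric CCI kernel at the core of the DIP-112 entry above is the standard Keys cubic convolution kernel with free parameter a. A sketch of the kernel and 1-D interpolation follows; the clamped border handling is my choice.

import numpy as np

def cci_kernel(s, a=-0.5):
    # Keys cubic convolution kernel; a = -0.5 is the common default,
    # and the entry above searches for a better value of a.
    s = np.abs(s)
    return np.where(s <= 1, (a + 2) * s ** 3 - (a + 3) * s ** 2 + 1,
           np.where(s < 2, a * (s ** 3 - 5 * s ** 2 + 8 * s - 4), 0.0))

def cci_interp_1d(samples, x, a=-0.5):
    # Interpolate a 1-D signal at real-valued positions x using four taps.
    x = np.atleast_1d(x).astype(float)
    idx = np.floor(x).astype(int)
    out = np.zeros_like(x)
    for k in range(-1, 3):                       # the four nearest samples
        j = np.clip(idx + k, 0, len(samples) - 1)
        out += samples[j] * cci_kernel(x - (idx + k), a)
    return out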
ETPL DIP-113: Generation of All-in-Focus Images by Noise-Robust Selective Fusion of Limited Depth-of-Field Images
Abstract: The limited depth-of-field of some cameras prevents them from capturing perfectly focused images when the imaged scene covers a large distance range. In order to compensate for this problem, image fusion has been exploited to combine images captured with different camera settings, yielding a higher quality all-in-focus image. Since most current approaches to image fusion rely on maximizing the spatial frequency of the composed image, the fusion process is sensitive to noise. In this paper, a new algorithm for computing the all-in-focus image from a sequence of images captured with a low depth-of-field camera is presented. The proposed approach adaptively fuses the different frames of the focus sequence in order to reduce noise while preserving image features. The algorithm consists of three stages: 1) focus measure; 2) selectivity measure; and 3) image fusion. An extensive set of experimental tests has been carried out to compare the proposed algorithm with state-of-the-art all-in-focus methods using both synthetic and real sequences. The obtained results show the advantages of the proposed scheme even for high levels of noise.

ETPL DIP-114: Missing Texture Reconstruction Method Based on Error Reduction Algorithm Using Fourier Transform Magnitude Estimation Scheme
Abstract: A missing texture reconstruction method based on an error reduction (ER) algorithm, including a novel estimation scheme for Fourier transform magnitudes, is presented in this brief. In our method, the Fourier transform magnitude is estimated for a target patch including missing areas, and the missing intensities are estimated by retrieving its phase based on the ER algorithm. Specifically, by monitoring the errors converged in the ER algorithm, known patches whose Fourier transform magnitudes are similar to that of the target patch are selected from the target image. Then, the Fourier transform magnitude of the target patch is estimated from those of the selected known patches and their corresponding errors. Consequently, by using the ER algorithm, we can estimate both the Fourier transform magnitudes and phases to reconstruct the missing areas.

ETPL DIP-115: A Robust Fuzzy Local Information C-Means Clustering Algorithm
Abstract: In a recent paper, Krinidis and Chatzis proposed a variation of the fuzzy c-means algorithm for image clustering, in which local spatial and gray-level information are incorporated in a fuzzy way through an energy function, and local minimizers of the designed energy function are derived to obtain the fuzzy membership of each pixel and the cluster centers. In this paper, it is shown that the minimizers of Krinidis and Chatzis, used to obtain the fuzzy memberships and the cluster centers in an iterative manner, are not necessarily true local minimizers of their designed energy function. Thus, the iterations of Krinidis and Chatzis do not converge to the correct local minima of the designed energy function, not because they are trapped in local minima, but because of the design of the energy function itself.

ETPL DIP-116: Nonlinearity Detection in Hyperspectral Images Using a Polynomial Post-Nonlinear Mixing Model
Abstract: This paper studies a nonlinear mixing model for hyperspectral image unmixing and nonlinearity detection. The proposed model assumes that the pixel reflectances are nonlinear functions of pure spectral components contaminated by additive white Gaussian noise. These nonlinear functions are approximated by polynomials, leading to a polynomial post-nonlinear mixing model. We have shown in a previous paper that the parameters involved in the resulting model can be estimated using least-squares methods. A generalized likelihood ratio test based on the estimator of the nonlinearity parameter is proposed to decide whether a pixel of the image results from the commonly used linear mixing model or from a more general nonlinear mixing model. To compute the test statistic associated with the nonlinearity detection, we propose to approximate the variance of the estimated nonlinearity parameter by its constrained Cramér-Rao bound. The performance of the detection strategy is evaluated via simulations conducted on synthetic and real data. More precisely, synthetic data have been generated according to the standard linear mixing model and three nonlinear models from the literature. The real data investigated in this study are extracted from the Cuprite image, in which some minerals appear to be nonlinearly mixed. Finally, it is interesting to note that the estimated abundance maps obtained with the post-nonlinear mixing model are in good agreement with results obtained in previous studies.
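
The error reduction iteration in the DIP-114 entry above alternates between a Fourier-magnitude constraint and a known-pixel constraint, much like classical phase retrieval. A minimal sketch, assuming the target magnitude has already been estimated from similar known patches:

import numpy as np

def error_reduction_inpaint(patch, known_mask, target_mag, niter=200):
    # patch: (h, w) patch with missing pixels; known_mask: boolean array
    # marking known pixels; target_mag: (h, w) estimated |FFT| of the patch.
    x = patch * known_mask
    for _ in range(niter):
        X = np.fft.fft2(x)
        X = target_mag * np.exp(1j * np.angle(X))   # enforce magnitude, keep phase
        x = np.real(np.fft.ifft2(X))
        x[known_mask] = patch[known_mask]           # re-impose known pixels
    return x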
ETPL DIP-117: Wavelet Bayesian Network Image Denoising
Abstract: From the perspective of the Bayesian approach, the denoising problem is essentially a prior probability modeling and estimation task. In this paper, we propose an approach that exploits a hidden Bayesian network, constructed from wavelet coefficients, to model the prior probability of the original image. Then, we use the belief propagation (BP) algorithm, which estimates a coefficient based on all the coefficients of an image, as the maximum a posteriori (MAP) estimator to derive the denoised wavelet coefficients. We show that if the network is a spanning tree, the standard BP algorithm can perform MAP estimation efficiently. Our experimental results demonstrate that, in terms of peak signal-to-noise ratio and perceptual quality, the proposed approach outperforms state-of-the-art algorithms on several images, particularly in the textured regions, with various amounts of white Gaussian noise.

ETPL DIP-118: Acceleration of the Shiftable O(1) Algorithm for Bilateral Filtering and Nonlocal Means
Abstract: A direct implementation of the bilateral filter requires O(s^2) operations per pixel, where s is the (effective) width of the spatial kernel. A fast implementation of the bilateral filter requiring O(1) operations per pixel with respect to s was recently proposed. This was done by using trigonometric functions for the range kernel of the bilateral filter and by exploiting their so-called shiftability property. In particular, a fast implementation of the Gaussian bilateral filter was realized by approximating the Gaussian range kernel using raised cosines. Later, it was demonstrated that this idea could be extended to a larger class of filters, including the popular nonlocal means filter. As already observed, a flip side of this approach was that the run time depended on the width r of the range kernel: for an image with dynamic range [0, T], the run time scaled as O(T^2/r^2) with r. This made it difficult to implement narrow range kernels, particularly for images with a large dynamic range. In this paper, we discuss this problem and propose some simple steps to accelerate the implementation in general, and for small r in particular. We provide some experimental results to demonstrate the acceleration that is achieved using these modifications.
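
The raised-cosine range kernel in the DIP-118 entry above makes the bilateral filter shiftable: the range kernel expands into complex exponentials, so only Gaussian convolutions of modulated images are needed. Below is a sketch of the baseline O(1) filter (without the paper's accelerations); the rule for choosing N, which enforces gamma*T <= pi/2, follows my reading of the original construction.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import comb

def shiftable_bilateral(f, sigma_s, sigma_r, T=255.0):
    # Raised-cosine approximation of a Gaussian range kernel:
    #   exp(-t^2 / (2 sigma_r^2)) ~ cos(gamma t)^N, gamma = 1/(sigma_r sqrt(N)),
    # valid while gamma*T <= pi/2, which fixes the smallest usable N.
    # Note: small sigma_r makes N large; practical code works in the log domain.
    f = f.astype(float)
    N = max(int(np.ceil((2.0 * T / (np.pi * sigma_r)) ** 2)), 1)
    gamma = 1.0 / (sigma_r * np.sqrt(N))

    def gauss(z):  # spatial Gaussian smoothing of a complex image
        return gaussian_filter(z.real, sigma_s) + 1j * gaussian_filter(z.imag, sigma_s)

    num = np.zeros_like(f, dtype=complex)
    den = np.zeros_like(f, dtype=complex)
    for n in range(N + 1):
        a = comb(N, n) / 2.0 ** N                 # binomial expansion coefficient
        d = np.exp(1j * gamma * (2 * n - N) * f)  # modulated image
        num += a * d * gauss(np.conj(d) * f)
        den += a * d * gauss(np.conj(d))
    return num.real / np.maximum(den.real, 1e-12)

The key point is that the per-pixel cost no longer depends on the spatial kernel width: each term is a pair of Gaussian convolutions, and the loop length N is what grows as the range kernel narrows, which is exactly the problem the paper addresses.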

ETPL DIP-119: Determining the Intrinsic Dimension of a Hyperspectral Image Using Random Matrix Theory
Abstract: Determining the intrinsic dimension of a hyperspectral image is an important step in the spectral unmixing process, and under- or overestimation of this number may lead to incorrect unmixing for unsupervised methods. In this paper, we discuss a new method for determining the intrinsic dimension using recent advances in random matrix theory. This method is entirely unsupervised, free from any user-determined parameters, and allows spectrally correlated noise in the data. Robustness tests are run on synthetic data to determine how the results are affected by noise level, noise variability, noise approximation, and the spectral characteristics of the endmembers. Success rates are determined for many different synthetic images, and the method is tested on two pairs of real images, namely a Cuprite scene taken from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and SpecTIR sensors, and a Lunar Lakes scene taken from AVIRIS and Hyperion, with good results.
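
A toy version of the random-matrix idea in the DIP-119 entry above counts covariance eigenvalues above the Marchenko-Pastur bulk edge that pure noise of the same size would produce. The white-noise assumption and the crude median-based noise estimate below are simplifications; the paper explicitly handles spectrally correlated noise.

import numpy as np

def intrinsic_dim_mp(X):
    # X: (n, p) matrix with n pixels (rows) and p spectral bands (columns).
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc.T @ Xc / n)     # sample covariance spectrum
    sigma2 = np.median(evals)                      # crude noise-variance estimate
    edge = sigma2 * (1 + np.sqrt(p / n)) ** 2      # Marchenko-Pastur bulk edge
    return int(np.sum(evals > edge))               # signal eigenvalues above the bulk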
ETPL DIP-120: Learning Smooth Pattern Transformation Manifolds
Abstract: Manifold models provide low-dimensional representations that are useful for processing and analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically transformed signals. To construct a manifold, we build a representative pattern whose transformations accurately fit various input images. We examine two objectives of the manifold-building problem, namely approximation and classification. For the approximation problem, we propose a greedy method that constructs a representative pattern by selecting analytic atoms from a continuous dictionary manifold. We present a DC (difference-of-convex) optimization scheme that is applicable to a wide range of transformation and dictionary models, and demonstrate its application to the transformation manifolds generated by the rotation, translation, and anisotropic scaling of a reference pattern. We then generalize this approach to a setting with multiple transformation manifolds, where each manifold represents a different class of signals. We present an iterative multiple-manifold-building algorithm such that classification accuracy is promoted in the learning of the representative patterns. The experimental results suggest that the proposed methods yield high accuracy in the approximation and classification of data compared with some reference methods, while invariance to geometric transformations is achieved through the transformation manifold model.

ETPL DIP-121: Modifying JPEG Binary Arithmetic Codec for Exploiting Inter/Intra-Block and DCT Coefficient Sign Redundancies
Abstract: This article presents four modifications to the JPEG arithmetic coding (JAC) algorithm, a topic that has received little prior study. It then compares the compression performance of the modified JPEG with JPEG XR, the latest block-based image coding standard. We first show that the bulk of the inter/intra-block redundancy caused by JPEG's block-based approach can be captured by applying efficient prediction coding. We propose the following modifications to JAC to take advantage of our prediction approach. 1) We code a different DC difference. 2) JAC tests a DCT coefficient by considering its bits in increasing order of significance for coding the most significant bit position; this causes considerable redundancy because JAC always begins with the zeroth bit. We modify this coding order and propose alterations to the JPEG coding procedures. 3) We predict the sign of significant DCT coefficients, a problem not previously addressed from the perspective of the JPEG decoder. 4) We reduce the number of binary tests that JAC codes to mark the end-of-block. We provide experimental results for two sets of eight-bit gray images. The first set consists of nine classical test images, mostly of size 512 × 512 pixels. The second set consists of 13 images of size 2000 × 3000 pixels or more. Our modifications to JAC obtain a remarkable amount of code reduction without introducing any loss. More specifically, when we quantize the images using the default quantizers, our modifications reduce the total JAC code size of the images of these two sets by about 8.9% and 10.6%, and the JPEG Huffman code size by about 16.3% and 23.4%, respectively, on average. Gains are even higher for coarsely quantized images. Finally, we compare the modified JAC with two settings of JPEG XR, one with no block overlapping and the other with the default transform (denoted JXR0 and JXR1, respectively). Our results show that for finest-quality-rate image coding, the modified JAC compresses the large-set images by about 5.8% more than JXR1 and by 6.7% more than JXR0, on average. We provide some rate-distortion plots for lossy coding, which show that the modified JAC distinctly outperforms JXR0, but JXR1 outperforms the modified JAC by a similar margin.

ETPL DIP-122: Motion Estimation Without Integer-Pel Search
Abstract: Typical motion estimation (ME) consists of three main steps: spatial-temporal prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best-matched integer-pel position within a search window, is considered crucial for video encoding. It occupies over 50% of the overall encoding time (when adopting the full search scheme) for software encoders, and introduces considerable area cost, memory traffic, and power consumption for hardware encoders. In this paper, we find that video sequences (especially high-resolution videos) can often be encoded effectively and efficiently even without integer-pel search. This counter-intuitive phenomenon arises not only because spatial-temporal prediction and fractional-pel search are accurate enough for the ME of many blocks. In fact, we observe that even when the predicted motion vector is biased from the optimal motion vector (mainly for boundary blocks of irregularly moving objects), it is hard for integer-pel search to reduce the final rate-distortion cost: the deviation of the reference position can be alleviated with fractional-pel interpolation and rate-distortion optimization techniques (e.g., adaptive macroblock mode). Considering the decreasing proportion of boundary blocks caused by the increasing resolution of videos, integer-pel search may be rather cost-ineffective in the era of high resolution. Experimental results on 36 typical sequences of different resolutions, encoded with x264, a widely used video encoder, comply with our analysis well.
For 1080p sequences, removing the integer-pel search saves 57.9% of the overall H.264 encoding time on average (compared with the original x264 performing full integer-pel search with default parameters), while the resulting performance loss is negligible: the bit-rate increases by only 0.18%, and the peak signal-to-noise ratio decreases by only 0.01 dB per frame on average.

ETPL DIP-123: A Protocol for Evaluating Video Trackers Under Real-World Conditions
Abstract: The absence of a commonly adopted performance evaluation framework is hampering advances in the design of effective video trackers. In this paper, we present a single-score evaluation measure and a protocol for objectively comparing trackers. The proposed measure evaluates tracking accuracy and failure, and combines them for both summative and formative performance assessment. The proposed protocol is composed of a set of trials that evaluate the robustness of trackers on a range of test scenarios representing several real-world conditions. The protocol is validated on a set of sequences with a diversity of targets (head, vehicle, and person) and challenges (occlusions, background clutter, pose changes, and scale changes) using six state-of-the-art trackers, highlighting their strengths and weaknesses on more than 187,000 frames. The software implementing the protocol and the evaluation results are made available online, and new results can be included, thus facilitating the comparison of trackers.

ETPL DIP-124: Blur and Illumination Robust Face Recognition via Set-Theoretic Characterization
Abstract: We address the problem of unconstrained face recognition from remotely acquired images. The main factors that make this problem challenging are image degradation due to blur and appearance variations due to illumination and pose. In this paper, we address the problems of blur and illumination. We show that the set of all images obtained by blurring a given image forms a convex set. Based on this set-theoretic characterization, we propose a blur-robust algorithm whose main step involves solving simple convex optimization problems. We do not assume any parametric form for the blur kernels; however, if this information is available, it can be easily incorporated into our algorithm. Furthermore, using a low-dimensional model for illumination variations, we show that the set of all images obtained from a face image by blurring it and changing the illumination conditions forms a bi-convex set. Based on this characterization, we propose a blur- and illumination-robust algorithm. Our experiments on a challenging real dataset obtained under uncontrolled settings illustrate the importance of jointly modeling blur and illumination.

ETPL DIP-125: Improved Bounds for Subband-Adaptive Iterative Shrinkage/Thresholding Algorithms
Abstract: This paper presents new methods for computing the step sizes of the subband-adaptive iterative shrinkage/thresholding algorithms proposed by Bayram & Selesnick and Vonesch & Unser. The method yields tighter wavelet-domain bounds on the system matrix, thus leading to improved convergence speeds. It is directly applicable to nonredundant wavelet bases, and we also adapt it to the case of redundant frames. It turns out that the simplest and most intuitive setting for the step sizes, which ignores subband aliasing, is often satisfactory in practice. We show that our methods can be used to advantage with reweighted least-squares penalty functions as well as L1 penalties. We emphasize that the algorithms presented here are suitable for performing inverse filtering on very large datasets, including 3D data, since inversions are applied only to diagonal matrices and fast transforms are used to achieve all matrix-vector products.

ETPL DIP-126: Sparse Representation Based Image Interpolation With Nonlocal Autoregressive Modeling
Abstract: Sparse representation has proven to be a promising approach to image super-resolution, where the
low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR) counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models (SRM) become less effective, because the data fidelity term fails to constrain the image local structures. In natural images, fortunately, many nonlocal similar patches to a given patch could provide nonlocal constraint to the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM. ETPL View-Based Discriminative Probabilistic Modeling for 3D Object Retrieval and DIP-127 Recognition Abstract: In view-based 3D object retrieval and recognition, each object is described by multiple views. A central problem is how to estimate the distance between two objects. Most conventional methods integrate the distances of view pairs across two objects as an estimation of their distance. In this paper, we propose a discriminative probabilistic object modeling approach. It builds probabilistic models for each object based on the distribution of its views, and the distance between two objects is defined as the upper bound of the Kullback-Leibler divergence of the corresponding probabilistic models. 3D object retrieval and recognition is accomplished based on the distance measures. We first learn models for each object by the adaptation from a set of global models with a maximum likelihood principle. A further adaption step is then performed to enhance the discriminative ability of the models. We conduct experiments on the ETH 3D object dataset, the National Taiwan University 3D model dataset, and the Princeton Shape Benchmark. We compare our approach with different methods, and experimental results demonstrate the superiority of our approach. ETPL Robust Document Image Binarization Technique for Degraded Document Images DIP-128 Abstract: Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. 
ETPL DIP-128 Robust Document Image Binarization Technique for Degraded Document Images

Abstract: Segmentation of text from badly degraded document images is a very challenging task due to the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to text and background variation caused by different types of document degradation. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimal parameter tuning. It has been tested on the three public datasets used in the recent Document Image Binarization Contest (DIBCO) 2009 and 2011 and Handwritten-DIBCO 2010, and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, which are significantly higher than or close to those of the best-performing methods reported in the three contests. Experiments on the Bickley diary dataset, which consists of several challenging bad-quality document images, also show the superior performance of our proposed method compared with other techniques.
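A minimal sketch of the adaptive-contrast idea from this abstract, assuming a grayscale input and SciPy/scikit-image. The 3x3 window, the mixing weight alpha, and the global Otsu threshold on the contrast map are illustrative choices, not the authors' exact algorithm (which estimates a local threshold from the detected stroke edge pixels):

    import numpy as np
    from scipy import ndimage
    from skimage.filters import threshold_otsu
    from skimage.feature import canny

    def stroke_edge_pixels(img, alpha=0.5, eps=1e-6):
        img = img.astype(float)
        mx = ndimage.maximum_filter(img, size=3)
        mn = ndimage.minimum_filter(img, size=3)
        contrast = (mx - mn) / (mx + mn + eps)        # local image contrast
        grad = np.hypot(ndimage.sobel(img, 0), ndimage.sobel(img, 1))
        grad = grad / (grad.max() + eps)              # local image gradient
        cmap = alpha * contrast + (1 - alpha) * grad  # adaptive contrast map
        binarized = cmap > threshold_otsu(cmap)       # binarize the contrast map
        edges = canny(img / (img.max() + eps))        # Canny's edge map
        return binarized & edges                      # candidate stroke edge pixels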
ETPL DIP-129 Perceptual Video Coding Based on SSIM-Inspired Divisive Normalization

Abstract: We propose a perceptual video coding framework based on the divisive normalization scheme, which is found to be an effective approach to model the perceptual sensitivity of biological vision but has not been fully exploited in the context of video coding. At the macroblock (MB) level, we derive the normalization factors based on the structural similarity (SSIM) index as an attempt to transform the discrete cosine transform domain frame residuals to a perceptually uniform space. We further develop an MB-level perceptual mode selection scheme and a frame-level global quantization matrix optimization method. Extensive simulations and subjective tests verify that, compared with the H.264/AVC video coding standard, the proposed method can achieve significant gains in rate-SSIM performance and provide better visual quality.

ETPL DIP-130 Hyperspectral Image Representation and Processing With Binary Partition Trees

Abstract: The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image-processing tools. This paper proposes the construction and processing of a new region-based hierarchical hyperspectral image representation relying on the binary partition tree (BPT). This hierarchical region-based representation can be interpreted as a set of hierarchical regions stored in a tree structure. Hence, the BPT succeeds in presenting: 1) the decomposition of the image in terms of coherent regions, and 2) the inclusion relations of the regions in the scene. Based on region-merging techniques, the BPT construction is investigated by studying the hyperspectral region models and the associated similarity metrics. Once the BPT is constructed, the fixed tree structure allows efficient and advanced application-dependent techniques to be implemented on it. The application-dependent processing of the BPT is generally implemented through a specific pruning of the tree. In this paper, a pruning strategy is proposed and discussed in a classification context. Experimental results on various hyperspectral datasets demonstrate the interest and good performance of the BPT representation.

ETPL DIP-131 Visually Weighted Compressive Sensing: Measurement and Reconstruction

Abstract: Compressive sensing (CS) makes it possible to more naturally create compact representations of data with respect to a desired data rate. Through wavelet decomposition, smooth and piecewise smooth signals can be represented as sparse and compressible coefficients. These coefficients can then be effectively compressed via CS. Since a wavelet transform divides image information into layered blockwise wavelet coefficients over the spatial and frequency domains, visual improvement can be attained by an appropriate perceptually weighted CS scheme. We introduce such a method in this paper and compare it with conventional CS. The resulting visual CS model is shown to deliver improved visual reconstructions.
ETPL DIP-132 Context-Aware Sparse Decomposition for Image Denoising and Super-Resolution

Abstract: Image prior models based on sparse and redundant representations are attracting more and more attention in the field of image restoration. Conventional sparsity-based methods enforce the sparsity prior on small image patches independently. Unfortunately, these works neglect the contextual information between sparse representations of neighboring image patches. This limits the modeling capability of the sparsity-based image prior, especially when the major structural information of the source image is lost through severe degradation. In this paper, we utilize the contextual information of local patches (denoted as a context-aware sparsity prior) to enhance the performance of sparsity-based restoration methods. In addition, a unified framework based on the Markov random field model is proposed to turn the local prior into a global one in order to deal with images of arbitrary size. An iterative numerical solution is presented to solve the joint problem of model parameter estimation and sparse recovery. Finally, experimental results on image denoising and super-resolution demonstrate the effectiveness and robustness of the proposed context-aware method.
ETPL DIP-133 How to SAIF-ly Boost Denoising Performance

Abstract: Spatial domain image filters (e.g., the bilateral filter, non-local means, and the locally adaptive regression kernel) have achieved great success in denoising. Their overall performance, however, has not generally surpassed the leading transform domain-based filters (such as BM3D). One important reason is that spatial domain filters lack an efficient way to adaptively fine-tune their denoising strength, something that is relatively easy to do in transform domain methods with shrinkage operators. In the pixel domain, the smoothing strength is usually controlled globally by, for example, tuning a regularization parameter. In this paper, we propose spatially adaptive iterative filtering (SAIF), a new strategy to control the denoising strength locally for any spatial domain method. This approach is capable of filtering local image content iteratively using the given base filter, and the type of iteration and the iteration number are automatically optimized with respect to the estimated risk (i.e., mean-squared error). In exploiting the estimated local signal-to-noise ratio, we also present a new risk estimator that differs from the often-employed SURE method and exceeds its performance in many cases. Experiments illustrate that our strategy can significantly relax the base algorithm's sensitivity to its tuning (smoothing) parameters, and can effectively boost the performance of several existing denoising filters to generate state-of-the-art results under both simulated and practical conditions.
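A toy sketch of the iteration-tuning idea: apply a base smoother repeatedly and keep the iterate with the lowest risk. The paper estimates this risk blindly with a SURE-type estimator; the oracle MSE against a clean image is used here purely for illustration, and the Gaussian base filter is an arbitrary stand-in:

    import numpy as np
    from scipy import ndimage

    def saif_oracle(noisy, clean, max_iters=10, sigma_filter=1.0):
        x, best = noisy, noisy
        best_risk = np.mean((noisy - clean) ** 2)
        for _ in range(max_iters):
            x = ndimage.gaussian_filter(x, sigma_filter)  # one diffusion step
            risk = np.mean((x - clean) ** 2)              # oracle MSE stand-in
            if risk < best_risk:
                best, best_risk = x, risk
        return best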
ETPL DIP-134 Frozen-State Hierarchical Annealing

Abstract: There is significant interest in the synthesis of discrete-state random fields, particularly those possessing structure over a wide range of scales. However, given a model on some finest, pixellated scale, it is computationally very difficult to synthesize both large- and small-scale structures, motivating research into hierarchical methods. In this paper, we propose a frozen-state approach to hierarchical modeling, in which simulated annealing is performed on each scale, constrained by the state estimates at the parent scale. This approach leads to significant advantages in both modeling flexibility and computational complexity. In particular, a complex structure can be realized with very simple, local, scale-dependent models, and by constraining the domain to be annealed at finer scales to only the uncertain portions of coarser scales, the approach leads to huge improvements in computational complexity. Results are shown for a synthesis problem in porous media.

ETPL DIP-135 Per-Colorant-Channel Color Barcodes for Mobile Applications: An Interference Cancellation Framework

Abstract: We propose a color barcode framework for mobile phone applications by exploiting the spectral diversity afforded by the cyan (C), magenta (M), and yellow (Y) print colorant channels commonly used for color printing and the complementary red (R), green (G), and blue (B) channels, respectively, used for capturing color images. Specifically, we exploit this spectral diversity to realize a three-fold increase in the data rate by encoding independent data in the C, M, and Y print colorant channels and decoding the data from the complementary R, G, and B channels captured via a mobile phone camera. To mitigate the effect of cross-channel interference among the print-colorant and capture color channels, we develop an algorithm for interference cancellation based on a physically motivated mathematical model for the print and capture processes. To estimate the model parameters required for cross-channel interference cancellation, we propose two alternative methodologies: a pilot block approach that uses suitable selections of colors for the synchronization blocks, and an expectation maximization approach that estimates the parameters from regions encoding the data itself. We evaluate the performance of the proposed framework using specific implementations of the framework for two of the most commonly used barcodes in mobile applications, QR and Aztec codes. Experimental results show that the proposed framework successfully overcomes the impact of the color interference, providing a low bit error rate and a high decoding rate for each of the colorant channels when used with a corresponding error correction scheme.
ETPL DIP-136 Segmented Gray-Code Kernels for Fast Pattern Matching

Abstract: The gray-code kernels (GCK) family, which has the Walsh Hadamard transform on sliding windows as a member, is a family of kernels that can perform image analysis efficiently using a fast algorithm such as the GCK algorithm. The GCK has been successfully used for pattern matching. In this paper, we propose the G4-GCK algorithm, which is more efficient than the previous algorithm in computing GCK. The G4-GCK algorithm requires four additions per pixel for three basis vectors, independent of transform size and dimension. Based on the G4-GCK algorithm, we then propose the segmented GCK (SegGCK). By segmenting the input data into Ls parts, the SegGCK requires only four additions per pixel for 3Ls basis vectors. Experimental results show that the proposed algorithm can significantly accelerate the full-search equivalent pattern matching process and outperforms state-of-the-art methods.

ETPL DIP-137 Video Processing for Human Perceptual Visual Quality-Oriented Video Coding

Abstract: We have developed a video processing method that achieves human perceptual visual quality-oriented video coding. The patterns of moving objects are modeled by considering the limited human capacity for spatial-temporal resolution together with the visual sensory memory, and an online moving pattern classifier is devised using the Hedge algorithm. The moving pattern classifier is embedded in the existing visual saliency with the purpose of providing a human perceptual video quality saliency model. In order to apply the developed saliency model to video coding, the conventional foveation filtering method is extended. The proposed foveation filter can smooth and enhance the video signals locally, in conformance with the developed saliency model, without causing any artifacts. The performance evaluation results confirm that the proposed video processing method shows reliable improvements in perceptual quality for various sequences and at various bandwidths, compared to existing saliency-based video coding methods.
ETPL DIP-138 Additive Log-Logistic Model for Networked Video Quality Assessment

Abstract: Modeling subjective opinions on visual quality is a challenging problem, which closely relates to many factors of human perception. In this paper, the additive log-logistic model (ALM) is proposed to formulate such a multidimensional nonlinear problem. The log-logistic model has flexible monotonic or nonmonotonic partial derivatives and thus is suitable for modeling various uni-type impairments. The proposed ALM metric adds the distortions due to each type of impairment in a log-logistic transformed space of subjective opinions. The features can be evaluated and selected by classic statistical inference, and the model parameters can be easily estimated. Cross validations on five Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) subjectively rated databases confirm that: 1) based on the same features, the ALM outperforms the support vector regression and the logistic model in quality prediction, and 2) the resultant no-reference quality metric based on impairment-relevant video parameters achieves high correlation with a total of 27,216 subjective opinions on 1,134 video clips, even compared with existing full-reference quality metrics based on pixel differences. The ALM metric won the model competition of ITU-T Study Group 12 (where the validation databases are independent of the training databases) and is thus being put forth into ITU-T Recommendation P.1202.2 for the consent of ITU-T.
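For illustration only, one way to read "additive in a log-logistic transformed space": each impairment x_i > 0 contributes an additive term b_i*log(x_i/a_i) in a logistic-transformed opinion space, which for a single impairment reduces to the log-logistic curve q = q_min + (q_max - q_min)/(1 + (x/a)^b). The parameters and the exact functional form of the actual P.1202.2 model differ; this is a hypothetical sketch:

    import numpy as np

    def alm_predict(impairments, params, q_min=1.0, q_max=5.0):
        # impairments: positive impairment magnitudes x_i
        # params: per-impairment (a_i, b_i) scale/shape pairs (hypothetical)
        s = sum(b * np.log(x / a) for x, (a, b) in zip(impairments, params))
        return q_min + (q_max - q_min) / (1.0 + np.exp(s))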
ETPL DIP-139 Linear Feature Separation From Topographic Maps Using Energy Density and the Shear Transform

Abstract: Linear features are difficult to separate from the complicated backgrounds of color-scanned topographic maps, especially when the color of the linear features approximates that of the background in particular images. This paper presents a method based on energy density and the shear transform for separating lines from the background. First, the shear transform, which augments the directional characteristics of the lines, is introduced to overcome the loss of linear information that occurs when the separation method is applied to an image in only one direction. Then, templates in the horizontal and vertical directions are built to separate lines from the background, exploiting the fact that the energy concentration of the lines usually reaches a higher level than that of the background in the negative image. Furthermore, the remaining grid background can be wiped off by grid template matching. Isolated patches that contain only one pixel or fewer than ten pixels are removed according to the connected-region area measurement. Finally, using the union operation, the linear features obtained from the differently sheared images supplement each other, so the lines of the final result are more complete. The basic property of this method is that it uses energy density instead of the color information commonly used in traditional methods. The experimental results indicate that the proposed method distinguishes linear features from the background more effectively and obtains good results, owing to its ability to change the directions of the lines with the shear transform.

ETPL DIP-140 De-Interlacing Using Nonlocal Costs and Markov-Chain-Based Estimation of Interpolation Methods

Abstract: A new method of de-interlacing is proposed. De-interlacing is revisited as the problem of assigning a sequence of interpolation methods (interpolators) to a sequence of missing pixels of an interlaced frame (field). With this assumption, our de-interlacing algorithm (de-interlacer) undergoes transitions from one interpolation method to another as it moves from one missing pixel position to the horizontally adjacent missing pixel position in a missing row of a field. We assume a discrete countable-state Markov-chain model on the sequence of interpolators (Markov-chain states), which are selected from a user-defined set of candidate interpolators. Estimating the optimum sequence of interpolators under this Markov-chain model requires the definition of an efficient cost function as well as a global optimization technique. Our algorithm is the first to use a nonlocal cost (NLC) scheme for this purpose. The proposed algorithm uses the NLC not only to measure the fitness of an interpolator at a missing pixel position, but also to derive an approximation of the transition matrix (TM) of the Markov chain of interpolators. The TM in our algorithm is a frame-variate matrix, i.e., the algorithm updates the TM for each frame automatically. The algorithm then uses the Viterbi algorithm to find the globally optimum sequence of interpolators, given the defined cost function and the neighboring original pixels in hand. Next, we introduce a new MAP-based formulation for the estimation of the sequence of interpolators, this time not by estimating the best sequence of interpolators but by successive estimations of the best interpolator at each missing pixel using the Forward-Backward algorithm. Simulation results prove that, while competitive with each other on different test sequences, the proposed methods (one using the Viterbi and the other the Forward-Backward algorithm) are superior to recently proposed state-of-the-art de-interlacing algorithms. Finally, we propose motion-compensated versions of our algorithm based on optical flow computation methods and discuss how they can improve the proposed algorithm.
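The Viterbi step admits a compact implementation once the local NLC fitness and the TM are expressed as costs (e.g., negative log-probabilities). The arrays below are assumptions: costs[n, k] is the nonlocal cost of interpolator k at missing pixel n, and trans[i, j] the transition cost between horizontally adjacent missing pixels:

    import numpy as np

    def viterbi(costs, trans):
        N, K = costs.shape
        D = costs[0].copy()                # accumulated cost per state
        back = np.zeros((N, K), dtype=int)
        for n in range(1, N):
            tot = D[:, None] + trans       # tot[i, j]: best cost so far via i -> j
            back[n] = tot.argmin(axis=0)
            D = tot.min(axis=0) + costs[n]
        path = [int(D.argmin())]           # best final state, then backtrack
        for n in range(N - 1, 0, -1):
            path.append(int(back[n, path[-1]]))
        return path[::-1]                  # globally optimal interpolator sequence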
ETPL DIP-141 Pose-Invariant Face Recognition Using Markov Random Fields

Abstract: One of the key challenges for current face recognition techniques is how to handle pose variations between the probe and gallery face images. In this paper, we present a method for reconstructing the virtual frontal view from a given nonfrontal face image using Markov random fields (MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input face image is divided into a grid of overlapping patches, and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually selected facial landmarks nor head pose estimation. In order to improve the performance of our pose normalization method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to demonstrate the effectiveness of the proposed approach.
ETPL DIP-142 Objective-Guided Image Annotation

Abstract: Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so they are inevitably trapped into suboptimal performance with respect to these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and that the Hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems, based on which a variety of loss functions with respect to objective-guided measures are defined. We then formulate these loss functions as relaxed surrogate functions and optimize them with structural SVMs. Following the analysis of the various measures and the high time complexity of optimizing micro-averaging measures, in this paper we focus on example-based measures, which are tailor-made for image annotation tasks but seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four image annotation datasets.

ETPL DIP-143 Multiview Coding Mode Decision With Hybrid Optimal Stopping Model

Abstract: In a generic decision process, optimal stopping theory aims to achieve a good tradeoff between decision performance and the time consumed, with the advantages of theoretical decision-making and predictable decision performance. In this paper, optimal stopping theory is employed to develop an effective hybrid model for the mode decision problem, which aims to theoretically achieve a good tradeoff between the two interrelated measurements in mode decision: computational complexity reduction and rate-distortion degradation. The proposed hybrid model is implemented and examined with a multiview encoder. To support the model and further promote coding performance, the multiview coding mode characteristics, including predicted mode probability and estimated coding time, are jointly investigated with inter-view correlations. Exhaustive experimental results on a wide range of video resolutions reveal the efficiency and robustness of our method, with high decision accuracy, negligible computational overhead, and almost intact rate-distortion performance compared to the original encoder.
ETPL DIP-144 Joint Framework for Motion Validity and Estimation Using Block Overlap

Abstract: This paper presents a block-overlap-based validity metric for use as a measure of motion vector (MV) validity and for improving the quality of the motion field. In contrast to other validity metrics in the literature, the proposed metric is not sensitive to image features and does not require the use of neighboring MVs or manual thresholds. Using a hybrid de-interlacer, it is shown that the proposed metric outperforms other block-based validity metrics in the literature. To help regularize the ill-posed nature of motion estimation, the proposed validity metric is also used as a regularizer in an energy minimization framework to determine the optimal MV. Experimental results show that the proposed energy minimization framework outperforms several existing motion estimation methods in the literature in terms of MV and interpolation quality. For interpolation quality, our algorithm outperforms all other block-based methods as well as several complex optical flow methods. In addition, it is one of the fastest implementations at the time of this writing.
ETPL DIP-145 Nonlocally Centralized Sparse Representation for Image Restoration

Abstract: Sparse representation models code an image patch as a linear combination of a few atoms chosen from an over-complete dictionary, and they have shown promising results in various image restoration applications. However, due to the degradation of the observed image (e.g., noisy, blurred, and/or down-sampled), the sparse representations obtained by conventional models may not be accurate enough for a faithful reconstruction of the original image. To improve the performance of sparse representation-based image restoration, in this paper the concept of sparse coding noise is introduced, and the goal of image restoration turns into how to suppress the sparse coding noise. To this end, we exploit the image nonlocal self-similarity to obtain good estimates of the sparse coding coefficients of the original image, and then centralize the sparse coding coefficients of the observed image toward those estimates. The so-called nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse representation model, while our extensive experiments on various types of image restoration problems, including denoising, deblurring, and super-resolution, validate the generality and state-of-the-art performance of the proposed NCSR algorithm.

ETPL DIP-146 Image Segmentation Using a Sparse Coding Model of Cortical Area V1

Abstract: Algorithms that encode images using a sparse set of basis functions have previously been shown to explain aspects of the physiology of the primary visual cortex (V1), and have been used for applications such as image compression, restoration, and classification. Here, a sparse coding algorithm that has previously been used to account for the response properties of orientation-tuned cells in the primary visual cortex is applied to the task of perceptually salient boundary detection. The proposed algorithm is currently limited to using only intensity information at a single scale. However, it is shown to outperform the current state-of-the-art image segmentation method (Pb) when that method is also restricted to using the same information.
ETPL DIP-147 Circular Reranking for Visual Search

Abstract: Search reranking is regarded as a common way to boost retrieval precision. The problem is nevertheless not trivial, especially when there are multiple features or modalities to be considered for search, as often happens in image and video retrieval. This paper proposes a new reranking algorithm, named circular reranking, that reinforces the mutual exchange of information across multiple modalities for improving search performance, following the philosophy that a strongly performing modality can learn from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically, circular reranking conducts multiple runs of random walks by exchanging the ranking scores among different features in a cyclic manner. Unlike existing techniques, the reranking procedure encourages interaction among modalities to seek a consensus that is useful for reranking. In this paper, we study several properties of circular reranking, including how and in which order information propagation should be configured to fully exploit the potential of the modalities for reranking. Encouraging results are reported for both image and video retrieval on the Microsoft Research Asia Multimedia image dataset and the TREC Video Retrieval Evaluation 2007-2008 datasets, respectively.
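A rough sketch of the cyclic random-walk idea under simplifying assumptions: each modality is given as a row-stochastic affinity matrix, and the scores produced on one modality seed the personalized walk on the next. The restart weight, iteration counts, and normalization are illustrative, not the paper's exact scheme:

    import numpy as np

    def personalized_walk(W, seed, alpha=0.8, iters=50):
        s = seed.copy()
        for _ in range(iters):
            s = alpha * W.T @ s + (1 - alpha) * seed  # walk with restart at seed
        return s

    def circular_rerank(W_list, s_init, rounds=3):
        s = s_init / s_init.sum()
        for _ in range(rounds):
            for W in W_list:                 # visit the modalities in a cycle
                s = personalized_walk(W, s)
                s /= s.sum()
        return s                             # final reranking scores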
ETPL DIP-148 On Random Field Completely Automated Public Turing Test to Tell Computers and Humans Apart Generation

Abstract: Herein, we propose generating CAPTCHAs through random field simulation and give a novel, effective and efficient algorithm to do so. Indeed, we demonstrate that sufficient information about word tests for easy human recognition is contained in the site marginal probabilities and the site-to-nearby-site covariances, and that these quantities can be embedded directly into certain conditional probabilities designed for effective simulation. The CAPTCHAs are then partial random realizations of the random CAPTCHA word. We start with an initial random field (e.g., randomly scattered letter pieces) and use Gibbs resampling to re-simulate portions of the field repeatedly using these conditional probabilities until the word becomes human-readable. The residual randomness from the initial random field, together with the random implementation of the CAPTCHA word, provides significant resistance to attack. The result is a CAPTCHA that is unrecognizable to modern optical character recognition but is recognized about 95% of the time in a human readability study.
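An Ising-style toy version of the Gibbs resampling loop, assuming a binary stroke mask for the word. The paper designs its conditionals from site marginals and covariances; the nearest-neighbour coupling beta and the mask bias h below are illustrative stand-ins:

    import numpy as np

    def gibbs_captcha(mask, beta=0.8, h=1.2, sweeps=30, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.integers(0, 2, size=mask.shape)        # initial random field
        for _ in range(sweeps):                        # slow double loop; a sketch
            for i in range(1, mask.shape[0] - 1):
                for j in range(1, mask.shape[1] - 1):
                    nb = x[i-1, j] + x[i+1, j] + x[i, j-1] + x[i, j+1]
                    e = beta * (2 * nb - 4) + h * (2 * mask[i, j] - 1)
                    p1 = 1.0 / (1.0 + np.exp(-2 * e))  # P(site = 1 | neighbours)
                    x[i, j] = int(rng.random() < p1)
        return x                                       # word emerges, noise remains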

ETPL DIP-149 Active Contours Driven by the Salient Edge Energy Model

Abstract: In this brief, we present a new indicator, the salient edge energy, for guiding a given contour robustly and precisely toward the object boundary. Specifically, we define the salient edge energy by exploiting higher order statistics on the diffusion space, and incorporate it into a variational level set formulation together with the local region-based segmentation energy for solving the problem of curve evolution. In contrast to most previous methods, the proposed salient edge energy allows the curve to find only the significant local minima relevant to the object boundary, even against a noisy and cluttered background. Moreover, the segmentation performance derived from our new energy is less sensitive to the size of the local windows compared with other recently developed methods, owing to the ability of our energy function to suppress diverse clutter. The proposed method has been tested on various images, and experimental results show that the salient edge energy effectively drives the active contour, both qualitatively and quantitatively, compared to various state-of-the-art methods.

ETPL DIP-150 Bayesian Saliency via Low and Mid Level Cues

Abstract: Visual saliency detection is a challenging problem in computer vision, but one of great importance with numerous applications. In this paper, we propose a novel model for bottom-up saliency within the Bayesian framework by exploiting low and mid level cues. In contrast to most existing methods that operate directly on low level cues, we propose an algorithm in which a coarse saliency region is first obtained via a convex hull of interest points. We also analyze the saliency information with mid level visual cues via superpixels. We present a Laplacian sparse subspace clustering method to group superpixels with local features, and analyze the results with respect to the coarse saliency region to compute the prior saliency map. We use the low level visual cues based on the convex hull to compute the observation likelihood, thereby facilitating inference of Bayesian saliency at each pixel. Extensive experiments on a large data set show that our Bayesian saliency model performs favorably against state-of-the-art algorithms.
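A pixelwise Bayesian combination of a prior saliency map with feature likelihoods can be sketched as below. The prior map, the single quantized feature, and the histogram likelihoods are simplifying assumptions standing in for the paper's convex-hull and superpixel-clustering cues:

    import numpy as np

    def bayes_saliency(prior, feat, n_bins=16):
        q = np.clip((feat * n_bins).astype(int), 0, n_bins - 1)  # quantized feature
        sal = prior > 0.5                                        # coarse salient region
        # likelihoods from normalized histograms inside/outside the region
        p_f_sal = np.bincount(q[sal].ravel(), minlength=n_bins) / max(sal.sum(), 1)
        p_f_bg = np.bincount(q[~sal].ravel(), minlength=n_bins) / max((~sal).sum(), 1)
        num = prior * p_f_sal[q]
        return num / (num + (1 - prior) * p_f_bg[q] + 1e-12)     # posterior saliency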
ETPL DIP-151 Exemplar-Based Image Inpainting Using Multiscale Graph Cuts

Abstract: We present a novel formulation of exemplar-based inpainting as a global energy optimization problem, written in terms of the offset map. The proposed energy function combines a data attachment term that ensures the continuity of reconstruction at the boundary of the inpainting domain with a smoothness term that ensures a visually coherent reconstruction inside the hole. This formulation is adapted to obtain a global minimum using the graph cuts algorithm. To reduce the computational complexity, we propose an efficient multiscale graph cuts algorithm. To compensate for the loss of information at low resolution levels, we use a feature representation computed at the original image resolution. This alleviates the ambiguity induced by comparing only color information when the image is represented at low resolution levels. Our experiments show how well the proposed algorithm performs compared with other recent algorithms.

ETPL DIP-152 Activity Recognition Using a Mixture of Vector Fields

Abstract: The analysis of moving objects in image sequences (video) has been one of the major themes in computer vision. In this paper, we focus on video-surveillance tasks; more specifically, we consider pedestrian trajectories and propose modeling them through a small set of motion/vector fields together with a space-varying switching mechanism. Despite the diversity of motion patterns that can occur in a given scene, we show that it is often possible to find a relatively small number of typical behaviors and to model each of these behaviors by a simple motion field. We increase the expressiveness of the formulation by allowing the trajectories to switch from one motion field to another in a space-dependent manner. We present an expectation-maximization algorithm to learn all the parameters of the model, and apply it to trajectory classification tasks. Experiments with both synthetic and real data support the claims about the performance of the proposed approach.

ETPL DIP-153 Low-Resolution Face Tracker Robust to Illumination Variations

Abstract: In many practical video surveillance applications, the faces acquired by outdoor cameras are of low resolution and are affected by uncontrolled illumination. Although significant efforts have been made to facilitate face tracking or illumination normalization in unconstrained videos, the approaches developed may not be effective in video surveillance applications. This is because: 1) a low-resolution face contains limited information, and 2) major changes in illumination on a small region of the face make tracking ineffective. To overcome this problem, this paper proposes performing tracking in an illumination-insensitive feature space, called the gradient logarithm field (GLF) feature space. The GLF feature mainly depends on the intrinsic characteristics of a face and is only marginally affected by the lighting source. In addition, the GLF feature is a global feature and does not depend on a specific face model, and thus is effective in tracking low-resolution faces. Experimental results show that the proposed GLF-based tracker works well under significant illumination changes and outperforms many state-of-the-art tracking algorithms.
ETPL DIP-154 Local Directional Number Pattern for Face Analysis: Face and Expression Recognition

Abstract: This paper proposes a novel local feature descriptor, the local directional number pattern (LDN), for face analysis, i.e., face and expression recognition. LDN encodes the directional information of the face's textures (i.e., the texture's structure) in a compact way, producing a more discriminative code than current methods. We compute the structure of each micro-pattern with the aid of a compass mask that extracts directional information, and we encode such information using the prominent direction indices (directional numbers) and sign, which allows us to distinguish among similar structural patterns that have different intensity transitions. We divide the face into several regions and extract the distribution of the LDN features from them. Then, we concatenate these features into a feature vector, and we use it as a face descriptor. We perform several experiments in which our descriptor performs consistently under illumination, noise, expression, and time-lapse variations. Moreover, we test our descriptor with different masks to analyze its performance in different face analysis tasks.
ETPL DIP-155 Regularized Robust Coding for Face Recognition

Abstract: Recently, sparse representation based classification (SRC) has been proposed for robust face recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile, the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which can robustly regress a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficients are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on representative face databases demonstrate that the RRC is much more effective and efficient than state-of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.

ETPL DIP-156 Exploration of Optimal Many-Core Models for Efficient Image Segmentation

Abstract: Image segmentation plays a crucial role in numerous biomedical imaging applications, assisting clinicians or health care professionals with the diagnosis of various diseases using scientific data. However, its high computational complexity requires a substantial amount of time, which has limited its applicability. Research has thus focused on parallel processing models that support biomedical image segmentation. In this paper, we present analytical results of the design space exploration of many-core processors for efficient fuzzy c-means (FCM) clustering, which is widely used in many medical image segmentations. We quantitatively evaluate the impact of varying the number of processing elements (PEs) and the amount of local memory for a fixed image size on system performance and efficiency, using architectural and workload simulations. Experimental results indicate that PEs=4,096 provides the most efficient operation for the FCM algorithm with four clusters, while PEs=1,024 and PEs=4,096 yield the highest area efficiency and energy efficiency, respectively, for three clusters.
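The FCM inner loop that such many-core designs parallelize is small: a membership update and a center update per iteration. A standard NumPy version follows (m is the usual fuzzifier; the paper's contribution is the hardware design space exploration, not this algorithm):

    import numpy as np

    def fcm(X, c=4, m=2.0, iters=100, tol=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.random((c, len(X)))
        U /= U.sum(axis=0)                                # fuzzy memberships
        for _ in range(iters):
            Um = U ** m
            V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # cluster centers
            D = np.linalg.norm(X[None] - V[:, None], axis=2) + 1e-12
            U_new = 1.0 / D ** (2.0 / (m - 1.0))
            U_new /= U_new.sum(axis=0)                    # normalize per sample
            if np.abs(U_new - U).max() < tol:
                return U_new, V
            U = U_new
        return U, V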
ETPL DIP-157 Active Contour-Based Visual Tracking by Integrating Colors, Shapes, and Motions

Abstract: In this paper, we present a framework for active contour-based visual tracking using level sets. The main components of our framework include contour-based tracking initialization, color-based contour evolution, adaptive shape-based contour evolution for non-periodic motions, dynamic shape-based contour evolution for periodic motions, and the handling of abrupt motions. For the initialization of contour-based tracking, we develop an optical flow-based algorithm for automatically initializing contours at the first frame. For the color-based contour evolution, Markov random field theory is used to measure correlations between values of neighboring pixels for posterior probability estimation. For adaptive shape-based contour evolution, the global shape information and the local color information are combined to hierarchically evolve the contour, and a flexible shape updating model is constructed. For the dynamic shape-based contour evolution, a shape mode transition matrix is learnt to characterize the temporal correlations of object shapes. For the handling of abrupt motions, particle swarm optimization is adopted to capture the global motion, which is applied to the contour in the current frame to produce an initial contour in the next frame.
ETPL DIP-158 Image Quality Assessment Using Multi-Method Fusion

Abstract: A new methodology for objective image quality assessment (IQA) with multi-method fusion (MMF) is presented in this paper. The research is motivated by the observation that there is no single method that can give the best performance in all situations. To achieve MMF, we adopt a regression approach. The new MMF score is set to be the nonlinear combination of scores from multiple methods with suitable weights obtained by a training process. In order to further improve the regression results, we divide the distorted images into three to five groups based on the distortion types and perform regression within each group, which is called context-dependent MMF (CD-MMF). One task in CD-MMF is to determine the context automatically, which is achieved by a machine learning approach. To further reduce the complexity of MMF, we employ algorithms to select a small subset from the candidate method set. The result is very good even if only three quality assessment methods are included in the fusion process. The proposed MMF method using support vector regression is shown to outperform a large number of existing IQA methods by a significant margin when tested on six representative databases.
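The fusion step itself is a small supervised regression. A sketch with scikit-learn's SVR, assuming scores is an (images x methods) matrix of IQA outputs and mos holds the subjective scores; CD-MMF would first classify each image's distortion context and train one such regressor per group. The kernel and hyperparameters are illustrative:

    from sklearn.svm import SVR

    def train_mmf(scores, mos):
        model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
        model.fit(scores, mos)      # learn the nonlinear combination of methods
        return model

    # fused quality for new images: model.predict(new_scores)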
ETPL DIP-159 Robust Radial Face Detection for Omnidirectional Vision

Abstract: Bio-inspired and non-conventional vision systems are highly researched topics. Among them, omnidirectional vision systems have demonstrated their ability to significantly improve the geometrical interpretation of scenes. However, few researchers have investigated how to perform object detection with such systems. The existing approaches require a geometrical transformation prior to the interpretation of the picture. In this paper, we investigate what must be taken into account and how to process omnidirectional images provided by the sensor. We focus our research on face detection and highlight the fact that particular attention should be paid to the descriptors in order to successfully perform face detection on omnidirectional images. We demonstrate that this choice is critical to obtaining high detection rates. Our results imply that the adaptation of existing object-detection frameworks, designed for perspective images, should be focused on the choice of appropriate image descriptors in the design of the object-detection pipeline.

ETPL DIP-160 Optimized 3D Watermarking for Minimal Surface Distortion

Abstract: This paper proposes a new approach to 3D watermarking that ensures the optimal preservation of mesh surfaces. A new 3D surface preservation function metric is defined, consisting of the distance of a vertex displaced by watermarking to the original surface, the distance to the watermarked object surface, and the actual vertex displacement. The proposed method is statistical, blind, and robust. Minimal surface distortion according to the proposed function metric is enforced during the statistical watermark embedding stage using the Levenberg-Marquardt optimization method. A study of the crypto-security of the watermark code is provided for the proposed methodology. According to the experimental results, the proposed methodology has high robustness against common mesh attacks while preserving the original object surface during watermarking.

ETPL DIP-161 Approximate Least Trimmed Sum of Squares Fitting and Applications in Image Analysis

Abstract: The least trimmed sum of squares (LTS) regression estimation criterion is a robust statistical method for model fitting in the presence of outliers. Compared with the classical least squares estimator, which uses the entire data set for regression and is consequently sensitive to outliers, LTS identifies the outliers and fits to the remaining data points for improved accuracy. Exactly solving an LTS problem is NP-hard, but as we show here, LTS can be formulated as a concave minimization problem. Since it is usually tractable to globally solve a convex minimization or concave maximization problem in polynomial time, inspired by earlier work, we instead solve the approximate complementary problem of LTS, which is a convex minimization. We show that this complementary problem can be efficiently solved as a second order cone program. We thus propose an iterative procedure to approximately solve the original LTS problem. Our extensive experiments demonstrate that the proposed method is robust, efficient and scalable in dealing with problems where data are contaminated with outliers. We show several applications of our method in image analysis.
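The paper's contribution is the convex (second order cone) relaxation; for contrast, the classic concentration heuristic below approximates LTS by alternating a least-squares fit with trimming of the largest residuals. It conveys the criterion, not the authors' algorithm:

    import numpy as np

    def approx_lts(A, b, trim=0.2, iters=10):
        keep = np.arange(len(b))
        for _ in range(iters):
            x, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)  # fit inliers
            r = np.abs(A @ x - b)                                  # all residuals
            k = int(np.ceil((1 - trim) * len(b)))
            keep = np.argsort(r)[:k]        # keep the k smallest residuals
        return x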
ETPL DIP-162 Design of Low-Complexity High-Performance Wavelet Filters for Image Analysis

Abstract: This paper addresses the construction of a family of wavelets based on halfband polynomials. An algorithm is proposed that ensures the maximum number of zeros at z = -1 for a desired length of the analysis and synthesis filters. We start with the coefficients of the polynomial and then use a generalized matrix formulation method to construct the halfband filter polynomial. The designed wavelets are efficient and give acceptable levels of peak signal-to-noise ratio when used for image compression. Furthermore, these wavelets give satisfactory recognition rates when used for feature extraction. Simulation results show that the designed wavelets are effective and more efficient than the existing standard wavelets.

ETPL DIP-163 Noise Reduction Based on Partial-Reference, Dual-Tree Complex Wavelet Transform Shrinkage

Abstract: This paper presents a novel way to reduce noise introduced or exacerbated by image enhancement methods, in particular, though not exclusively, algorithms based on the random spray sampling technique. According to the nature of sprays, output images of spray-based methods tend to exhibit noise with an unknown statistical distribution. To avoid inappropriate assumptions about the statistical characteristics of the noise, a different assumption is made: the non-enhanced image is considered to be either free of noise or affected by non-perceivable levels of noise. Taking advantage of the higher sensitivity of the human visual system to changes in brightness, the analysis can be limited to the luma channel of both the non-enhanced and enhanced images. Also, given the importance of directional content in human vision, the analysis is performed through the dual-tree complex wavelet transform (DTWCT). Unlike the discrete wavelet transform, the DTWCT allows for the distinction of data directionality in the transform space. For each level of the transform, the standard deviation of the non-enhanced image coefficients is computed across the six orientations of the DTWCT and then normalized. The result is a map of the directional structures present in the non-enhanced image. This map is then used to shrink the coefficients of the enhanced image. The shrunk coefficients and the coefficients from the non-enhanced image are then mixed according to data directionality. Finally, a noise-reduced version of the enhanced image is computed via the inverse transforms. A thorough numerical analysis of the results has been performed in order to confirm the validity of the proposed approach.
ETPL DIP-164 Hessian Schatten-Norm Regularization for Linear Inverse Problems

Abstract: We introduce a novel family of invariant, convex, and non-quadratic functionals that we employ to derive regularized solutions of ill-posed linear inverse imaging problems. The proposed regularizers involve the Schatten norms of the Hessian matrix, which are computed at every pixel of the image. They can be viewed as second-order extensions of the popular total-variation (TV) semi-norm, since they satisfy the same invariance properties. Meanwhile, by taking advantage of second-order derivatives, they avoid the staircase effect, a common artifact of TV-based reconstructions, and perform well for a wide range of applications. To solve the corresponding optimization problems, we propose an algorithm that is based on a primal-dual formulation. A fundamental ingredient of this algorithm is the projection of matrices onto Schatten norm balls of arbitrary radius. This operation is performed efficiently based on a direct link we provide between vector projections onto norm balls and matrix projections onto Schatten norm balls. Finally, we demonstrate the effectiveness of the proposed methods through experimental results on several inverse imaging problems with real and simulated data.
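Pointwise, the regularizer only needs the eigenvalues of the 2x2 image Hessian. A finite-difference sketch follows (p=1 gives the nuclear norm, a popular choice; summing the returned map over pixels yields the regularization term, while the paper's solver and projections are not shown):

    import numpy as np

    def hessian_schatten(img, p=1.0):
        fy, fx = np.gradient(img.astype(float))
        fyy, fyx = np.gradient(fy)
        fxy, fxx = np.gradient(fx)
        tr = fxx + fyy                               # trace of the Hessian
        det = fxx * fyy - fxy * fyx                  # determinant
        disc = np.sqrt(np.maximum(tr ** 2 / 4.0 - det, 0.0))
        l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc    # eigenvalues per pixel
        return (np.abs(l1) ** p + np.abs(l2) ** p) ** (1.0 / p)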
ETPL DIP-165 Structured Sparse Error Coding for Face Recognition With Occlusion

Abstract: Face recognition with occlusion is common in the real world. Inspired by work on structured sparse representation, we explore the structure of the error incurred by occlusion from two aspects: the error morphology and the error distribution. Since human beings recognize occlusion mainly according to its region shape or profile without knowing accurately what the occlusion is, we argue that the shape of the occlusion is also an important feature. We propose a morphological graph model to describe the morphological structure of the error. Due to the uncertainty of the occlusion, the distribution of the error incurred by occlusion is also uncertain. However, we observe that the unoccluded part and the occluded part of the error, measured by the correntropy-induced metric, follow exponential distributions, respectively. Incorporating the two aspects of the error structure, we propose structured sparse error coding for face recognition with occlusion. Our extensive experiments demonstrate that the proposed method is more stable and has a higher breakdown point in dealing with occlusion problems in face recognition than related state-of-the-art methods, especially in extreme situations such as high-level occlusion and low feature dimension.

ETPL DIP-166 Accurate Multiple View 3D Reconstruction Using Patch-Based Stereo for Large-Scale Scenes

Abstract: In this paper, we propose a depth-map merging based multiple view stereo method for large-scale scenes that takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process is used to generate a depth map for each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with high computational efficiency. Moreover, the proposed method can easily be parallelized at the image level, i.e., each depth map is computed individually, which makes it suitable for large-scale scene reconstruction with high-resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large data sets.

ETPL DIP-167 Mixed-Domain Edge-Aware Image Manipulation

Abstract: This paper presents a novel approach to edge-aware image manipulation. Our method processes a Gaussian pyramid from coarse to fine, and at each level applies a nonlinear filter bank to the neighborhood of each pixel. The outputs of these spatially varying filters are merged using global optimization. The optimization problem is solved using an explicit mixed-domain (real space and DCT transform space) solution, which is efficient, accurate, and easy to implement. We demonstrate applications of our method to a set of problems, including detail and contrast manipulation, HDR compression, nonphotorealistic rendering, and haze removal.
ETPL DIP-168 Monocular Depth Ordering Using T-Junctions and Convexity Occlusion Cues

Abstract: This paper proposes a system that relates objects in an image using occlusion cues and arranges them according to depth. The system does not rely on a priori knowledge of the scene structure and focuses on detecting special points, such as T-junctions and highly convex contours, to infer the depth relationships between objects in the scene. The system makes extensive use of the binary partition tree as a hierarchical region-based image representation, jointly with a new approach for candidate T-junction estimation. Since some regions may not involve T-junctions, occlusion is also detected by examining convex shapes on region boundaries. Combining T-junctions and convexity leads to a system that relies only on low-level depth cues and does not require semantic information. Nevertheless, it shows similar or better performance than the state of the art without assuming any particular type of scene. As an extension of the automatic depth ordering system, a semi-automatic approach is also proposed. If the user provides the depth order for a subset of regions in the image, the system is able to easily integrate this user information into the final depth order for the complete image. For some applications, user interaction can naturally be integrated, improving the quality of the automatically generated depth map.
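Once pairwise occlusion relations have been inferred from T-junction and convexity cues, a global depth order amounts to a topological sort of the occlusion graph. A minimal sketch (conflicting cues show up as cycles, which this version simply leaves unsorted; the paper's actual integration step is more involved):

    from collections import defaultdict, deque

    def depth_order(occlusions):
        # occlusions: iterable of (front_region, back_region) pairs
        adj, indeg, nodes = defaultdict(list), defaultdict(int), set()
        for f, b in occlusions:
            adj[f].append(b)
            indeg[b] += 1
            nodes |= {f, b}
        q = deque(n for n in nodes if indeg[n] == 0)   # nearest regions first
        order = []
        while q:
            n = q.popleft()
            order.append(n)
            for m in adj[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    q.append(m)
        return order    # shorter than nodes if the cues contain a cycle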
ETPL DIP-169 Perceptual Full-Reference Quality Assessment of Stereoscopic Images by Considering Binocular Visual Characteristics

Abstract: Perceptual quality assessment is a challenging issue in 3D signal processing research. It is important to study the 3D signal directly, instead of directly extending 2D metrics to the 3D case as in some previous studies. In this paper, we propose a new perceptual full-reference quality assessment metric for stereoscopic images that considers binocular visual characteristics. The major technical contribution of this paper is that binocular perception and combination properties are considered in the quality assessment. To be more specific, we first perform left-right consistency checks and compare matching errors between the corresponding pixels in the binocular disparity calculation, and classify the stereoscopic images into non-corresponding, binocular fusion, and binocular suppression regions. Local phase and local amplitude maps are also extracted from the original and distorted stereoscopic images as features for quality assessment. Then, each region is evaluated independently by considering its binocular perception property, and all evaluation results are integrated into an overall score. Besides, a binocular just-noticeable-difference model is used to reflect the visual sensitivity for the binocular fusion and suppression regions. Experimental results show that, compared with the relevant existing metrics, the proposed metric achieves higher consistency with subjective assessments of stereoscopic images.

ETPL DIP-170 Multi-Wiener SURE-LET Deconvolution

Abstract: In this paper, we propose a novel deconvolution algorithm based on the minimization of a regularized Stein's unbiased risk estimate (SURE), which is a good estimate of the mean squared error. We linearly parametrize the deconvolution process by using multiple Wiener filters as elementary functions, followed by undecimated Haar-wavelet thresholding. Due to the quadratic nature of SURE and the linear parametrization, the deconvolution problem finally boils down to solving a linear system of equations, which is very fast and exact. The linear coefficients, i.e., the solution of the linear system of equations, constitute the best approximation of the optimal processing on the Wiener-Haar-threshold basis that we consider. In addition, the proposed multi-Wiener SURE-LET approach is applicable to both periodic and symmetric boundary conditions, and can thus be used in various practical scenarios. The very competitive results (in both computation time and quality) show that the proposed algorithm, which can be interpreted as a kind of nonlinear Wiener processing, can be used as a basic tool for building more sophisticated deconvolution algorithms.
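The elementary building block here is a single Fourier-domain Wiener filter. The method linearly combines several such filters (with different regularization strengths) followed by undecimated Haar thresholding, choosing the weights by minimizing SURE; the sketch below shows one filter only, assuming a centered PSF of the same size as the image:

    import numpy as np

    def wiener_deconv(y, psf, nsr=0.01):
        H = np.fft.fft2(np.fft.ifftshift(psf))     # transfer function of the blur
        G = np.conj(H) / (np.abs(H) ** 2 + nsr)    # Wiener filter, nsr ~ 1/SNR
        return np.real(np.fft.ifft2(G * np.fft.fft2(y)))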
This paper concentrates on the joint reconstruction problem, where the distributively compressed images are decoded together in order to benefit from the image correlation. We consider a scenario where the images captured at different viewpoints are encoded independently using common coding solutions (e.g., JPEG) with a balanced rate distribution among the different cameras. A central decoder first estimates the inter-view image correlation from the independently compressed data. The joint reconstruction is then cast as a constrained convex optimization problem that reconstructs total-variation (TV) smooth images complying with the estimated correlation model. At the same time, we add constraints that force the reconstructed images to be as close as possible to their compressed versions. We show through experiments that the proposed joint reconstruction scheme outperforms independent reconstruction in terms of image quality, for a given target bit rate. In addition, the decoding performance of our algorithm compares advantageously to state-of-the-art distributed coding schemes based on motion learning and on the DISCOVER algorithm.
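As a rough illustration of this constrained TV idea (a toy single-image analogue, not the authors' multiview solver), one can alternate a TV smoothing step with a projection that keeps the estimate near its decoded version. Here denoise_tv_chambolle from scikit-image stands in for the TV step, and the radius eps and the weight are invented parameters:

    import numpy as np
    from skimage.restoration import denoise_tv_chambolle

    def tv_constrained_reconstruct(decoded, eps=4.0, n_iter=30):
        """Alternate TV smoothing with projection onto {x : ||x - decoded|| <= eps}."""
        x = decoded.astype(float)
        for _ in range(n_iter):
            x = denoise_tv_chambolle(x, weight=0.05)  # TV smoothing step
            d = x - decoded
            norm = np.linalg.norm(d)
            if norm > eps:                            # enforce fidelity to the decoded image
                x = decoded + d * (eps / norm)
        return x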

ETPL DIP-172: Scalable Coding of Depth Maps With R-D Optimized Embedding
Abstract: Recent work on depth map compression has revealed the importance of incorporating a description of discontinuity boundary geometry into the compression scheme. We propose a novel compression strategy for depth maps that incorporates geometry information while achieving the goals of scalability and embedded representation. Our scheme involves two separate image pyramid structures, one for breakpoints and the other for sub-band samples produced by a breakpoint-adaptive transform. Breakpoints capture geometric attributes and are amenable to scalable coding. We develop a rate-distortion optimization framework for determining the presence and precision of breakpoints in the pyramid representation. We employ a variation of the EBCOT scheme to produce embedded bit-streams for both the breakpoint and sub-band data. Compared to JPEG 2000, our proposed scheme enables the same scalability features while achieving substantially improved rate-distortion performance in the higher bit-rate range and comparable performance at lower rates.

ETPL DIP-173: Automatic Virus Particle Selection: The Entropy Approach
Abstract: This paper describes a fully automatic approach for locating icosahedral virus particles in transmission electron microscopy images. The initial detection of the particles takes place through automatic segmentation of the entropy-proportion image; this image is computed in particular regions of interest defined by two concentric structuring elements contained in a small overlapping window running over the whole image. Morphological features help to select the candidates, as the threshold is kept low enough to avoid false negatives. The candidate points are subjected to a credibility test based on features extracted from eight radial intensity profiles at each point of a texture image. A candidate is accepted if these features meet the set of acceptance conditions describing the typical intensity profiles of these kinds of particles. The set of accepted points is subjected to a last validation in a three-parameter space, using a discrimination plane that is a function of the input image to separate possible outliers.

ETPL DIP-174: A Tuned Mesh-Generation Strategy for Image Representation Based on Data-Dependent Triangulation
Abstract: A mesh-generation framework for image representation based on data-dependent triangulation is proposed. The proposed framework is a modified version of the frameworks of Rippa and of Garland and Heckbert that facilitates the development of more effective mesh-generation methods. As the proposed framework has several free parameters, the effects of different choices of these parameters on mesh quality are studied, leading to the recommendation of a particular set of choices for these parameters. A mesh-generation method is then introduced that employs the proposed framework with these best parameter choices. This method is demonstrated to produce meshes of higher quality (both in terms of squared error and subjectively) than those generated by several competing approaches, at a relatively modest computational and memory cost.
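The greedy, error-driven point insertion underlying such mesh-generation frameworks can be sketched in a few lines. The following unoptimized illustration is in the spirit of Garland and Heckbert's approach, not the tuned method of DIP-174; Delaunay connectivity stands in for true data-dependent connectivity optimization, and the point budget is an arbitrary assumption:

    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.interpolate import LinearNDInterpolator

    def greedy_image_mesh(img, n_points=300):
        """Greedily insert the worst-approximated pixel into a triangulated mesh."""
        h, w = img.shape
        pts = np.array([[0, 0], [0, w - 1], [h - 1, 0], [h - 1, w - 1]], dtype=float)
        vals = np.array([img[0, 0], img[0, w - 1], img[h - 1, 0], img[h - 1, w - 1]], dtype=float)
        yy, xx = np.mgrid[0:h, 0:w]
        grid = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
        for _ in range(n_points - 4):
            recon = LinearNDInterpolator(pts, vals)(grid).reshape(h, w)  # piecewise-linear approximation
            err = np.abs(img - recon)
            y, x = np.unravel_index(np.nanargmax(err), err.shape)        # worst-approximated pixel so far
            pts = np.vstack([pts, [y, x]])
            vals = np.append(vals, img[y, x])
        return pts, Delaunay(pts)  # vertex set and a triangulation of it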
ETPL DIP-175: Accelerated Edge-Preserving Image Restoration Without Boundary Artifacts
Abstract: To reduce blur in noisy images, regularized image restoration methods have been proposed that use nonquadratic regularizers (such as l1 regularization or total variation) that suppress noise while preserving edges in the image. Most of these methods assume a circulant blur (periodic convolution with a blurring kernel), which can lead to wraparound artifacts along the boundaries of the image due to the implied periodicity of the circulant model. Using a noncirculant model could prevent these artifacts at the cost of increased computational complexity. In this paper, we propose to use a circulant blur model combined with a masking operator that prevents wraparound artifacts. The resulting model is noncirculant, so we propose an efficient algorithm using variable splitting and augmented Lagrangian (AL) strategies. Our variable splitting scheme, when combined with the AL framework and alternating minimization, leads to simple linear systems that can be solved noniteratively using fast Fourier transforms (FFTs), eliminating the need for more expensive conjugate-gradient-type solvers. The proposed method can also efficiently tackle a variety of convex regularizers, including edge-preserving (e.g., total variation) and sparsity-promoting (e.g., l1-norm) regularizers. Simulation results show fast convergence of the proposed method, along with improved image quality at the boundaries where the circulant model is inaccurate.

ETPL DIP-176: Box Relaxation Schemes in Staggered Discretizations for the Dual Formulation of Total Variation Minimization
Abstract: In this paper, we propose new box relaxation numerical schemes on staggered grids to solve the stationary system of partial differential equations arising from the dual minimization problem associated with the total variation operator. We present in detail the numerical schemes for the scalar case and their generalization to multichannel (vectorial) images. Then, we discuss their implementation in digital image denoising. The results outperform the resolution of the dual equation based on the gradient descent approach and pave the way for more advanced numerical strategies.

ETPL DIP-177: Constrained Optical Flow Estimation as a Matching Problem
Abstract: In general, discretization in the motion vector domain yields an intractable number of labels. In this paper, we propose an approach that reduces general optical flow to a constrained matching problem by pre-estimating a 2-D disparity labeling map of the desired discrete motion vector function. One goal of this paper is to estimate the coarse distribution of motion vectors and then utilize this distribution as a global constraint for discrete optical flow estimation. This pre-estimation is done with a simple frame-to-frame correlation technique, also known as the digital symmetric phase-only filter (SPOF). We discover a strong correlation between the output of the SPOF and the motion vector distribution of the related optical flow. A two-step matching paradigm for optical flow estimation is applied: pixel-accuracy (integer flow) estimation followed by subpixel-accuracy estimation. The matching problem is solved by global optimization. Experiments on the Middlebury optical flow datasets confirm our intuitive assumptions about the strong correlation between the motion vector distribution of optical flow and the maximal peaks of the SPOF outputs. The overall performance of the proposed method is promising and achieves state-of-the-art results on the Middlebury benchmark.
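The phase-only correlation idea behind such pre-estimation is compact enough to sketch directly. The snippet below computes a plain phase-only correlation surface between two equally sized frames (a simplified stand-in for the symmetric variant named above); strong peaks of the surface suggest the dominant translations:

    import numpy as np

    def phase_only_correlation(f, g, eps=1e-8):
        """Return the phase-only correlation surface of two equally sized frames."""
        F = np.fft.fft2(f)
        G = np.fft.fft2(g)
        cross = F * np.conj(G)
        surface = np.fft.ifft2(cross / (np.abs(cross) + eps)).real  # whitened cross-power spectrum
        return np.fft.fftshift(surface)   # zero displacement maps to the center

    # peak = np.unravel_index(np.argmax(surface), surface.shape) gives the dominant shift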
ETPL DIP-178: Nonseparable Shearlet Transform
Abstract: Over the past few years, various representation systems that sparsely approximate functions governed by anisotropic features, such as edges in images, have been proposed. Alongside the theoretical development of these systems, algorithmic realizations of the associated transforms have been provided. However, one of the most common shortcomings of these frameworks is the lack of a unified treatment of the continuum and digital worlds, i.e., a digital theory that is a natural digitization of the continuum theory. In this paper, we introduce a new shearlet transform associated with a nonseparable shearlet generator, which improves the directional selectivity of previous shearlet transforms. Our approach is based on a discrete framework that allows a faithful digitization of the continuum-domain directional transform based on compactly supported shearlets, introduced as a means to sparsely encode anisotropic singularities of multivariate data. We show numerical experiments demonstrating the potential of our new shearlet transform in 2D and 3D image processing applications.

ETPL DIP-179: Modeling and Classifying Human Activities From Trajectories Using a Class of Space-Varying Parametric Motion Fields
Abstract: Many approaches to trajectory analysis, such as clustering or classification, use probabilistic
generative models, thus not requiring trajectory alignment/registration. Switched linear dynamical models (e.g., HMMs) have been used in this context, due to their ability to describe different motion regimes. However, these models are not suitable for handling space-dependent dynamics, which are more naturally captured by nonlinear models. As is well known, these are more difficult to identify. In this paper, we propose a new way of modeling trajectories, based on a mixture of parametric motion vector fields that depend on a small number of parameters. Switching among these fields follows a probabilistic mechanism, characterized by a field of stochastic matrices. This approach allows representing a wide variety of trajectories and modeling space-dependent behaviors without using global nonlinear dynamical models. Experimental evaluation is conducted in both synthetic and real scenarios; the latter concerns human trajectory modeling for activity classification, a central task in video surveillance.

ETPL DIP-180: Real-Time Continuous Image Registration Enabling Ultraprecise 2-D Motion Tracking
Abstract: In this paper, we present a novel continuous image registration method (CIRM), which yields near-zero bias and has high computational efficiency. It can be realized for real-time position estimation to enable ultraprecise 2-D motion tracking and motion control over a large motion range. As the two variables of the method are continuous in the spatial domain, pixel-level image registration is unnecessary; thus the CIRM can continuously track the moving target according to the incoming target image. When applied to a specific target object, the measurement resolution of the method is predicted from the reference image model of the object along with the variance of the camera's overall image noise. The maximum permissible target speed is proportional to the permissible frame rate, which is limited by the required computational time. The precision, measurement resolution, and computational efficiency of the method are verified through computer simulations and experiments. Specifically, the CIRM is implemented and integrated with a visual sensing system. Near-zero bias, a measurement resolution of 0.1 nm (0.0008 pixels), and measurement of one-nanometer stepping are demonstrated.
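For comparison, a common off-the-shelf route to subpixel translation estimates is upsampled phase correlation, a discrete-domain technique distinct from the continuous method above. A minimal sketch using scikit-image's phase_cross_correlation (the test shift and upsampling factor are illustrative):

    import numpy as np
    from scipy.ndimage import shift as nd_shift
    from skimage.registration import phase_cross_correlation

    rng = np.random.default_rng(0)
    ref = rng.random((128, 128))
    moving = nd_shift(ref, (0.37, -1.24), mode="wrap")   # known subpixel displacement

    # upsample_factor=100 refines the correlation peak to roughly 1/100 pixel
    shift, error, phasediff = phase_cross_correlation(ref, moving, upsample_factor=100)
    print(shift)   # recovers the displacement (sign convention depends on argument order)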
ETPL DIP-181: Unified Blind Method for Multi-Image Super-Resolution and Single/Multi-Image Blur Deconvolution
Abstract: This paper presents, for the first time, a unified blind method for multi-image super-resolution (MISR or SR), single-image blur deconvolution (SIBD), and multi-image blur deconvolution (MIBD) of low-resolution (LR) images degraded by linear space-invariant (LSI) blur, aliasing, and additive white Gaussian noise (AWGN). The proposed approach is based on alternating minimization (AM) of a new cost function with respect to the unknown high-resolution (HR) image and blurs. The regularization term for the HR image is based upon the Huber-Markov random field (HMRF) model, a type of variational integral that exploits the piecewise smooth nature of the HR image. The blur estimation process is supported by an edge-emphasizing smoothing operation, which improves the quality of blur estimates by enhancing strong soft edges toward step edges while filtering out weak structures. The parameters are updated gradually so that the number of salient edges used for blur estimation increases at each iteration. For better performance, the blur estimation is done in the filter domain rather than the pixel domain, i.e., using the gradients of the LR and HR images. The regularization term for the blur is Gaussian (L2 norm), which allows for fast noniterative optimization in the frequency domain. We accelerate the processing time of SR reconstruction by separating the upsampling and registration processes from the optimization procedure. Simulation results on both synthetic and real-life images (from a novel computational imager) confirm the robustness and effectiveness of the proposed method.

ETPL DIP-182: Informative State-Based Video Communication
Abstract: We study state-based video communication, where a client simultaneously informs the server about the presence status of various packets in its buffer. In sender-driven transmission, the client periodically sends to the server a single acknowledgement packet that provides information about all
packets that have arrived at the client by the time the acknowledgment is sent. In receiver-driven streaming, the client periodically sends to the server a single request packet that comprises a transmission schedule for sending missing data to the client over a time horizon. We develop a comprehensive optimization framework that enables computing packet transmission decisions that maximize the end-to-end video quality for the given bandwidth resources, in both prospective scenarios. The core step of the optimization comprises computing the probability that a single packet will be communicated in error as a function of the expected transmission redundancy (or cost) used to communicate the packet. Through comprehensive simulation experiments, we carefully examine the performance advances that our framework enables relative to state-of-the-art scheduling systems that employ regular acknowledgement or request packets. Consistent gains in video quality of up to 2 dB are demonstrated across a variety of content types. We show that there is a direct analogy between the error-cost efficiency of streaming a single packet and the overall rate-distortion performance of streaming the whole content. In the case of sender-driven transmission, we develop an effective modeling approach that accurately characterizes the end-to-end performance as a function of the packet loss rate on the backward channel and the source encoding characteristics.

ETPL DIP-183: Quantification of Smoothing Requirement for 3D Optic Flow Calculation of Volumetric Images
Abstract: The complexities of dynamic volumetric imaging challenge the available computer vision techniques on a number of different fronts. This paper examines the relationship between the estimation accuracy and the required amount of smoothness for a general solution from a robust statistics perspective. We show that a (surprisingly) small amount of local smoothing is required to satisfy both the necessary and sufficient conditions for accurate optic flow estimation. This notion is called just enough smoothing, and its proper implementation has a profound effect on the preservation of local information in processing 3D dynamic scans. To demonstrate the effect of just enough smoothing, a robust 3D optic flow method with quantized local smoothing is presented, and the effect of local smoothing on the accuracy of motion estimation in dynamic lung CT images is examined using both synthetic and real image sequences with ground truth.
ETPL DIP-184: Analysis Operator Learning and its Application to Image Reconstruction
Abstract: Exploiting a priori known structural information lies at the core of many image reconstruction methods that can be stated as inverse problems. The synthesis model, which assumes that images can be decomposed into a linear combination of very few atoms of some dictionary, is now a well-established tool for the design of image reconstruction algorithms. An interesting alternative is the analysis model, where the signal is multiplied by an analysis operator and the outcome is assumed to be sparse. This approach has only recently gained increasing interest. The quality of reconstruction methods based on an analysis model depends critically on the choice of a suitable operator. In this paper, we present an algorithm for learning an analysis operator from training images. Our method is based on lp-norm minimization on the set of full-rank matrices with normalized columns. We carefully introduce the employed conjugate gradient method on manifolds and explain the underlying geometry of the constraints. Moreover, we compare our approach to state-of-the-art methods for image denoising, inpainting, and single-image super-resolution. Our numerical results show competitive performance of our general approach in all presented applications compared to the specialized state-of-the-art techniques.
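A stripped-down caricature of analysis operator learning is projected gradient descent on a smoothed sparsity penalty, renormalizing columns after each step. This replaces the manifold conjugate gradient machinery above with plain gradient descent, and omits the full-rank and anti-collapse terms that serious formulations need; step size, penalty smoothing, and iteration count are all invented:

    import numpy as np

    def learn_analysis_operator(X, k, n_iter=200, lr=0.1, eps=1e-6):
        """X: (d, N) matrix of vectorized training patches; returns a (k, d) operator."""
        rng = np.random.default_rng(0)
        O = rng.standard_normal((k, X.shape[0]))
        O /= np.linalg.norm(O, axis=0, keepdims=True)        # unit-norm columns
        for _ in range(n_iter):
            R = O @ X                                        # analysis coefficients, should be sparse
            grad = (R / np.sqrt(R**2 + eps)) @ X.T           # gradient of a smoothed l1 penalty
            O -= lr / X.shape[1] * grad
            O /= np.linalg.norm(O, axis=0, keepdims=True)    # project back onto normalized columns
        return O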

ETPL DIP-185: Computational Model of Stereoscopic 3D Visual Saliency
Abstract: Many computational models of visual attention that perform well in predicting salient areas of 2D images have been proposed in the literature. The emerging applications of stereoscopic 3D displays bring additional depth information that affects human viewing behavior, and require extensions of the efforts made in 2D visual modeling. In this paper, we propose a new computational model of visual attention for stereoscopic 3D still images. Apart from detecting salient areas based on 2D visual features, the proposed model takes depth as an additional visual dimension. The measure of depth saliency is derived from eye movement data obtained from an eye-tracking experiment using synthetic stimuli. Two different ways of integrating depth information in the modeling of 3D visual attention are then proposed and examined. For the performance evaluation of 3D visual attention models, we have created an eye-tracking database, which contains stereoscopic images of natural content and is publicly available along with this paper. The proposed model gives good performance compared with state-of-the-art 2D models on 2D images. The results also suggest that better performance is obtained when depth information is taken into account through the creation of a depth saliency map rather than integrated by a weighting method.

ETPL DIP-186: In-Plane Rotation and Scale Invariant Clustering Using Dictionaries
Abstract: In this paper, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries and clusters images in the Radon transform domain. The main feature of the proposed approach is that it provides both in-plane rotation and scale invariant clustering, which is useful in numerous applications, including content-based image retrieval (CBIR). We demonstrate the effectiveness of our rotation and scale invariant clustering method on a series of CBIR experiments. Experiments are performed on the Smithsonian isolated leaf, Kimia shape, and Brodatz texture datasets. Our method provides both good retrieval performance and greater robustness compared to standard Gabor-based methods and three state-of-the-art shape-based methods that have similar objectives.

ETPL DIP-187: General Framework to Histogram-Shifting-Based Reversible Data Hiding
Abstract: Histogram shifting (HS) is a useful technique of reversible data hiding (RDH). With HS-based RDH, high capacity and low distortion can be achieved efficiently. In this paper, we revisit the HS technique and present a general framework for constructing HS-based RDH. With the proposed framework, one can obtain an RDH algorithm by simply designing the so-called shifting and embedding functions. Moreover, by taking specific shifting and embedding functions, we show that several RDH algorithms reported in the literature are special cases of this general construction. In addition, two novel and efficient RDH algorithms are introduced to further demonstrate the universality and applicability of our framework. It is expected that more efficient RDH algorithms can be devised according to the proposed framework by carefully designing the shifting and embedding functions.
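The classical peak/zero-bin histogram-shifting embedder, of which the framework above is a generalization, fits in a few lines. This sketch embeds a bit sequence into an 8-bit image; it omits the side-information bookkeeping (a nonempty minimum bin, boundary pixels) that a complete scheme must carry:

    import numpy as np

    def hs_embed(img, bits):
        """Embed bits by shifting the histogram between the peak bin and a (near-)zero bin."""
        hist = np.bincount(img.ravel(), minlength=256)
        peak = int(np.argmax(hist))                 # most frequent gray level: carries the payload
        zero = int(np.argmin(hist))                 # least occupied bin: absorbs the shift
        out = img.astype(np.int32).copy()
        lo, hi = min(peak, zero), max(peak, zero)
        step = 1 if zero > peak else -1
        mask = (out > lo) & (out < hi)
        out[mask] += step                           # vacate the bin adjacent to the peak
        it = iter(bits)
        flat = out.ravel()
        for i in np.flatnonzero(flat == peak):      # peak pixels: bit 0 stays, bit 1 moves one bin
            b = next(it, None)
            if b is None:
                break
            flat[i] += step * b
        return flat.reshape(img.shape).astype(np.uint8), peak, zero

Decoding reads pixels at the peak bin as 0 and at the adjacent bin as 1, then shifts the in-between bins back, restoring the original image exactly.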
ETPL DIP-188: Computationally Tractable Stochastic Image Modeling Based on Symmetric Markov Mesh Random Fields
Abstract: In this paper, the properties of a new class of causal Markov random fields, named symmetric Markov mesh random fields, are first discussed. It is shown that symmetric Markov mesh random fields from the upper corners are equivalent to symmetric Markov mesh random fields from the lower corners. Based on this new random field, a symmetric, corner-independent, and isotropic image model is then derived that incorporates the dependency of a pixel on all its neighbors. The introduced image model comprises the product of several local 1D density and 2D joint density functions of pixels in an image, thus making it computationally tractable and practically feasible by allowing the use of histogram and joint histogram approximations to estimate the model parameters. An image restoration application is also presented to confirm the effectiveness of the developed model. The experimental results demonstrate that this new model provides an improved tool for image modeling purposes compared to the conventional Markov random field models.

ETPL DIP-189: Robust Ellipse Fitting Based on Sparse Combination of Data Points
Abstract: Ellipse fitting is widely applied in the fields of computer vision and automatic industry control, where the ellipse fitting procedure often follows the preprocessing step of edge detection in the original image. The ellipse fitting method therefore also depends on the accuracy of edge detection, besides its own performance, especially because the outliers and edge point errors introduced by edge detection can cause severe performance degradation. In this paper, we develop a robust ellipse fitting method to alleviate the influence of outliers. The proposed algorithm solves for the ellipse parameters by linearly combining a subset of (more accurate) data points (formed from edge points) rather than all data points (which contain possible outliers). In addition, considering that squaring the fitting residuals magnifies the contributions of extreme data points, our algorithm replaces the squared residuals with absolute residuals to reduce this influence. Moreover, the norm of the data point errors is bounded, and a worst-case performance optimization is formulated to be robust against data point errors. The resulting mixed l1-l2 optimization problem is further derived as a second-order cone program and solved by computationally efficient interior-point methods. Note that the fitting approach developed in this paper specifically deals with the overdetermined system, whereas current sparse representation theory is only applied to underdetermined systems. The proposed algorithm can therefore be viewed as an extended application and development of sparse representation theory. Simulated and experimental examples are presented to illustrate the effectiveness of the proposed ellipse fitting approach.
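The absolute-residual idea can be tried directly with an off-the-shelf convex solver. The sketch below fits the conic a x^2 + b xy + c y^2 + d x + e y + f = 0 by minimizing the l1 norm of the algebraic residuals under the linear normalization a + c = 1, a convexity-friendly simplification of the paper's full worst-case SOCP formulation (cvxpy and the normalization choice are assumptions of this sketch):

    import numpy as np
    import cvxpy as cp

    def l1_ellipse_fit(x, y):
        """Fit conic coefficients (a, b, c, d, e, f) with l1 algebraic residuals."""
        D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
        theta = cp.Variable(6)
        # a + c = 1 fixes the scale; the l1 loss damps the influence of outlying edge points
        prob = cp.Problem(cp.Minimize(cp.norm1(D @ theta)),
                          [theta[0] + theta[2] == 1])
        prob.solve()
        return theta.value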
ETPL DIP-190: Learning Dynamic Hybrid Markov Random Field for Image Labeling
Abstract: Using shape information has gained increasing attention in the task of image labeling. In this paper, we present a dynamic hybrid Markov random field (DHMRF), which explicitly captures middle-level object shape and low-level visual appearance (e.g., texture and color) for image labeling. Each node in the DHMRF is described by either a deformable template or an appearance model as a visual prototype. The edges, in turn, encode two types of interactions: co-occurrence and spatial layered context, with respect to the labels and prototypes of connected nodes. To learn the DHMRF model, an iterative algorithm is designed to automatically select the most informative features and estimate the model parameters. The algorithm achieves high computational efficiency, since a branch-and-bound schema is introduced to estimate the model parameters. Compared with previous methods, which usually employ implicit shape cues, our DHMRF model seamlessly integrates color, texture, and shape cues to infer the labeling output, and thus produces more accurate and reliable results. Extensive experiments validate its superiority over other state-of-the-art methods in terms of recognition accuracy and implementation efficiency on the MSRC 21-class dataset and the Lotus Hill Institute 15-class dataset.

ETPL DIP-191: Coupled Variational Image Decomposition and Restoration Model for Blurred Cartoon-Plus-Texture Images With Missing Pixels
Abstract: In this paper, we develop a decomposition model to restore blurred images with missing pixels. Our assumption is that the underlying image is the superposition of cartoon and texture components. We use the total variation norm and its dual norm to regularize the cartoon and texture components, respectively. We propose an efficient numerical algorithm, based on splitting versions of the augmented Lagrangian method, to solve the problem. Theoretically, the existence of a minimizer of the energy function and the convergence of the algorithm are guaranteed. In contrast to recently developed methods for deblurring images, the proposed algorithm not only gives the restored image, but also provides a decomposition into cartoon and texture parts. These two parts can be further used in segmentation and inpainting problems. Numerical comparisons between this algorithm and some state-of-the-art methods are also reported.

ETPL DIP-192: Perceptual Quality-Regulable Video Coding System With Region-Based Rate Control Scheme
Abstract: In this paper, we discuss a region-based perceptual quality-regulable H.264 video encoder system that we developed. The ability to adjust the quality of specific regions of a source video to a predefined level is an essential technique for region-based video applications. We use the structural similarity (SSIM) index as the quality metric for distortion-quantization modeling and develop a bit allocation and rate control scheme for enhancing regional perceptual quality. Exploiting the relationship between the reconstructed macroblock and the best predicted macroblock from mode decision, a novel quantization parameter prediction method is built and used to achieve the target video quality of the processed macroblock. Experimental results show that the system model has only 0.013 quality error on average. Moreover, the proposed region-based rate control system can encode video well under a bitrate constraint, with a 0.1% bitrate error on average. Under a low bitrate constraint, the proposed system can encode video with a 0.5% bitrate error on average while enhancing the quality of the target regions.

ETPL DIP-193: Color and Depth Priors in Natural Images
Abstract: Natural scene statistics have played an increasingly important role both in our understanding of the function and evolution of the human visual system and in the development of modern image processing applications. Because range (egocentric distance) is arguably the most important thing a visual system must compute (from an evolutionary perspective), the joint statistics between image information (color and luminance) and range information are of particular interest. It seems obvious that where there is a depth discontinuity, there must be a higher probability of a brightness or color discontinuity too. This is true, but the more interesting case is in the other direction: because image information is much more easily computed than range information, the key conditional probabilities are those of finding a range discontinuity given an image discontinuity. Here, the intuition is much weaker; the plethora of shadows and textures in the natural environment implies that many image discontinuities must exist without corresponding changes in range. In this paper, we extend previous work in two ways: we use as our starting point a very high quality dataset of co-registered color and range values collected specifically for this purpose, and we evaluate the statistics of perceptually relevant chromatic information in addition to luminance, range, and binocular disparity information. The most fundamental finding is that the probabilities of finding range changes do in fact depend in a useful and systematic way on color and luminance changes; larger range changes are associated with larger image changes. Second, we are able to parametrically model the prior marginal and conditional distributions of luminance, color, range, and (computed) binocular disparity. Finally, we provide a proof of principle that this information is useful by showing that our distribution models improve the performance of a Bayesian stereo algorithm on an independent set of input images. To summarize, we show that there is useful information about range in very low-level luminance and color information. To a system sensitive to this statistical information, it amounts to an additional (and only recently appreciated) depth cue, one that is trivial to compute from the image data.
We are confident that this information is robust, in that we went to great effort and expense to collect very high quality raw data. Finally, we demonstrate the practical utility of these findings by using them to improve the performance of a Bayesian stereo algorithm.

ETPL DIP-194: Sparse Image Reconstruction on the Sphere: Implications of a New Sampling Theorem
Abstract: We study the impact of sampling theorems on the fidelity of sparse image reconstruction on the sphere. We discuss how a reduction in the number of samples required to represent all information content of a band-limited signal acts to improve the fidelity of sparse image reconstruction, through both the dimensionality and sparsity of signals. To demonstrate this result, we consider a simple inpainting problem on the sphere and consider images sparse in the magnitude of their gradient. We develop a framework for total variation inpainting on the sphere, including fast methods to render the inpainting
problem computationally feasible at high resolution. Recently, a new sampling theorem on the sphere was developed, reducing the required number of samples by a factor of two for equiangular sampling schemes. Through numerical simulations, we verify the enhanced fidelity of sparse image reconstruction due to the more efficient sampling of the sphere provided by the new sampling theorem.

ETPL DIP-195: Log-Gabor Filters for Image-Based Vehicle Verification
Abstract: Vehicle detection based on image analysis has attracted increasing attention in recent years due to its low cost, flexibility, and potential for collision avoidance. In particular, vehicle verification is especially challenging on account of the heterogeneity of vehicles in color, size, pose, etc. Image-based vehicle verification is usually addressed as a supervised classification problem. Specifically, descriptors using Gabor filters have been reported to show good performance in this task. However, Gabor functions have a number of drawbacks relating to their frequency response. The main contribution of this paper is the proposal and evaluation of a new descriptor based on the alternative family of log-Gabor functions for vehicle verification, as opposed to existing Gabor filter-based descriptors. These filters are theoretically superior to Gabor filters, as they can better represent the frequency properties of natural images. As a second contribution, and in contrast to existing approaches, which transfer the standard configuration of filters used for other applications to the vehicle classification task, an in-depth analysis of the filter configuration required by both Gabor and log-Gabor descriptors for this particular application is performed for fair comparison. The extensive experiments conducted in this paper confirm that the proposed log-Gabor descriptor significantly outperforms the standard Gabor filter for image-based vehicle verification.
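Unlike a Gabor filter, a log-Gabor filter has no DC component and a Gaussian transfer function on a logarithmic frequency axis. A minimal frequency-domain construction of the radial component follows the standard formula; the center frequency f0 and bandwidth ratio here are illustrative choices, not values from the paper:

    import numpy as np

    def log_gabor_radial(size, f0=0.1, sigma_ratio=0.55):
        """Radial log-Gabor transfer function on a size x size frequency grid."""
        fy = np.fft.fftfreq(size)
        fx = np.fft.fftfreq(size)
        f = np.sqrt(fx[None, :]**2 + fy[:, None]**2)
        f[0, 0] = 1.0                                   # avoid log(0); DC is zeroed below
        G = np.exp(-(np.log(f / f0))**2 / (2 * np.log(sigma_ratio)**2))
        G[0, 0] = 0.0                                   # log-Gabor filters have no DC response
        return G

    # filtering is a product in the frequency domain:
    # response = np.fft.ifft2(np.fft.fft2(img) * log_gabor_radial(img.shape[0]))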
ETPL DIP-196: Scene Text Detection via Connected Component Clustering and Nontext Filtering
Abstract: In this paper, we present a new scene text detection algorithm based on two machine learning classifiers: one allows us to generate candidate word regions and the other filters out nontext regions. To be precise, we extract connected components (CCs) in images by using the maximally stable extremal region (MSER) algorithm. These extracted CCs are partitioned into clusters so that we can generate candidate regions. Unlike conventional methods relying on heuristic rules for clustering, we train an AdaBoost classifier that determines the adjacency relationship and clusters CCs by using their pairwise relations. Then we normalize the candidate word regions and determine whether each region contains text or not. Since the scale, skew, and color of each candidate can be estimated from the CCs, we develop a text/nontext classifier for the normalized images. This classifier is based on multilayer perceptrons, and we can control the recall and precision rates with a single free parameter. Finally, we extend our approach to exploit multichannel information. Experimental results on the ICDAR 2005 and 2011 robust reading competition datasets show that our method yields state-of-the-art performance in both speed and accuracy.
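The candidate-generation front end of such a pipeline, extracting MSER connected components, is available off the shelf in OpenCV. A minimal sketch (the file name is hypothetical and detector parameters are left at their defaults):

    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(img)             # each region is an array of pixel coordinates
    # Downstream stages would pair up nearby components (the AdaBoost clustering step)
    # and reject nontext candidates with a trained classifier.
    print(len(regions), "candidate components")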

ETPL DIP-197: A Robust Method for Rotation Estimation Using Spherical Harmonics Representation
Abstract: This paper presents a robust method for 3D object rotation estimation using a spherical harmonics representation and the unit quaternion vector. The proposed method provides a closed-form solution for rotation estimation without recurrence relations or searching for point correspondences between two objects. The rotation estimation problem is cast as a minimization problem, which finds the optimal rotation angles between two objects of interest in the frequency domain. The optimal rotation angles are obtained by calculating the unit quaternion vector from a symmetric matrix, which is constructed from the two sets of spherical harmonics coefficients using an eigendecomposition technique. Our experimental results on hundreds of 3D objects show that the proposed method is very accurate in rotation estimation, is robust to noisy data and missing surface points, and can handle intra-class variability between 3D objects.

ETPL DIP-198: Synthetic Aperture Radar Autofocus via Semidefinite Relaxation
Abstract: The autofocus problem in synthetic aperture radar imaging amounts to estimating unknown phase errors caused by unknown platform or target motion. At the heart of three state-of-the-art autofocus algorithms, namely phase gradient autofocus, multichannel autofocus (MCA), and Fourier-domain multichannel autofocus (FMCA), is the solution of a constant modulus quadratic program (CMQP). Currently, these algorithms solve a CMQP by using an eigenvalue relaxation approach. We propose an alternative relaxation approach based on semidefinite programming, which has recently attracted considerable attention in other signal processing problems. Experimental results show that our proposed methods provide promising performance improvements for MCA and FMCA, at the cost of an increase in computational complexity.
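The standard semidefinite relaxation of a constant modulus quadratic program replaces the rank-one matrix x x^T with a PSD variable whose diagonal is pinned to one; a rank-one estimate is then recovered from the leading eigenvector. The toy sketch below handles only the real-valued case with x_i in {-1, +1} (the SAR CMQP is complex-valued, and randomized rounding is omitted); cvxpy is an assumed dependency:

    import numpy as np
    import cvxpy as cp

    def sdr_cmqp(Q):
        """Semidefinite relaxation of min x'Qx subject to x_i in {-1, +1}."""
        n = Q.shape[0]
        X = cp.Variable((n, n), PSD=True)
        prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                          [cp.diag(X) == 1])          # relaxed |x_i| = 1 constraint
        prob.solve()
        w, V = np.linalg.eigh(X.value)
        return np.sign(V[:, -1])                      # rank-one rounding from the leading eigenvector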

ETPL DIP-199: Regional Spatially Adaptive Total Variation Super-Resolution With Spatial Information Filtering and Clustering
Abstract: Total variation is a popular and effective image prior model in regularization-based image processing. However, as the total variation model favors a piecewise constant solution, the processing result in the flat regions of the image under high noise intensity is often poor, and some pseudoedges are produced. In this paper, we develop a regional spatially adaptive total variation model. Initially, the spatial information is extracted for each pixel, and two filtering processes are then added to suppress the effect of pseudoedges. In addition, the spatial information weight is constructed and classified with k-means clustering, and the regularization strength in each region is controlled by the clustering center value. The experimental results, on both simulated and real datasets, show that the proposed approach can effectively reduce the pseudoedges of total variation regularization in the flat regions and maintain the partial smoothness of the high-resolution image. More importantly, compared with the traditional pixel-based spatial information adaptive approach, the proposed region-based spatially adaptive total variation model can better avoid the effect of noise on the spatial information extraction, and it remains robust to changes in the noise intensity during the super-resolution process.

ETPL DIP-200: Detecting, Grouping, and Structure Inference for Invariant Repetitive Patterns in Images
Abstract: The efficient and robust extraction of invariant patterns from an image is a long-standing problem in computer vision. Invariant structures are often related to repetitive or near-repetitive patterns. The perception of repetitive patterns in an image is strongly linked to the visual interpretation and composition of textures. Repetitive patterns are products of both repetitive structures and repetitive reflections or color patterns; in other words, patterns that exhibit near-stationary behavior provide rich information about objects, their shapes, and their texture in an image. In this paper, we propose a new algorithm for repetitive pattern detection and grouping. The algorithm follows the classical region-growing image segmentation scheme. It utilizes a mean-shift-like dynamic to group local image patches into clusters. It exploits a continuous joint alignment to: 1) match similar patches, and 2) refine the subspace grouping. We also propose an algorithm for inferring the composition structure of the repetitive patterns. The inference algorithm constructs a data-driven structural completion field, which merges the detected repetitive patterns into specific global geometric structures. The result of this higher-level grouping of image patterns can be used to infer the geometry of objects and estimate the general layout of a crowded scene.
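At its core, the region-wise regularization idea of DIP-199 above reduces to clustering a per-pixel spatial-information weight and assigning one regularization strength per cluster. A minimal sketch with scikit-learn, in which the gradient-magnitude weight, the three-cluster choice, and the strength values are all illustrative assumptions:

    import numpy as np
    from scipy import ndimage
    from sklearn.cluster import KMeans

    def regional_tv_weights(img, n_regions=3):
        """Cluster a smoothed gradient-magnitude map and map clusters to TV strengths."""
        g = ndimage.gaussian_gradient_magnitude(img.astype(float), sigma=1.5)
        labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(g.reshape(-1, 1))
        labels = labels.reshape(img.shape)
        order = np.argsort([g[labels == k].mean() for k in range(n_regions)])
        strength = np.empty(img.shape)
        # flattest cluster gets the strongest smoothing, the edge-rich cluster the weakest
        for rank, k in enumerate(order):
            strength[labels == k] = [0.20, 0.10, 0.02][rank]
        return strength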

ETPL DIP-201: Compressive Framework for Demosaicing of Natural Images
Abstract: Typical consumer digital cameras sense only one of three color components per image pixel. The problem of demosaicing deals with interpolating the missing color components. In this paper, we present compressive demosaicing (CD), a framework for demosaicing natural images based on the theory of compressed sensing (CS). Given the sensed samples of an image, CD employs a CS solver to find the sparse representation of that image under a fixed sparsifying dictionary. As opposed to state-of-the-art CS-based demosaicing approaches, we draw a clear distinction between the interchannel (color) and interpixel correlations of natural images. Utilizing some well-known facts about the human visual system, these two types of correlations are exploited in a nonseparable format to construct the sparsifying transform. Our simulation results verify that CD performs better (both visually and in terms of PSNR) than leading demosaicing approaches when applied to the majority of standard test images.

ETPL DIP-202: Locally Optimal Detection of Image Watermarks in the Wavelet Domain Using Bessel K Form Distribution
Abstract: A uniformly most powerful watermark detector that applies the Bessel K form (BKF) probability density function to model the noise distribution was proposed by Bian and Liang. In this paper, we derive a locally optimum (LO) detector using the same noise model. Since the literature lacks a thorough discussion of the performance of the BKF-LO nonlinearities, the performance of the proposed detector is discussed in detail. First, we prove that the test statistic of the proposed detector is asymptotically Gaussian and evaluate the actual performance of the proposed detector using the receiver operating characteristic (ROC). Then, the large-sample performance of the proposed detector is evaluated using asymptotic relative efficiency (ARE) and maximum ARE. The experimental results show that the proposed detector performs well, with or without attacks, in terms of its ROC curves, particularly when the watermark is weak. The proposed method is therefore suitable for wavelet-domain watermark detection, particularly for weak watermarks.

ETPL DIP-203: Estimating the Granularity Coefficient of a Potts-Markov Random Field Within a Markov Chain Monte Carlo Algorithm
Abstract: This paper addresses the problem of estimating the Potts granularity coefficient jointly with the unknown parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard MCMC methods cannot be applied to this problem because performing inference on this coefficient requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method, the estimation of the granularity coefficient is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental results obtained for synthetic data show that estimating the coefficient jointly with the other unknown parameters leads to estimation results that are as good as those obtained with its actual value, whereas choosing an incorrect value can degrade estimation performance significantly. To illustrate the interest of this method, the proposed algorithm is successfully applied to real bidimensional SAR and tridimensional ultrasound images.

ETPL DIP-204: Atmospheric Turbulence Mitigation Using Complex Wavelet-Based Fusion
Abstract: Restoring a scene distorted by atmospheric turbulence is a challenging problem in video surveillance. The effect, caused by random, spatially varying perturbations, makes a model-based solution difficult and, in most cases, impractical.
In this paper, we propose a novel method for mitigating the effects of atmospheric distortion on observed images, particularly airborne turbulence, which can severely degrade a region of interest (ROI). In order to extract accurate detail about objects behind the distorting layer, a simple and efficient frame selection method is proposed to select informative ROIs only from good-quality frames. The ROIs in each frame are then registered to further reduce offsets and distortions. We solve the space-varying distortion problem using region-level fusion based on the dual-tree complex wavelet transform. Finally, contrast enhancement is applied. We further propose a learning-based metric specifically for image quality assessment in the presence of atmospheric distortion. This metric is capable of estimating quality in both full- and no-reference scenarios. The proposed method is shown to significantly outperform existing methods, providing enhanced situational awareness in a range of surveillance scenarios.
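A simplified version of the fusion stage, with frame selection and registration omitted, can be sketched with the open-source dtcwt package (an assumed dependency, not the authors' code): keep the largest-magnitude highpass coefficient across frames at each position and average the lowpass band.

    import numpy as np
    import dtcwt

    def fuse_frames(frames, nlevels=4):
        """Fuse registered frames by keeping the largest-magnitude DT-CWT coefficients."""
        t = dtcwt.Transform2d()
        pyramids = [t.forward(f.astype(float), nlevels=nlevels) for f in frames]
        lowpass = np.mean([p.lowpass for p in pyramids], axis=0)     # average the coarse band
        highpasses = []
        for level in range(nlevels):
            coeffs = np.stack([p.highpasses[level] for p in pyramids])
            pick = np.argmax(np.abs(coeffs), axis=0)                 # sharpest frame per coefficient
            highpasses.append(np.take_along_axis(coeffs, pick[None], axis=0)[0])
        return t.inverse(dtcwt.Pyramid(lowpass, tuple(highpasses)))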

ETPL DIP-205: Rotation Invariant Local Frequency Descriptors for Texture Classification
Abstract: This paper presents a novel rotation invariant method for texture classification based on local frequency components. The local frequency components are computed by applying a 1-D Fourier transform to a neighboring function defined on a circle of radius R around each pixel. We observe that the low frequency components are the major constituents of the circular functions and can effectively represent textures. Three sets of features are extracted from the low frequency components, two based on the phase and one based on the magnitude. The proposed features are invariant to rotation and to linear changes of illumination. Moreover, by using low frequency components, the proposed features are very robust to noise. While the proposed method uses a relatively small number of features, it outperforms state-of-the-art methods on three well-known datasets: Brodatz, Outex, and CUReT. The proposed method is also very robust to noise and can remarkably improve classification accuracy, especially in the presence of high levels of noise.
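The rotation invariance comes from a simple fact: rotating the texture cyclically shifts the samples taken on a circle, and a cyclic shift leaves the DFT magnitude unchanged. A minimal per-pixel feature in that spirit (radius, sample count, and the number of retained components are illustrative choices):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def local_frequency_features(img, y, x, radius=3, n_samples=16, n_low=4):
        """Magnitude of low-frequency DFT components of a circular neighborhood."""
        angles = 2 * np.pi * np.arange(n_samples) / n_samples
        rows = y + radius * np.sin(angles)
        cols = x + radius * np.cos(angles)
        ring = map_coordinates(img.astype(float), [rows, cols], order=1)  # bilinear circle samples
        spectrum = np.fft.fft(ring)
        return np.abs(spectrum[1:1 + n_low])   # rotation shifts 'ring' cyclically; |DFT| is unchanged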
ETPL DIP-206: Scanned Document Compression Using Block-Based Hybrid Video Codec
Abstract: This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching algorithm that can be applied to document coding. We show that this interpretation may generate residual data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is demonstrated using H.264/advanced video coding (AVC) as a high-quality single- and multipage document compressor. The proposed method, called advanced document coding (ADC), uses segments of the originally independent scanned pages of a document to create a video sequence, which is then encoded through regular H.264/AVC. The encoding performance is unrivaled: results show that ADC outperforms AVC-I (H.264/AVC operating in pure intra mode) and JPEG 2000 by up to 2.7 and 6.2 dB, respectively. Superior subjective quality is also achieved.

ETPL DIP-207: Space-Time Hole Filling With Random Walks in View Extrapolation for 3D Video
Abstract: In this paper, a space-time hole filling approach is presented to deal with disocclusion when a view is synthesized for 3D video. The problem becomes even more complicated when the view is extrapolated from a single view, since the hole is large and has no stereo depth cues. Although many techniques have been developed to address this problem, most of them focus only on view interpolation. We propose a space-time joint filling method for color and depth videos in view extrapolation. For proper texture and depth to be sampled in the subsequent hole filling process, the background of a scene is automatically segmented by random walker segmentation in conjunction with the hole formation process. Then, the patch candidate selection process is formulated as a labeling problem, which can be solved with random walks. The patch candidates that best describe the hole region are dynamically selected in the space-time domain, and the hole is filled with the optimal patch to ensure both spatial and temporal coherence. The experimental results show that the proposed method is superior to state-of-the-art methods and provides both spatially and temporally consistent results with significantly reduced flicker artifacts.

ETPL DIP-208: Rate Control for Consistent Objective Quality in High Efficiency Video Coding
Abstract: Since video quality fluctuation significantly degrades visual perception in multimedia communication systems, it is important to maintain a consistent objective quality over the entire video sequence. We propose a rate control algorithm to maintain consistent objective quality in high efficiency video coding (HEVC), an upcoming standard video codec. In the proposed algorithm, the
probability density function of the transformed coefficients is modeled by a Laplacian function that takes into account the quadtree coding unit structure, one of the characteristics of HEVC. To control the video quality, distortion-quantization and rate-quantization models are derived using the Laplacian function. Based on those models, a quantization parameter is determined that controls the quality of the encoded frames so that the fluctuation of video quality is minimized and buffer overflow and underflow are prevented. Simulation results show that the proposed rate control algorithm outperforms other conventional schemes.

ETPL DIP-209: Discrete Wavelet Transform and Data Expansion Reduction in Homomorphic Encrypted Domain
Abstract: Signal processing in the encrypted domain is a new technology with the goal of protecting valuable signals from insecure signal processing. In this paper, we propose a method for implementing the discrete wavelet transform (DWT) and multiresolution analysis (MRA) in the homomorphic encrypted domain. We first suggest a framework for performing the DWT and inverse DWT (IDWT) in the encrypted domain, and then analyze the data expansion and quantization errors under this framework. To address the problem of data expansion, which may be very important in practical applications, we present a method for reducing data expansion in the case where both the DWT and IDWT are performed. With the proposed method, multilevel DWT/IDWT can be performed with less data expansion in the homomorphic encrypted domain. We propose a new signal processing procedure in which the multiplicative inverse method is employed as the last step to limit the data expansion. Taking the 2-D Haar wavelet transform as an example, we conduct a few experiments to demonstrate the advantages of our method in secure image processing. We also provide computational complexity analyses and comparisons. To the best of our knowledge, there has been no previous report on the implementation of the DWT and MRA in the encrypted domain.
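Because an additively homomorphic cryptosystem supports sums and plaintext scalings of ciphertexts, an integer (unnormalized) Haar step can be run directly on encrypted pixels. A toy illustration with the python-paillier (phe) package, an assumed dependency, using the scaled pair s = a + b and d = a - b to avoid the 1/sqrt(2) factor:

    from phe import paillier

    public_key, private_key = paillier.generate_paillier_keypair()

    a, b = 57, 43                              # two neighboring pixel values
    enc_a, enc_b = public_key.encrypt(a), public_key.encrypt(b)

    enc_sum = enc_a + enc_b                    # homomorphic addition: encrypts a + b
    enc_diff = enc_a + (enc_b * -1)            # scalar multiplication then addition: encrypts a - b

    assert private_key.decrypt(enc_sum) == a + b
    assert private_key.decrypt(enc_diff) == a - b
    # A full encrypted-domain Haar DWT applies this to every pixel pair at every level;
    # the accumulated scaling factors are exactly the data expansion the paper reduces.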
ETPL DIP-210: QoE-Based Multi-Exposure Fusion in Hierarchical Multivariate Gaussian CRF
Abstract: Many state-of-the-art fusion methods that combine the details of images taken under different exposures into one well-exposed image can be found in the literature. However, insufficient study has been conducted on how perceptual factors can give viewers a better quality of experience on fused images. We propose two perceptual quality measures, perceived local contrast and color saturation, which are embedded in our novel hierarchical multivariate Gaussian conditional random field model to improve the performance of multi-exposure fusion. We show that our method generates images with better quality than existing methods for a variety of scenes.

ETPL DIP-211: Action Recognition From Video Using Feature Covariance Matrices
Abstract: We propose a general framework for fast and accurate recognition of actions in video using empirical covariance matrices of features. A dense set of spatio-temporal feature vectors is computed from video to provide a localized description of the action, and subsequently aggregated in an empirical covariance matrix to compactly represent the action. Two supervised learning methods for action recognition are developed using feature covariance matrices. Common to both methods is the transformation of the classification problem in the closed convex cone of covariance matrices into an equivalent problem in the vector space of symmetric matrices via the matrix logarithm. The first method applies nearest-neighbor classification using a suitable Riemannian metric for covariance matrices. The second method approximates the logarithm of a query covariance matrix by a sparse linear combination of the logarithms of training covariance matrices; the action label is then determined from the sparse coefficients. Both methods achieve state-of-the-art classification performance on several datasets and are robust to action variability, viewpoint changes, and low object resolution. The proposed framework is conceptually simple and has low storage and computational requirements, making it attractive for real-time implementation.
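The matrix-logarithm trick named above flattens covariance descriptors into a vector space where ordinary Euclidean distances apply. A minimal nearest-neighbor sketch using the log-Euclidean distance (a common stand-in for the Riemannian metric the paper uses; the ridge term is an assumption to keep matrices positive definite):

    import numpy as np
    from scipy.linalg import logm

    def covariance_descriptor(features):
        """features: (n_samples, d) array of per-pixel/per-frame feature vectors."""
        return np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])  # regularized

    def log_euclidean_nn(query_cov, train_covs, train_labels):
        """Nearest neighbor after flattening covariances via the matrix logarithm."""
        q = np.real(logm(query_cov)).ravel()
        dists = [np.linalg.norm(q - np.real(logm(C)).ravel()) for C in train_covs]
        return train_labels[int(np.argmin(dists))]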

ETPL DIP-212: 2-D Wavelet Packet Spectrum for Texture Analysis
Abstract: This brief derives a 2-D spectrum estimator from some recent results on the statistical properties of wavelet packet coefficients of random processes. It provides an analysis of the bias of this estimator with respect to the wavelet order. This brief also discusses the performance of this wavelet-based estimator, in comparison with the conventional 2-D Fourier-based spectrum estimator, on texture analysis and content-based image retrieval, and highlights the effectiveness of wavelet-based spectrum estimation.

ETPL DIP-213: UND: Unite-and-Divide Method in Fourier and Radon Domains for Line Segment Detection
Abstract: In this paper, we extend our previously proposed line detection method to line segment detection using a so-called unite-and-divide (UND) approach. The methodology includes two phases, namely the union of spectra in the frequency domain and the division of the sinogram in Radon space. In the union phase, given an image, its sinogram is obtained by parallel 2D multilayer Fourier transforms, Cartesian-to-polar mapping, and 1D inverse Fourier transforms. In the division phase, the edges of the butterfly wings in the neighborhood of every sinogram peak are first specified, with each neighborhood area corresponding to a window in image space; by applying the separated sinogram of each such windowed image, we can extract the line segments. Our experiments are conducted on benchmark images, and the results reveal that the UND method yields higher accuracy, has lower computational cost, and is more robust to noise compared to existing state-of-the-art methods.

ETPL DIP-214: Stable Orthogonal Local Discriminant Embedding for Linear Dimensionality Reduction
Abstract: Manifold learning is widely used in machine learning and pattern recognition. However, manifold learning considers only the similarity of samples belonging to the same class and ignores the within-class variation of the data, which impairs the generalization and stability of the algorithms. For this purpose, we construct an adjacency graph to model the intraclass variation that characterizes the most important properties, such as the diversity of patterns, and then incorporate this diversity into the discriminant objective function for linear dimensionality reduction. Finally, we introduce an orthogonality constraint on the basis vectors and propose an orthogonal algorithm called stable orthogonal local discriminant embedding. Experimental results on several standard image databases demonstrate the effectiveness of the proposed dimensionality reduction approach.

ETPL DIP-215: Motion-Aware Gradient Domain Video Composition
Abstract: For images, gradient-domain composition methods such as Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting Poisson image blending to video presents new challenges due to the added temporal dimension. In video, the human eye is sensitive to small changes in blending boundaries across frames and to slight differences between the motions of the source patch and the target video.
We present a novel video blending approach that tackles these problems by merging the gradients of the source and target videos and optimizing a consistent blending boundary based on a user-provided blending trimap for the source video. Our approach extends mean-value coordinate interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and a source object positioning method that can efficiently deal with complex video sequences beyond the capabilities of alpha blending.

ETPL DIP-216: Structural Texture Similarity Metrics for Image Analysis and Retrieval
Abstract: We develop new metrics for texture similarity that account for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial point-by-point deviations between textures that, according to human judgment, are essentially identical. The proposed metrics extend the ideas of structural similarity and are guided by research in texture analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to investigate metric performance in the context of known-item search, the retrieval of textures that are identical to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the proposed metrics outperform the peak signal-to-noise ratio (PSNR), the structural similarity metric (SSIM) and its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.

ETPL DIP-217: Simultaneous Facial Feature Tracking and Facial Expression Recognition
Abstract: The tracking and recognition of facial activities from images or videos have attracted great attention in the computer vision field. Facial activities are characterized at three levels. First, at the bottom level, facial feature points around each facial component (e.g., eyebrow, mouth) capture detailed face shape information. Second, at the middle level, facial action units, defined in the facial action coding system, represent the contraction of a specific set of facial muscles (e.g., lid tightener, eyebrow raiser). Finally, at the top level, six prototypical facial expressions represent global facial muscle movement and are commonly used to describe human emotion states. In contrast to mainstream approaches, which usually focus on only one or two levels of facial activity and track (or recognize) them separately, this paper introduces a unified probabilistic framework based on a dynamic Bayesian network to simultaneously and coherently represent facial evolvement at the different levels, their interactions, and their observations. Advanced machine learning methods are introduced to learn the model from both training data and subjective prior knowledge. Given the model and the measurements of facial motions, all three levels of facial activity are recognized simultaneously through probabilistic inference. Extensive experiments illustrate the feasibility and effectiveness of the proposed model on all three levels of facial activity.

ETPL DIP-218: A Generalized Random Walk With Restart and its Application in Depth Up-Sampling and Interactive Segmentation
Abstract: In this paper, the origin of the random walk with restart (RWR) and its generalization are described. It is well known that the random walk (RW) and anisotropic diffusion models share the same energy functional, i.e., the former provides a steady-state solution and the latter gives a flow solution. In contrast, the theoretical background of the RWR scheme is different from that of the diffusion-reaction equation, although the restarting term of the RWR plays a role similar to the reaction term of the diffusion-reaction equation. The behaviors of the two approaches with respect to outliers reveal that they possess different attributes in terms of data propagation.
This observation leads to the derivation of a new energy functional, where both volumetric heat capacity and thermal conductivity are considered together, and provides a common framework that unifies the RW and RWR approaches, in addition to other regularization methods. The proposed framework allows the RWR to be generalized (GRWR) in semilocal and nonlocal forms. The experimental results demonstrate the superiority of GRWR over existing regularization approaches in terms of depth map up-sampling and interactive image segmentation.

ETPL DIP-219: Variational Optical Flow Estimation Based on Stick Tensor Voting
Abstract: Variational optical flow techniques allow the estimation of flow fields from spatio-temporal
derivatives. They are based on minimizing a functional that contains a data term and a regularization term. Recently, numerous approaches have been presented for improving the accuracy of the estimated flow fields. Among them, tensor voting has been shown to be particularly effective in the preservation of flow discontinuities. This paper presents an adaptation of the data term that uses anisotropic stick tensor voting in order to gain robustness against noise and outliers at a significantly lower computational cost than (full) tensor voting. In addition, an anisotropic complementary smoothness term, which depends on directional information estimated through stick tensor voting, is utilized in order to preserve the discontinuity capabilities of the estimated flow fields. Finally, a weighted non-local term that depends on both the estimated directional information and the occlusion state of pixels is integrated during the optimization process in order to denoise the final flow field. The proposed approach yields state-of-the-art results on the Middlebury benchmark.

ETPL DIP-220: Exploring Visual and Motion Saliency for Automatic Video Object Extraction
Abstract: This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the use of any training data (i.e., it is not limited to any particular type of object). To separate foreground and background regions within and across video frames, the proposed method utilizes visual and motion saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency-induced features, which allows us to deal with unknown pose and scale variations of the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory VOE results.

ETPL DIP-221: Enhanced Compressed Sensing Recovery With Level Set Normals
Abstract: We propose a compressive sensing algorithm that exploits geometric properties of images to recover images of high quality from few measurements. The image reconstruction is done by iterating the following two steps: 1) estimation of the normal vectors of the image level curves, and 2) reconstruction of an image fitting the normal vectors, the compressed sensing measurements, and the sparsity constraint. The proposed technique can naturally extend to nonlocal operators and graphs to exploit the repetitive nature of textured images and thereby recover fine detail structures. In both cases, the problem is reduced to a series of convex minimization problems that can be efficiently solved with a combination of variable splitting and augmented Lagrangian methods, leading to fast and easy-to-code algorithms. Extended experiments show a clear improvement over related state-of-the-art algorithms in the quality of the reconstructed images and in the robustness of the proposed method to noise, different kinds of images, and reduced measurements.

ETPL DIP-222: Colorization-Based Compression Using Optimization
Abstract: In this paper, we formulate the colorization-based coding problem as an optimization problem, i.e., an L1 minimization problem.
In colorization-based coding, the encoder chooses a few representative pixels (RP), for which the chrominance values and positions are sent to the decoder, whereas at the decoder, the chrominance values of all pixels are reconstructed by colorization methods. The main issue in colorization-based coding is how to extract the RP well, so that both the compression rate and the quality of the reconstructed color image are good. By formulating colorization-based coding as an L1 minimization problem, it is guaranteed that, given the colorization matrix, the chosen set of RP is optimal in the sense that it minimizes the error between the original and the reconstructed color image. In other words, for a fixed error value and a given colorization matrix, the chosen set of RP is the smallest set possible. We also propose a method to construct the
colorization matrix that colorizes the image in a multiscale manner. This, combined with the proposed RP extraction method, allows us to choose a very small set of RP. It is shown experimentally that the proposed method outperforms conventional colorization-based coding methods as well as the JPEG standard, and is comparable with the JPEG2000 compression standard, both in terms of compression rate and the quality of the reconstructed color image.

ETPL DIP-223: Orientation Imaging Microscopy With Optimized Convergence Angle Using CBED Patterns in TEMs
Abstract: Grain size statistics, texture, and grain boundary distribution are microstructural characteristics that greatly influence material properties. These characteristics can be derived from an orientation map obtained using orientation imaging microscopy (OIM) techniques. For nanomaterials, OIM is generally performed using a transmission electron microscope (TEM). Although some of these techniques have limited applicability in certain situations, others have limited availability because of the external hardware they require. In this paper, an automated method to generate orientation maps using convergent beam electron diffraction (CBED) patterns obtained in a conventional TEM setup is presented. This method is based on dynamical diffraction theory, which describes electron diffraction more accurately than the kinematical theory used by several existing OIM techniques. In addition, the method uses wide-angle convergent beam electron diffraction for performing OIM. It is shown that the use of the wide-angle convergent electron beam provides additional information that is not available otherwise. The presented method exploits this additional information and combines it with calculations from dynamical theory to provide accurate orientation maps in a conventional TEM setup. The automated method is applied to a platinum thin film sample, where it correctly identified the texture preference.

ETPL DIP-224: Grassmannian Regularized Structured Multi-View Embedding for Image Classification
Abstract: Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the features are often different in nature, and it is nontrivial to fuse them. In particular, some extracted features are redundant or noisy and are consequently not discriminative for classification. To alleviate these problems in an image classification context, we propose in this paper a novel multi-view embedding framework, termed Grassmannian regularized structured multi-view embedding, or GrassReg for short. GrassReg transfers the graph Laplacian obtained from each view to a point on the Grassmann manifold and penalizes the disagreement between different views according to the Grassmannian distance. Therefore, a view that is consistent with the others is given more importance than a view that disagrees with the others when learning a unified subspace for multi-view data representation. In addition, we impose a group sparsity penalty on the low-dimensional embeddings so that they can better explore the group structure of the intrinsic data distribution. Empirically, we compare GrassReg with representative multi-view algorithms and show its effectiveness on a number of multi-view image data sets.
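The core quantity in the GrassReg abstract above is the disagreement between views, measured on the Grassmann manifold. A minimal sketch of that idea follows, using toy random features for two views; the kNN graph construction, subspace dimension, and all names are our illustrative choices under stated assumptions, not the authors' implementation.

```python
# Hypothetical sketch: Grassmannian disagreement between two views'
# graph Laplacians, in the spirit of GrassReg (not the paper's code).
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.neighbors import kneighbors_graph

def laplacian_subspace(X, k=10, dim=5):
    """Bottom-eigenvector subspace of the kNN graph Laplacian of one view."""
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T).toarray()          # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]              # skip the constant eigenvector

def grassmann_distance(U, V):
    """Geodesic distance on the Grassmann manifold via principal angles."""
    theta = subspace_angles(U, V)
    return np.sqrt(np.sum(theta ** 2))

rng = np.random.default_rng(0)
X_color = rng.normal(size=(100, 16))              # toy "color" view
X_texture = X_color @ rng.normal(size=(16, 12))   # correlated "texture" view
d = grassmann_distance(laplacian_subspace(X_color),
                       laplacian_subspace(X_texture))
print(f"Grassmannian disagreement between views: {d:.3f}")
```

A small distance indicates that the two views induce similar manifold structure, which is the signal GrassReg uses to weight views when learning the unified subspace.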
ETPL DIP-225: Efficient Minimum Error Bounded Particle Resampling L1 Tracker With Occlusion Detection
Abstract: Recently, sparse representation has been applied to visual tracking to find the target with the minimum reconstruction error from a target template subspace. Though effective, these L1 trackers require high computational costs due to the numerous calculations needed for l1 minimization. In addition, the inherent occlusion insensitivity of l1 minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named the bounded particle resampling (BPR) L1 tracker, with a minimum error bound and occlusion detection. First, the minimum error bound is calculated from a linear least squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of
the insignificant samples are removed before solving the computationally expensive l1 minimization, in a two-step testing procedure. The first step compares each sample's observation likelihood to an ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further remove insignificant samples without altering the tracking result of the current frame. Though sacrificing minimal precision during resampling, max testing achieves a significant speed-up on top of the first testing step. The BPR-L1 technique can also benefit other trackers that have minimum error bounds in a PF framework, especially trackers based on sparse representations. After the error-bound calculation, BPR-L1 performs occlusion detection by investigating the trivial coefficients in the l1 minimization. These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance the template updating. For evaluation, we conduct experiments on three video applications: biometrics (head movement, hand-held objects, singers on stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates excellent performance compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.

ETPL DIP-226: Multiview Hessian Regularization for Image Annotation
Abstract: The rapid development of computer hardware and Internet technology has made large-scale data-dependent models computationally tractable and has opened a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) has therefore received intensive attention in recent years and has been successfully deployed in image annotation. One representative line of work in SSL is Laplacian regularization (LR), which smooths the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function toward a constant function, which can result in poor generalization. In addition, LR was developed to handle uniformly distributed data (or single-view data), although instances or objects such as images and videos are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address these two problems in LR-based image annotation. In particular, mHR optimally combines multiple Hessian regularizers, each obtained from a particular view of the instances, and steers a classification function that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.

ETPL DIP-227: GPU Accelerated Edge-Region Based Level Set Evolution Constrained by 2D Gray-Scale Histogram
Abstract: Due to its intrinsic nature, which allows it to easily handle complex shapes and topological changes, the level set method (LSM) has been widely used in image segmentation. Nevertheless, the LSM is computationally expensive, which limits its applications in real-time systems.
For this reason, we propose a new level set algorithm that simultaneously uses edge, region, and 2D histogram information in order to efficiently segment objects of interest in a given scene. The computational complexity of the proposed LSM is greatly reduced by using the highly parallelizable lattice Boltzmann method (LBM) with a body force to solve the level set equation (LSE). The body force is the link to the image data and is defined from the proposed LSE. The proposed LSM is then implemented on an NVIDIA graphics processing unit to fully take advantage of the LBM's local nature. The new algorithm is effective, robust against noise, independent of the initial contour, fast, and highly parallelizable. The edge and region information make it possible to detect objects with and without edges, and the 2D histogram information ensures the effectiveness of the method in noisy environments. Experimental results on synthetic and real images demonstrate, subjectively and objectively, the performance of the proposed
method.

ETPL DIP-228: Sparse Stochastic Processes and Discretization of Linear Inverse Problems
Abstract: We present a novel statistically based discretization paradigm and derive a class of maximum a posteriori (MAP) estimators for solving ill-conditioned linear inverse problems. We are guided by the theory of sparse stochastic processes, which specifies continuous-domain signals as solutions of linear stochastic differential equations. Accordingly, we show that the class of admissible priors for the discretized version of the signal is confined to the family of infinitely divisible distributions. Our estimators not only cover the well-studied methods of Tikhonov and l1-type regularization as particular cases, but also open the door to a broader class of sparsity-promoting regularization schemes that are typically nonconvex. We provide an algorithm that handles the corresponding nonconvex problems and illustrate the use of our formalism by applying it to deconvolution, magnetic resonance imaging, and X-ray tomographic reconstruction problems. Finally, we compare the performance of estimators associated with models of increasing sparsity.

ETPL DIP-229: Sparse/DCT (S/DCT) Two-Layered Representation of Prediction Residuals for Video Coding
Abstract: In this paper, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals and implement this idea on top of the state-of-the-art High Efficiency Video Coding (HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals, so that a large portion of the energy in a structured residual can be efficiently coded via sparse coding. It is observed that the sparse representation alone is less effective in rate-distortion performance due to the side-information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded at the second stage and applied to the remaining signal to improve coding efficiency. The two representations successfully complement each other. Experimental results demonstrate that the proposed algorithm outperforms the HEVC reference codec HM5.0 under the Common Test Conditions.

ETPL DIP-230: Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space
Abstract: Meaningful representation and effective retrieval of video shots in a large-scale database has been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to the extraction of low-level visual features, such as color, shape, texture, and motion, for characterizing and retrieving video shots. However, the accuracy of these feature descriptors is still far from satisfactory due to the well-known semantic gap. To alleviate this problem, this paper investigates a novel methodology for representing and retrieving video shots using human-centric high-level features derived in a brain imaging space (BIS), in which brain responses to the natural stimulus of video watching can be explored and interpreted. First, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale functional brain networks and their regions of interest (ROIs) that are involved in the comprehension of the video stimulus. Then, functional connectivities between various ROI pairs are utilized as BIS features to characterize the brain's comprehension of video semantics.
An effective feature selection procedure is then applied to learn the most relevant features while removing redundancy, which results in the final BIS features. Afterwards, a mapping from low-level visual features to the high-level semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is inferred, in which video key frames are represented by the mapped feature vectors in the BIS. Finally, the manifold-ranking algorithm, which takes into account the relationships among all data, is applied to measure the similarity between key frames of video shots. Experimental results on the TRECVID 2005 dataset demonstrate the superiority of the proposed work in comparison with traditional methods.

ETPL DIP-231: Multivariate Slow Feature Analysis and Decorrelation Filtering for Blind Source Separation
Abstract: We generalize the method of slow feature analysis (SFA) to vector-valued functions of several variables and apply it to the problem of blind source separation, in particular to image separation. It is generally necessary to use multivariate SFA instead of univariate SFA for separating multidimensional signals. For the linear case, an exact mathematical analysis is given, which shows in particular that the sources are perfectly separated by SFA if and only if they and their first-order derivatives are uncorrelated. When the sources are correlated, we apply a technique called decorrelation filtering: a linear filter is used to decorrelate the sources and their derivatives in the given mixture, and the unmixing matrix obtained on the filtered mixtures is then applied to the original mixtures. If the filtered sources are perfectly separated by this matrix, so are the original sources. A decorrelation filter can be obtained numerically by solving a nonlinear optimization problem. This technique can also be applied to other linear separation methods whose output signals are decorrelated, such as ICA. When there are more mixtures than sources, the actual number of sources can be determined by using a regularized version of SFA with decorrelation filtering. Extensive numerical experiments using SFA and ICA with decorrelation filtering, supported by mathematical analysis, demonstrate the potential of our methods for solving problems involving blind source separation.

ETPL DIP-232: Parameter Estimation for Blind and Non-Blind Deblurring Using Residual Whiteness Measures
Abstract: Image deblurring (ID) is an ill-posed problem typically addressed by using regularization, or prior knowledge, on the unknown image (and also on the blur operator, in the blind case). ID is often formulated as an optimization problem, where the objective function includes a data term encouraging the estimated image (and blur, in blind ID) to explain the observed data well (typically, the squared norm of a residual), plus a regularizer that penalizes solutions deemed undesirable. The performance of this approach depends critically (among other things) on the relative weight of the regularizer (the regularization parameter) and on the number of iterations of the algorithm used to address the optimization problem. In this paper, we propose new criteria for adjusting the regularization parameter and/or the number of iterations of ID algorithms. The rationale is that if the recovered image (and blur, in blind ID) is well estimated, the residual image is spectrally white; conversely, a poorly deblurred image typically exhibits structured artifacts (e.g., ringing, oversmoothing), yielding residuals that are not spectrally white. The proposed criterion is particularly well suited to a recent blind ID algorithm that uses continuation, i.e., slowly decreases the regularization parameter along the iterations; in this case, choosing this parameter and deciding when to stop are one and the same thing. Our experiments show that the proposed whiteness-based criteria yield improvements in SNR that are, on average, only 0.15 dB below those obtained by (clairvoyantly) stopping the algorithm at the best SNR. We also illustrate the proposed criteria on non-blind ID, reporting results that are competitive with state-of-the-art criteria (such as Monte Carlo-based GSURE and projected SURE), which, however, are not applicable to blind ID.
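The whiteness rationale above lends itself to a compact illustration: a well-deblurred residual should have an autocorrelation close to a delta at lag zero. The sketch below computes a simple non-whiteness score under that assumption; it is our simplified stand-in, not the paper's exact criterion, and the lag window size is an arbitrary choice.

```python
# Minimal sketch of a residual non-whiteness score (our illustration):
# sum of squared normalized autocorrelations at small nonzero lags.
import numpy as np

def nonwhiteness(residual, max_lag=4):
    """Near zero for a spectrally white residual; larger for structured ones."""
    r = residual - residual.mean()
    P = np.abs(np.fft.fft2(r)) ** 2
    acorr = np.real(np.fft.ifft2(P))      # Wiener-Khinchin: circular autocorrelation
    rho = acorr / acorr[0, 0]             # normalize so rho[0, 0] == 1
    window = rho[:max_lag + 1, :max_lag + 1].copy()
    window[0, 0] = 0.0                    # exclude the zero lag itself
    return np.sum(window ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=(128, 128))       # i.i.d. noise residual
structured = np.cumsum(white, axis=1)     # strongly correlated residual
print(nonwhiteness(white))                # tiny (near zero)
print(nonwhiteness(structured))           # much larger
```

A criterion of this flavor can be evaluated along the iterations of a deblurring algorithm and used to pick the regularization weight or stopping point where the score is smallest.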
ETPL DIP-233: Image Processing Using Smooth Ordering of its Patches
Abstract: We propose an image processing scheme based on the reordering of its patches. For a given corrupted image, we extract all patches with overlaps, refer to them as coordinates in a high-dimensional space, and order them such that they are chained in the shortest possible path, essentially solving the traveling salesman problem. The obtained ordering, applied to the corrupted image, implies a permutation of the image pixels to what should be a regular signal. This enables us to obtain a good recovery of the clean image by applying relatively simple one-dimensional smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach for image denoising and inpainting, and show promising results in both cases.
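The scheme above is easy to prototype: order the patches with a cheap traveling-salesman heuristic, smooth the pixels in that order, and undo the permutation. The following toy sketch uses a greedy nearest-neighbor ordering in place of the paper's approximation; the image, patch size, and 1D filter are arbitrary illustrative choices.

```python
# Toy sketch of patch-ordering-based smoothing (greedy nearest-neighbor
# ordering standing in for the paper's TSP approximation).
import numpy as np

def greedy_patch_ordering(patches):
    """Order patches so consecutive ones are similar (greedy TSP heuristic)."""
    n = len(patches)
    unvisited = set(range(1, n))
    order = [0]
    while unvisited:
        last = patches[order[-1]]
        idx = list(unvisited)
        d = np.sum((patches[idx] - last) ** 2, axis=1)
        nxt = idx[int(np.argmin(d))]
        order.append(nxt)
        unvisited.remove(nxt)
    return np.array(order)

rng = np.random.default_rng(0)
ramp = np.linspace(0, 1, 32)[None, :]                 # clean structure
img = np.clip(ramp + 0.2 * rng.normal(size=(32, 32)), 0, 1)

p = 5  # patch size; one overlapping patch per pixel of the valid region
patches = np.lib.stride_tricks.sliding_window_view(img, (p, p)).reshape(-1, p * p)
order = greedy_patch_ordering(patches)

# Reorder the patch-center pixels into a 1D signal, smooth, then invert.
centers = img[p // 2:-(p // 2), p // 2:-(p // 2)].ravel()
smoothed = np.convolve(centers[order], np.ones(7) / 7, mode="same")
restored = np.empty_like(centers)
restored[order] = smoothed                            # undo the permutation
```

Because similar patches end up adjacent in the ordering, even a plain moving average along the reordered signal suppresses noise without blurring across image structure.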
ETPL DIP-234: Recursive Histogram Modification: Establishing Equivalency Between Reversible Data Hiding and Lossless Data Compression
Abstract: State-of-the-art schemes for reversible data hiding (RDH) usually consist of two steps: first, construct a host sequence with a sharp histogram via prediction errors, and then embed messages by modifying the histogram with methods such as difference expansion and histogram shifting. In this paper, we focus on the second stage and propose a histogram modification method for RDH that embeds the message by recursively utilizing the decompression and compression processes of an entropy coder. We prove that, for independent identically distributed (i.i.d.) gray-scale host signals, the proposed method asymptotically approaches the rate-distortion bound of RDH as long as perfect compression can be realized, i.e., as long as the entropy coder can approach the entropy. This method therefore establishes the equivalency between reversible data hiding and lossless data compression. Experiments show that this coding method can be used to improve the performance of previous RDH schemes, and the improvements are more significant for larger images.

ETPL DIP-235: Optical Flow Estimation for Flame Detection in Videos
Abstract: Computational vision-based flame detection has drawn significant attention in the past decade, with camera surveillance systems becoming ubiquitous. Whereas many discriminating features, such as color, shape, and texture, have been employed in the literature, this paper proposes a set of motion features based on motion estimators. The key idea is to exploit the difference between the turbulent, fast motion of fire and the structured, rigid motion of other objects. Since classical optical flow methods do not model the characteristics of fire motion (e.g., non-smoothness of motion, non-constancy of intensity), two optical flow methods are specifically designed for the fire detection task: optimal mass transport models fire as a dynamic texture, while a data-driven optical flow scheme models saturated flames. Characteristic features related to the flow magnitudes and directions are then computed from the flow fields to discriminate between fire and non-fire motion. The proposed features are tested on a large video database to demonstrate their practical usefulness. Moreover, a novel evaluation method based on fire simulations is proposed, which allows a controlled environment for analyzing parameter influences such as flame saturation, spatial resolution, frame rate, and random noise.

ETPL DIP-236: Image Sharpness Assessment Based on Local Phase Coherence
Abstract: Sharpness is an important determinant in the visual assessment of image quality. The human visual system is able to effortlessly detect blur and evaluate the sharpness of visual images, but the underlying mechanism is not fully understood. Existing blur/sharpness evaluation algorithms are mostly based on edge width, local gradient, or the energy reduction of global/local high-frequency content. Here we approach the subject from a different perspective, where sharpness is identified as strong local phase coherence (LPC) near distinctive image features, evaluated in the complex wavelet transform domain. Previous LPC computation is restricted to complex coefficients spread over three consecutive dyadic scales in the scale-space. Here we propose a flexible framework that allows for LPC computation in arbitrary fractional scales.
We then develop a new sharpness assessment algorithm that does not reference the original image. We use four subject-rated, publicly available image databases to test the proposed algorithm, which demonstrates competitive performance when compared with state-of-the-art algorithms.

ETPL DIP-237: Library-Based Illumination Synthesis for Critical CMOS Patterning
Abstract: In optical microlithography, the illumination source for critical complementary metal-oxide-semiconductor layers needs to be determined in the early stage of a technology node with very limited design information, leading to simple binary shapes. Recently, the availability of freeform sources has made it possible to increase pattern fidelity and relax mask complexity with minimal insertion risks to the
current manufacturing flow. However, source optimization across many patterns is often treated as a design-of-experiments problem, which may not fully exploit the benefits of a freeform source. In this paper, a rigorous source-optimization algorithm is presented via linear superposition of optimal sources for pre-selected patterns. We show that analytical solutions are made possible by using the Hopkins formulation and quadratic programming. The algorithm allows the synthesized illumination to be linked with assorted pattern libraries, which has a direct impact on design rule studies for early planning and on design automation for full-wafer optimization.

ETPL DIP-238: A Variational Approach for Pan-Sharpening
Abstract: Pan-sharpening is the process of acquiring a high-resolution multispectral (MS) image by combining a low-resolution MS image with a corresponding high-resolution panchromatic (PAN) image. In this paper, we propose a new variational pan-sharpening method based on three basic assumptions: 1) the gradient of the PAN image is a linear combination of the gradients of the pan-sharpened image bands; 2) the upsampled low-resolution MS image is a degraded form of the pan-sharpened image; and 3) the gradient in the spectral direction of the pan-sharpened image should approximate that of the upsampled low-resolution MS image. An energy functional, whose minimizer is related to the best pan-sharpened result, is built from these assumptions. We discuss the existence of a minimizer of our energy and describe a numerical procedure based on the split Bregman algorithm. To verify the effectiveness of our method, we qualitatively and quantitatively compare it with state-of-the-art schemes using QuickBird and IKONOS data. In particular, we classify the existing quantitative measures into four categories and choose two representatives from each category for a more balanced quantitative evaluation. The results demonstrate the effectiveness and stability of our method in terms of the related evaluation benchmarks. Furthermore, a comparison of computational efficiency with other variational methods shows that our method is notably efficient.
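The three assumptions in the pan-sharpening abstract translate directly into three penalty terms. The sketch below evaluates a simplified, discrete reading of such an energy; the operators (identity degradation, forward differences), weights, and all names are our placeholder choices, not the paper's exact functional.

```python
# Illustrative evaluation of three assumption-driven energy terms for
# pan-sharpening (a simplified sketch, not the paper's functional).
import numpy as np

def grad(u):
    """Forward-difference spatial gradients of a 2D band."""
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    return gx, gy

def pansharpen_energy(X, pan, ms_up, alpha, lam=(1.0, 1.0, 1.0)):
    """X: (B,H,W) candidate pan-sharpened bands; ms_up: upsampled MS bands."""
    B = X.shape[0]
    # 1) PAN gradient ~ linear combination of band gradients
    gx_p, gy_p = grad(pan)
    gx_c = sum(alpha[b] * grad(X[b])[0] for b in range(B))
    gy_c = sum(alpha[b] * grad(X[b])[1] for b in range(B))
    e1 = np.sum((gx_p - gx_c) ** 2 + (gy_p - gy_c) ** 2)
    # 2) upsampled MS ~ degraded pan-sharpened image (identity blur here)
    e2 = np.sum((X - ms_up) ** 2)
    # 3) spectral-direction gradients should match those of the upsampled MS
    e3 = np.sum((np.diff(X, axis=0) - np.diff(ms_up, axis=0)) ** 2)
    return lam[0] * e1 + lam[1] * e2 + lam[2] * e3

rng = np.random.default_rng(0)
pan = rng.random((64, 64))
X = np.stack([pan * w for w in (0.5, 0.3, 0.2)])   # toy sharpened bands
print(pansharpen_energy(X, pan, ms_up=X.copy(), alpha=[0.5, 0.3, 0.2]))
```

In the actual method this energy is minimized (via split Bregman) rather than merely evaluated; the sketch only makes the structure of the three terms concrete.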

ETPL DIP-239: Reducing the Complexity of the N-FINDR Algorithm for Hyperspectral Image Analysis
Abstract: The N-FINDR algorithm for unmixing hyperspectral data is both popular and successful. However, opportunities for improving the algorithm exist, particularly to reduce its computational expense. Two approaches to achieve this are examined. First, the redundancy inherent in the determinant calculations at the heart of N-FINDR is reduced using an LDU decomposition, yielding two new algorithms, one based on the original N-FINDR algorithm and one based on the closely related Sequential N-FINDR algorithm. The second approach lowers complexity by reducing the repetition of the volume calculations, removing pixels unlikely to represent pure materials. This is accomplished at no additional cost through the reuse of the volume calculations inherent in the Sequential N-FINDR algorithm. Various thresholding methods for excluding pixels are considered. The impact of these modifications on complexity and accuracy is examined on simulated and real data, showing that the LDU-based approaches save considerable computation, while the pixel reduction methods, with appropriate threshold selection, can produce a favorable complexity-accuracy trade-off.

ETPL DIP-240: 3-D Curvilinear Structure Detection Filter Via Structure-Ball Analysis
Abstract: Curvilinear structure detection filters are crucial building blocks in many medical image processing applications, where they are used to detect important structures such as blood vessels, airways, and other similar fibrous tissues. Unfortunately, most of these filters are plagued by an implicit single-structure-direction assumption, which results in a loss of signal around bifurcations. This peculiarity limits the performance of all subsequent processing, such as understanding angiography
acquisitions, computing an accurate segmentation or tractography, or automatically classifying image voxels. This paper presents a new 3-D curvilinear structure detection filter based on the analysis of the structure ball, a geometric construction representing second-order differences sampled in many directions. The structure ball is defined formally, and its computation on a discrete image is discussed. A contrast-invariant diffusion index easing voxel analysis and visualization is also introduced, and different structure-ball shape descriptors are proposed. A new curvilinear structure detection filter is defined based on the shape descriptors that best characterize curvilinear structures. The new filter produces a vesselness measure that is robust to the presence of X- and Y-junctions along the structure by going beyond the single-direction assumption. At the same time, it stays conceptually simple and deterministic, and allows for an intuitive representation of the structure's principal directions. Sample results are provided for synthetic images and for two medical imaging modalities.

ETPL DIP-241: Image Fusion With Guided Filtering
Abstract: A fast and effective image fusion method is proposed for creating a highly informative fused image through the merging of multiple images. The proposed method is based on a two-scale decomposition of an image into a base layer containing large-scale variations in intensity and a detail layer capturing small-scale details. A novel guided-filtering-based weighted average technique is proposed to make full use of spatial consistency for the fusion of the base and detail layers. Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for the fusion of multispectral, multifocus, multimodal, and multiexposure images.

ETPL DIP-242: Global Propagation of Affine Invariant Features for Robust Matching
Abstract: Local invariant features have been successfully used in image matching to cope with viewpoint change, partial occlusion, and clutter. However, when these factors become too strong, there will be many mismatches due to the limited repeatability and discriminative power of the features. In this paper, we present an efficient approach to remove false matches and propagate correct ones for affine invariant features, which represent the state of the art in local invariance. First, a pairwise affine consistency measure is proposed to evaluate the consensus of matches of affine invariant regions. The measure takes into account both the keypoint location and the region shape, size, and orientation. Based on this measure, a geometric filter is then presented that can efficiently remove outliers from the initial matches and is robust to severe clutter and non-rigid deformation. To increase the number of correct matches, we propose a global match refinement and propagation method that simultaneously finds an optimal group of local affine transforms relating the features in the two images. The global method is capable of producing a quasi-dense set of matches even for weakly textured surfaces that undergo strong rigid transformation or non-rigid deformation. The strong capability of the proposed method in dealing with significant viewpoint change, non-rigid deformation, and low-texture objects is demonstrated in experiments on image matching, object recognition, and image-based rendering.
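For readers wanting a concrete baseline for geometric mismatch filtering, the sketch below fits a single global affine model with RANSAC and keeps the inliers. This is a standard substitute, not the paper's pairwise affine-consistency measure or its groups of local transforms, which are designed to tolerate non-rigid scenes; the point set and tolerance are toy assumptions.

```python
# Baseline mismatch filter: one global affine model fit with RANSAC
# (a conventional stand-in for the paper's pairwise consistency filter).
import numpy as np
from skimage.measure import ransac
from skimage.transform import AffineTransform

def filter_matches(src_pts, dst_pts, tol=3.0):
    """src_pts, dst_pts: (N, 2) matched keypoint coordinates."""
    model, inliers = ransac((src_pts, dst_pts), AffineTransform,
                            min_samples=3, residual_threshold=tol,
                            max_trials=1000)
    return model, inliers   # inliers: boolean mask of consistent matches

rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(50, 2))
true = AffineTransform(scale=(1.1, 0.9), rotation=0.2, translation=(5, -3))
dst = true(src)
dst[::10] += rng.uniform(20, 40, size=(5, 2))   # inject gross mismatches
_, inliers = filter_matches(src, dst)
print(f"{inliers.sum()} of {len(src)} matches kept")
```

A single global model rejects non-rigid but correct matches along with the outliers, which is exactly the limitation the paper's locally adaptive approach is meant to overcome.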
ETPL DIP-243: Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile Search
Abstract: As the basis of large-scale partial-duplicate visual search on mobile devices, an image local descriptor is expected to be discriminative, efficient, and compact. Our study shows that the popular histogram-based descriptors, such as the scale invariant feature transform (SIFT), are not optimal for this task. This is mainly because histogram representations are relatively expensive to compute on mobile platforms and lose significant spatial cues, which are important for improving discriminative power and matching near-duplicate image patches. To address these issues, we propose to extract a novel binary local descriptor named Edge-SIFT from the binary edge maps of scale- and orientation-normalized image patches. By preserving both the locations and orientations of edges and compressing the sparse binary edge
maps with a boosting strategy, the final Edge-SIFT shows strong discriminative power with a compact representation. Furthermore, we propose a fast similarity measurement and an indexing framework with flexible online verification. Hence, Edge-SIFT allows accurate and efficient image search and is ideal for computation-sensitive scenarios such as mobile image search. Experiments on a large-scale dataset show that Edge-SIFT achieves superior retrieval accuracy to Oriented BRIEF (ORB) and is superior to SIFT in terms of retrieval precision, efficiency, compactness, and transmission cost.

ETPL DIP-244: Parametric Generalized Linear System Based on the Notion of the T-Norm
Abstract: Using the triangular norm, we propose two methods for the construction of generalized linear systems and provide new insights into the relationships between typical systems. Using the Hamacher and Frank t-norms, we propose a parametric log-ratio model, which is a generalization of the log-ratio model and is more flexible for algorithmic development. We develop a generalized linear contrast enhancement algorithm based on the proposed parametric log-ratio model and show that its performance is effective and robust for different types of images.

ETPL DIP-245: A Linear Support Higher-Order Tensor Machine for Classification
Abstract: There has been growing interest in developing more effective learning machines for tensor classification. At present, most existing learning machines, such as the support tensor machine (STM), involve nonconvex optimization problems and need to resort to iterative techniques, which are time-consuming and may suffer from local minima. In order to overcome these two shortcomings, in this paper we present a novel linear support higher-order tensor machine (SHTM) that integrates the merits of the linear C-support vector machine (C-SVM) and tensor rank-one decomposition. Theoretically, SHTM is an extension of the linear C-SVM to tensor patterns; when the input patterns are vectors, SHTM degenerates into the standard C-SVM. A set of experiments is conducted on nine second-order face recognition datasets and three third-order gait recognition datasets to illustrate the performance of the proposed SHTM. Statistical tests show that, compared with STM and C-SVM with the RBF kernel, SHTM provides significant performance gains in terms of test accuracy and training speed, especially in the case of higher-order tensors.

ETPL DIP-246: Novel True-Motion Estimation Algorithm and Its Application to Motion-Compensated Temporal Frame Interpolation
Abstract: In this paper, a new low-complexity true-motion estimation (TME) algorithm is proposed for video processing applications such as motion-compensated temporal frame interpolation (MCTFI) or motion-compensated frame rate up-conversion (MCFRUC). Regular motion estimation, which is often used in video coding, aims to find the motion vectors (MVs) that reduce temporal redundancy, whereas TME aims to track the projected object motion as closely as possible. TME is obtained by imposing implicit and/or explicit smoothness constraints on the block-matching algorithm. To produce better-quality interpolated frames, the dense motion field at the interpolation instant is obtained for both forward and backward MVs; bidirectional motion compensation is then applied by carefully blending the two.
Finally, the performance of the proposed algorithm for MCTFI is demonstrated against recently proposed methods and against the smoothness-constrained optical flow employed by a professional video production suite. Experimental results show that the quality of the interpolated frames using the proposed method is better than that of the compared MCFRUC techniques.

ETPL DIP-247: Motion Analysis Using 3D High-Resolution Frequency Analysis
Abstract: The spatiotemporal spectra of a video that contains a moving object form a plane in the 3D frequency domain. This plane, referred to as the theoretical motion plane, reflects the velocity of the moving object, which can be calculated from its slope. However, if the resolution of the frequency
analysis method is not high enough to obtain the actual spectra of the object signal, the spatiotemporal spectra disperse away from the theoretical motion plane. In this paper, we propose a high-resolution frequency analysis method, called 3D nonharmonic analysis (NHA), which is only weakly influenced by the analysis window. In addition, we estimate the motion vectors of objects in a video by applying a plane-clustering method, in conjunction with the least-squares method, to the 3D NHA spatiotemporal spectra. We experimentally verify the accuracy of 3D NHA and its usefulness for sequences containing complex motions, such as cross-over motion, through comparison with the 3D fast Fourier transform. The experimental results show that increasing the frequency resolution contributes to high-accuracy estimation of the motion plane.

ETPL DIP-248: Segment Adaptive Gradient Angle Interpolation
Abstract: We introduce a new edge-directed interpolator based on locally defined, straight-line approximations of image isophotes. Spatial derivatives of image intensity are used to describe the principal behavior of pixel-intersecting isophotes in terms of their slopes. The slopes are determined by inverting a tridiagonal matrix and are forced to vary linearly from pixel to pixel within segments. Image resizing is performed by interpolating along the approximated isophotes. The proposed method can accommodate arbitrary scaling factors, provides state-of-the-art results in terms of PSNR as well as other quantitative visual quality metrics, and has the advantage of a reduced computational complexity that is directly proportional to the number of pixels.

ETPL DIP-249: Fast Computation of Rotation-Invariant Image Features by an Approximate Radial Gradient Transform
Abstract: We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT (ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that, using the ARGT, RIFF extracts features 16× faster than SURF while achieving similar performance for image matching and retrieval.
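The key construction behind the RGT, projecting per-pixel gradients onto the radial and tangential directions about the patch center, can be illustrated in a few lines. The sketch below is our simplified rendering of that idea, not the RIFF pipeline or its fast approximation; the patch size is an arbitrary choice.

```python
# Sketch of the radial-gradient idea: express gradients in a polar frame
# centered on the patch so their histogram becomes rotation-invariant.
import numpy as np

def radial_gradients(patch):
    gy, gx = np.gradient(patch.astype(float))   # d/drow, d/dcol
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    norm = np.hypot(dx, dy)
    norm[norm == 0] = 1.0                        # avoid division by zero at center
    rx, ry = dx / norm, dy / norm                # radial unit vectors
    g_rad = gx * rx + gy * ry                    # radial gradient component
    g_tan = -gx * ry + gy * rx                   # tangential gradient component
    return g_rad, g_tan

patch = np.random.default_rng(0).normal(size=(17, 17))
g_rad, g_tan = radial_gradients(patch)
# Rotating the patch rotates (gx, gy), but the (g_rad, g_tan) statistics are
# approximately unchanged, which is what makes the descriptor rotation-invariant.
```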

ETPL DIP-250: Image Completion by Diffusion Maps and Spectral Relaxation
Abstract: We present a framework for image inpainting that utilizes the diffusion-maps approach to spectral dimensionality reduction. We show that, by formulating the inpainting problem in the embedding domain, the domain to be inpainted is generally smoother, particularly for textured images. Thus, textured images can be inpainted through simple exemplar-based and variational methods. We discuss the properties of the induced smoothness and relate it to the underlying assumptions used in contemporary inpainting schemes. As the diffusion embedding is nonlinear and noninvertible, we propose a novel computational approach to approximate the inverse mapping from the inpainted embedding space to the image domain. We formulate the mapping as a discrete optimization problem, solved through spectral relaxation. The effectiveness of the presented method is exemplified by inpainting real images, where it is shown to compare favorably with contemporary state-of-the-art schemes.

ETPL DIP-251: A Continuous Method for Reducing Interpolation Artifacts in Mutual Information-Based Rigid Image Registration
Abstract: We propose an approach for computing mutual information in rigid multimodality image registration. The images to be registered are modeled as functions defined on a continuous image domain. Analytic forms of the probability density functions of the images and their joint probability density function are first defined in 1D. We describe how the entropies of the images, the joint entropy, and
the mutual information can be computed accurately by a numerical method. We then extend the method to 2D and 3D. The mutual information function generated is smooth and does not exhibit the interpolation artifacts that are commonly observed with other standard models. The relationship between the proposed method and the partial volume (PV) model is described. In addition, we give a theoretical analysis explaining the non-smoothness of the mutual information function computed by the PV model. Numerical experiments in 2D and 3D illustrate the smoothness of the mutual information function, which leads to robust and accurate numerical convergence when solving the image registration problem.

ETPL DIP-252: Image Inpainting on the Basis of Spectral Structure From 2-D Nonharmonic Analysis
Abstract: The restoration of images by digital inpainting is an active field of research, and such algorithms are now widely used. Conventional methods generally apply textures that are most similar to the areas around the missing region or use a large image database; however, this produces discontinuous textures and thus unsatisfactory results. Here, we propose a new technique that overcomes this limitation by using signal prediction based on the nonharmonic analysis (NHA) technique previously proposed by the authors. NHA can extract accurate spectra irrespective of the window function, with a frequency resolution finer than that of the discrete Fourier transform. The proposed method sequentially generates new textures on the basis of the spectrum obtained by NHA, and missing regions are repaired using an improved cost function for 2D NHA. The proposed method is evaluated on the standard images Lena, Barbara, Airplane, Pepper, and Mandrill. The results show an improvement in MSE of about 10-20 compared with the exemplar-based method, as well as good subjective quality.

ETPL DIP-253: Linear Discriminant Analysis Based on L1-Norm Maximization
Abstract: Linear discriminant analysis (LDA) is a well-known dimensionality reduction technique that is widely used for many purposes. However, conventional LDA is sensitive to outliers because its objective function is based on an L2-norm distance criterion. This paper proposes a simple but effective robust LDA variant based on L1-norm maximization, which learns a set of locally optimal projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion to the L1-norm-based within-class dispersion. The proposed method is theoretically shown to be feasible and robust to outliers, while also overcoming the singularity problem of the within-class scatter matrix in conventional LDA. Experiments on artificial datasets, standard classification datasets, and three popular image databases demonstrate the efficacy of the proposed method.

ETPL DIP-254: Visual Tracking With Spatio-Temporal Dempster-Shafer Information Fusion
Abstract: A key problem in visual tracking is how to effectively combine spatio-temporal visual information from throughout a video to accurately estimate the state of an object. We address this problem by incorporating Dempster-Shafer (DS) information fusion into the tracking approach. To implement this fusion task, the entire image sequence is partitioned into spatially and temporally adjacent subsequences. A support vector machine (SVM) classifier is trained for object/non-object classification on each of these subsequences, the outputs of which act as separate data sources.
To combine the discriminative information from these classifiers, we further present a spatio-temporally weighted DS (STWDS) scheme. In addition, since temporally adjacent sources are likely to share discriminative information on object/non-object classification, an adaptive SVM learning scheme is designed to transfer discriminative information across sources. Finally, the corresponding DS belief function of the STWDS scheme is embedded into a Bayesian tracking model. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracking approach.

ETPL DIP-255: Dimensionality Reduction for Registration of High-Dimensional Data Sets
Abstract: Registration of two high-dimensional data sets often involves dimensionality reduction to yield a single-band image from each data set, followed by pairwise image registration. We develop a new application-specific algorithm for the dimensionality reduction of high-dimensional data sets such that the weighted harmonic mean of the Cramer-Rao lower bounds for the estimation of the transformation parameters for registration is minimized. The performance of the proposed dimensionality reduction algorithm is evaluated using three remote sensing data sets. The experimental results, using a mutual information-based pairwise registration technique, demonstrate that our proposed dimensionality reduction algorithm combines the original data sets to obtain an image pair with more texture, resulting in improved image registration.

ETPL DIP-256: Multiple-Kernel, Multiple-Instance Similarity Features for Efficient Visual Object Detection
Abstract: We propose to use the similarity between a sample instance and a number of exemplars as features in visual object detection. Concepts from multiple-kernel learning and multiple-instance learning are incorporated into our scheme at the feature level by properly calculating the similarity. The similarity between two instances can be measured by various metrics and by using information from various sources, which mimics the use of multiple kernels in kernel machines. Pooling of similarity values from multiple instances of an object part is introduced to cope with alignment inaccuracy between object instances. To deal with the high dimensionality of the multiple-kernel multiple-instance similarity feature, we propose a forward feature selection technique and a coarse-to-fine learning scheme to find a set of good exemplars, thereby producing an efficient classifier while maintaining good performance. Both the features and the learning technique have interesting properties. We demonstrate the performance of our method using both synthetic data and real-world visual object detection data sets.

ETPL DIP-257: Asymmetric Correlation: A Noise Robust Similarity Measure for Template Matching
Abstract: We present an efficient and noise-robust template matching method based on asymmetric correlation (ASC). The ASC similarity function is invariant to affine illumination changes and robust to extreme noise. It correlates the given non-normalized template with a normalized version of each image window in the frequency domain. We show that this asymmetric normalization is more robust to noise than other cross-correlation variants, such as the correlation coefficient. Direct computation of ASC is very slow, as a DFT needs to be calculated for each image window independently. To make the template matching efficient, we develop a much faster algorithm, which carries out a prediction step in linear time and then computes DFTs for only a few promising candidate windows. We extend the proposed template matching scheme to deal with partial occlusion and spatially varying light change. Experimental results demonstrate the robustness of the proposed ASC similarity measure compared with state-of-the-art template matching methods.
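A rough way to see the asymmetry in ASC is to normalize each image window (via box-filtered local means and variances) while leaving the template raw, then correlate in the frequency domain. The sketch below follows that reading; it is our approximation, and the paper's ASC and its fast prediction step differ in the details.

```python
# Rough sketch of asymmetric matching: raw template vs. locally
# normalized image, correlated via FFT (not the paper's exact ASC).
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import fftconvolve

def asc_like_match(image, template, eps=1e-6):
    th, tw = template.shape
    mean = uniform_filter(image, size=(th, tw))
    sq_mean = uniform_filter(image ** 2, size=(th, tw))
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0)) + eps
    normalized = (image - mean) / std            # per-window normalization
    # correlation = convolution with the flipped, non-normalized template
    return fftconvolve(normalized, template[::-1, ::-1], mode="same")

rng = np.random.default_rng(0)
scene = rng.normal(size=(128, 128))
tmpl = scene[40:56, 60:76].copy()
score = asc_like_match(scene, tmpl)
print(np.unravel_index(np.argmax(score), score.shape))  # near (48, 68)
```

Because only the image side is normalized, a bright or dark template is matched on its raw structure, which is the asymmetry the measure's name refers to.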
ETPL DIP-258 Deconvolving Images With Unknown Boundaries Using the Alternating Direction Method of Multipliers
Abstract: The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for inverse problems, namely image deconvolution and reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by adopting a divide-and-conquer strategy, wherein a hard problem is split into simpler, efficiently solvable sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system), which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e., under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate their performance on non-periodic deblurring (with and without inpainting of interior pixels) under total-variation and frame-based regularization.
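DIP-258 instantiates the generic ADMM splitting pattern. As a hedged, minimal illustration of that pattern (not the paper's masked-deconvolution solver), the sketch below solves the toy problem min_x 0.5||x - b||^2 + lam*||x||_1 by splitting it into two easy sub-problems.

```python
import numpy as np

def soft(v, t):
    # Proximity operator of the l1 norm (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_denoise(b, lam=0.5, rho=1.0, iters=100):
    # min_x 0.5*||x - b||^2 + lam*||z||_1  subject to  x = z  (scaled ADMM).
    x = z = u = np.zeros_like(b)
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)   # quadratic sub-problem
        z = soft(x + u, lam / rho)              # l1 sub-problem (prox)
        u = u + x - z                           # scaled dual update
    return z

b = np.array([2.0, -0.3, 1.1, 0.2])
print(admm_l1_denoise(b))   # approaches soft(b, lam) = [1.5, 0, 0.6, 0]
```

In the paper's setting the quadratic sub-problem is a linear system that becomes a cheap FFT-domain division under the circulant-plus-mask model; the toy version above keeps only the splitting structure.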

ETPL DIP-259 Integration of Gibbs Markov Random Field and Hopfield-Type Neural Networks for Unsupervised Change Detection in Remotely Sensed Multitemporal Images
Abstract: In this paper, a spatiocontextual unsupervised change detection technique for multitemporal, multispectral remote sensing images is proposed. The technique uses a Gibbs Markov random field (GMRF) to model the spatial regularity between the neighboring pixels of the multitemporal difference image. The difference image is generated by change vector analysis applied to images acquired on the same geographical area at different times. The change detection problem is solved using the maximum a posteriori probability (MAP) estimation principle. The MAP estimator of the GMRF used to model the difference image is exponential in nature, so a modified Hopfield-type neural network (HTNN) is exploited for estimating the MAP. In the considered Hopfield-type network, a single neuron is assigned to each pixel of the difference image and is assumed to be connected only to its neighbors. Initial values of the neurons are set by histogram thresholding. An expectation-maximization algorithm is used to estimate the GMRF model parameters. Experiments are carried out on three multispectral and multitemporal remote sensing images. Results of the proposed change detection scheme are compared with those of a manual trial-and-error technique, an automatic change detection scheme based on the GMRF model and the iterated conditional mode algorithm, a context-sensitive change detection scheme based on the HTNN, and a scheme based on the GMRF model and a graph-cut algorithm. The comparison points out that the proposed method provides more accurate change detection maps than the other methods.
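The HTNN in DIP-259 is initialized by histogram thresholding of the difference image. One standard choice for such a threshold, our illustration rather than necessarily the paper's criterion, is Otsu's method:

```python
import numpy as np

def otsu_threshold(img):
    # Otsu's histogram threshold for an 8-bit image: maximize the
    # between-class variance over all candidate cut points.
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability
    mu = np.cumsum(p * np.arange(256))      # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

img = np.concatenate([np.full(500, 40), np.full(500, 200)]).astype(np.uint8)
print(otsu_threshold(img))   # a cut point between the two modes
```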

ETPL DIP-260 SparCLeS: Dynamic Sparse Classifiers With Level Sets for Robust Beard/Moustache Detection and Segmentation
Abstract: Robust facial hair detection and segmentation is a highly valued soft biometric attribute for carrying out forensic facial analysis. In this paper, we propose a novel and fully automatic system, called SparCLeS, for beard/moustache detection and segmentation in challenging facial images. SparCLeS uses the multiscale self-quotient (MSQ) algorithm to preprocess facial images and deal with illumination variation. Histogram of oriented gradients (HOG) features are extracted from the preprocessed images, and a dynamic sparse classifier is built using these features to classify a facial region as containing either skin or facial hair. A level-set-based approach, which exploits the advantages of both global and local information, is then used to segment the regions of a face containing facial hair. Experimental results demonstrate the effectiveness of our proposed system in detecting and segmenting facial hair regions in images drawn from three databases, i.e., the NIST Multiple Biometric Grand Challenge (MBGC) still face database, the NIST Color Facial Recognition Technology FERET database, and the Labeled Faces in the Wild (LFW) database.
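The MSQ preprocessing in DIP-260 normalizes illumination by dividing the image by smoothed versions of itself at several scales. A minimal single-channel sketch using SciPy's Gaussian filter; the scales and averaging are our illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_self_quotient(img, sigmas=(2.0, 8.0, 16.0), eps=1e-6):
    # Average of self-quotient images I / G_sigma(I) over several scales;
    # the ratio suppresses smooth illumination while keeping local texture.
    img = img.astype(float)
    layers = [img / (gaussian_filter(img, s) + eps) for s in sigmas]
    return np.mean(layers, axis=0)
```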

ETPL DIP-261 Cross-Domain Object Recognition Via Input-Output Kernel Analysis
Abstract: It is of great importance to investigate the domain adaptation problem of image object recognition, because image data is now available from a variety of source domains. To understand the changes in data distributions across domains, we study both the input and output kernel spaces for cross-domain learning situations, where most labeled training images are from a source domain and testing images are from a different target domain. To address the feature distribution change issue in the reproducing kernel Hilbert space induced by vector-valued functions, we propose a domain adaptive input-output kernel learning (DA-IOKL) algorithm, which simultaneously learns both the input and output kernels with a discriminative vector-valued decision function by reducing the data mismatch and minimizing the structural error. We also extend the proposed method to the case of multiple source domains. On two cross-domain object recognition benchmark data sets, the proposed method consistently outperforms the state-of-the-art domain adaptation and multiple kernel learning methods.

ETPL DIP-262 Regularized Feature Reconstruction for Spatio-Temporal Saliency Detection
Abstract: Multimedia applications such as image or video retrieval and copy detection can benefit from saliency detection, which is essentially a method to identify areas in images and videos that capture the attention of the human visual system. In this paper, we propose a new spatio-temporal saliency detection framework on the basis of regularized feature reconstruction. Specifically, for video saliency detection, both temporal and spatial saliency are considered. For temporal saliency, we model the movement of the target patch as a reconstruction process using the patches in neighboring frames. A Laplacian smoothing term is introduced to model coherent motion trajectories. Motivated by psychological findings that abrupt stimuli cause a rapid and involuntary deployment of attention, our temporal model combines the reconstruction error, the regularizer, and local trajectory contrast to measure temporal saliency. For spatial saliency, a similar sparse reconstruction process is adopted to capture the regions with high center-surround contrast. Finally, temporal and spatial saliency are combined to favor salient regions with high confidence for video saliency detection. We also apply the spatial part of the spatio-temporal model to image saliency detection. Experimental results on a human fixation video dataset and an image saliency detection dataset show that our method achieves the best performance over several state-of-the-art approaches.

ETPL DIP-263 Texture Enhanced Histogram Equalization Using TV-L1 Image Decomposition
Abstract: Histogram transformation defines a class of image processing operations that are widely applied in the implementation of data normalization algorithms. In this paper, we present a new variational approach for image enhancement that is constructed to alleviate the intensity saturation effects introduced by standard contrast enhancement (CE) methods based on histogram equalization. We initially apply total variation (TV) minimization with an L1 fidelity term to decompose the input image into cartoon and texture components. Contrary to previous works that rely solely on the distribution of the intensity information, here the texture information is also employed to emphasize the contribution of the local textural features in the CE process. This is achieved by implementing a nonlinear histogram warping CE strategy that is able to maximize the information content in the transformed image. Our experimental study addresses the CE of a wide variety of image data, and comparative evaluations illustrate that our method produces better results than conventional CE strategies.
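For reference, DIP-263's point of departure is plain histogram equalization, which maps intensities through the normalized cumulative histogram; the paper's nonlinear warping strategy replaces this fixed mapping. A minimal 8-bit version:

```python
import numpy as np

def equalize_hist(img):
    # Classic histogram equalization on an 8-bit grayscale image:
    # the lookup table is the scaled cumulative distribution function.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                   # normalize to [0, 1]
    lut = np.round(255.0 * cdf).astype(np.uint8)     # intensity mapping
    return lut[img]

rng = np.random.default_rng(0)
img = rng.integers(0, 128, size=(32, 32), dtype=np.uint8)  # dark image
print(img.max(), equalize_hist(img).max())   # contrast stretched toward 255
```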
ETPL DIP-264 Gaussian Blurring-Invariant Comparison of Signals and Images
Abstract: We present a Riemannian framework for analyzing signals and images in a manner that is invariant to their level of blurriness, under Gaussian blurring. Using a well-known relation between Gaussian blurring and the heat equation, we establish an action of the blurring group on image space and define an orthogonal section of this action to represent and compare images at the same blur level. This comparison is based on geodesic distances on the section manifold, which, in turn, are computed using a path-straightening algorithm. The actual implementations use coefficients of images under a truncated orthonormal basis, and the blurring action corresponds to exponential decays of these coefficients. We demonstrate this framework using a number of experimental results involving 1D signals and 2D images. As a specific application, we study the effect of blurring on recognition performance when 2D facial images are used for recognizing people.
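The heat-equation relation used in DIP-264 says that Gaussian blurring multiplies basis coefficients by an exponential decay; in the Fourier basis the factor is exp(-2*pi^2*sigma^2*f^2). A small numpy illustration of blurring as coefficient decay (our example, with an arbitrary test signal):

```python
import numpy as np

def gaussian_blur_1d(x, sigma):
    # Blurring as exponential decay of Fourier coefficients, per the
    # heat-equation view: X(f) -> X(f) * exp(-2 * (pi * sigma * f)^2).
    f = np.fft.rfftfreq(len(x))                       # cycles per sample
    decay = np.exp(-2.0 * (np.pi * sigma * f) ** 2)
    return np.fft.irfft(np.fft.rfft(x) * decay, len(x))

x = np.zeros(64)
x[32] = 1.0                                 # impulse
y = gaussian_blur_1d(x, sigma=3.0)          # approximately a Gaussian bump
print(round(y.sum(), 6))                    # total mass is preserved: 1.0
```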

ETPL DIP-265 Fast SIFT Design for Real-Time Visual Feature Extraction
Abstract: Visual feature extraction with the scale invariant feature transform (SIFT) is widely used for object recognition. However, its real-time implementation suffers from long latency, heavy computation, and high memory storage because of its frame-level computation with iterated Gaussian blur operations. This paper therefore proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design with an on-the-fly feature extraction flow for real-time application needs. Compared with the original SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses a 580K gate count with 90-nm CMOS technology, and offers 6000 feature points/frame for VGA images at 30 frames/s and about 2000 feature points/frame for 1920×1080 images at 30 frames/s at a clock rate of 100 MHz.
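The integral image used by DIP-265 lets any box sum, and hence box-filter approximations to Gaussian blurring, be computed with four lookups regardless of box size. A minimal sketch:

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero border: ii[y, x] = sum(img[:y, :x]).
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1), independent of the box size.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3), img[1:3, 1:3].sum())   # both 30.0
```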
ETPL DIP-266 Artistic Image Analysis Using Graph-Based Learning Approaches
Abstract: We introduce a new methodology for the problem of artistic image analysis, which, among other tasks, involves the automatic identification of visual classes present in an art work. We advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing similarities in terms of appearance and manual annotation. One of the novelties of our methodology is a principled formulation for combining these two similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results than the following baseline algorithms: bag of visual words, label propagation, matrix completion, and structural learning. We also show that the proposed approach leads to more efficient inference and training procedures. The experiments are run on a database containing 988 artistic images (with 49 visual classification problems divided into one multiclass problem with 27 classes and 48 binary problems), where we report the inference and training running times and quantitative comparisons with respect to several retrieval and annotation performance measures.

ETPL DIP-267 Self-Supervised Online Metric Learning With Low Rank Constraint for Scene Categorization
Abstract: Conventional visual recognition systems usually train an image classifier in a batch mode with all training data provided in advance. However, in many practical applications, only a small amount of training samples are available in the beginning, and many more arrive sequentially during online recognition. Because the image data characteristics can change over time, it is important for the classifier to adapt to the new data incrementally. In this paper, we present an online metric learning method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled samples followed by a sequential input of unseen testing samples, the similarity metric is learned to maximize the margin of the distance among different classes of samples. By considering a low-rank constraint, our online metric learning model not only provides competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bilinear graph is also defined to model the pair-wise similarity; an unseen sample is labeled through graph-based label propagation, and the model can self-update using the more confident new samples. With the ability of online learning, our methodology can handle large-scale streaming video data with incremental self-updating. We evaluate our model on online scene categorization, and experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate the effectiveness and efficiency of our algorithm.

ETPL DIP-268 Nonlocal Regularization of Inverse Problems: A Unified Variational Framework
Abstract: We introduce a unifying energy minimization framework for nonlocal regularization of inverse problems. In contrast to the weighted sum of squared differences between image pixels used by current schemes, the proposed functional is an unweighted sum of inter-patch distances. We use robust distance metrics that promote the averaging of similar patches while discouraging the averaging of dissimilar patches. We show that the first iteration of a majorize-minimize algorithm for the proposed cost function is similar to current nonlocal methods. The reformulation thus provides a theoretical justification for the heuristic approach of iterating nonlocal schemes, which re-estimate the weights from the current image estimate. Thanks to the reformulation, we now understand that the widely reported alias amplification associated with iterative nonlocal methods is caused by convergence to a local minimum of the nonconvex penalty. We introduce an efficient continuation strategy to overcome this problem. The similarity of the proposed criterion to widely used nonquadratic penalties (e.g., total variation and lp semi-norms) opens the door to the adaptation of fast algorithms developed in the context of compressive sensing; we introduce several novel algorithms to solve the proposed nonlocal optimization problem. Thanks to the unifying framework, these fast algorithms are readily applicable to a large class of distance metrics.

ETPL DIP-269 Corner Detection and Classification Using Anisotropic Directional Derivative Representations
Abstract: This paper proposes a corner detector and classifier using anisotropic directional derivative (ANDD) representations. The ANDD representation at a pixel is a function of the oriented angle and characterizes the local directional grayscale variation around the pixel. The proposed corner detector fuses the ideas of contour- and intensity-based detection. It consists of three cascaded blocks. First, the edge map of an image is obtained by the Canny detector, from which contours are extracted and patched. Next, the ANDD representation at each pixel on the contours is calculated and normalized by its maximal magnitude. The area surrounded by the normalized ANDD representation forms a new corner measure. Finally, nonmaximum suppression and thresholding are applied to each contour to find corners in terms of the corner measure. Moreover, a corner classifier based on the peak number of the ANDD representation is given. Experiments are conducted to evaluate the proposed detector and classifier. The proposed detector is competitive with two recent state-of-the-art corner detectors, the He & Yung detector and the CPDA detector, in detection capability, and attains higher repeatability under affine transforms. The proposed classifier can effectively discriminate simple corners, Y-type corners, and higher-order corners.
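As a point of comparison for the intensity-based side of DIP-269, the classic Harris measure scores corners from local gradient statistics. The sketch below is the standard Harris detector, not the paper's ANDD measure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.5, k=0.05):
    # Build the per-pixel structure tensor from image gradients, then
    # score corners with R = det(M) - k * trace(M)^2 (peaks at corners).
    img = img.astype(float)
    iy, ix = np.gradient(img)
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Usage: keep local maxima of harris_response(img) above a threshold.
```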

ETPL DIP-270 Classification of Time Series of Multispectral Images With Limited Training Data
Abstract: Image classification usually requires reliable reference data collected for the considered image to train supervised classifiers. Unfortunately, when time series of images are considered, this is seldom possible because of the costs associated with reference data collection. In most applications it is realistic to have reference data available for only one or a few images of a time series acquired on the area of interest. In this paper, we present a novel system for automatically classifying image time series that takes advantage of images with associated reference information (i.e., the source domain) to classify images for which reference information is not available (i.e., the target domain). The proposed system exploits the knowledge already available on the source domain and, when possible, integrates it with a minimum amount of new labeled data for the target domain. In addition, it is able to handle possible significant differences between the statistical distributions of the source and target domains. Here, the method is presented in the context of classification of remote sensing image time series, where ground reference data collection is a highly critical and demanding task. Experimental results show the effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.

ETPL DIP-271 Fast l1-Minimization Algorithms for Robust Face Recognition
Abstract: l1-minimization refers to finding the minimum l1-norm solution to an underdetermined linear system b = Ax. Under certain conditions described in compressive sensing theory, the minimum l1-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer from poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l1-minimization solvers, including the interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.
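Among the solver families DIP-271 benchmarks, the proximal-gradient family (e.g., FISTA) alternates a gradient step with soft-thresholding. A minimal non-accelerated ISTA sketch for the unconstrained lasso relaxation min_x 0.5||Ax - b||^2 + lam*||x||_1, shown as an illustrative stand-in rather than the paper's augmented Lagrangian solver:

```python
import numpy as np

def ista(A, b, lam=0.1, iters=500):
    # Iterative soft-thresholding: gradient step on the quadratic term,
    # then the l1 proximity operator, with step 1/L.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - b) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[5, 17, 42]] = (1.0, -2.0, 1.5)
x_hat = ista(A, A @ x_true, lam=0.05, iters=2000)
print(np.flatnonzero(np.abs(x_hat) > 0.1))   # large entries sit on the true support
```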
ETPL DIP-272 Robust Face Representation Using Hybrid Spatial Feature Interdependence Matrix
Abstract: A key issue in face recognition is to seek an effective descriptor for representing face appearance. In the context of considering the face image as a set of small facial regions, this paper presents a new face representation approach coined the spatial feature interdependence matrix (SFIM). Unlike classical face descriptors, which usually use a hierarchically organized or sequentially concatenated structure to describe the spatial layout features extracted from local regions, SFIM exploits the underlying feature interdependences between local region pairs inside a class-specific face. According to SFIM, the face image is projected onto an undirected connected graph in a manner that explicitly encodes feature interdependence-based relationships between local regions. We calculate the pair-wise interdependence strength as the weighted discrepancy between two feature sets extracted in a hybrid feature space fusing histograms of intensity, local binary patterns, and oriented gradients. To achieve the goal of face recognition, our SFIM-based face descriptor is embedded in three different recognition frameworks, namely nearest neighbor search, subspace-based classification, and linear optimization-based classification. Extensive experimental results on four well-known face databases and comprehensive comparisons with state-of-the-art results demonstrate the efficacy of the proposed SFIM-based descriptor.

ETPL DIP-273 Motion Estimation Using the Correlation Transform
Abstract: The zero-mean normalized cross-correlation is known to improve the accuracy of optical flow, but its analytical form is quite complicated for the variational framework. This paper addresses this issue and presents a new direct approach to this matching measure. Our approach uses the correlation transform to define very discriminative descriptors that are pre-computed and then matched in the target frame. It is equivalent to computing the optical flow for the correlation transforms of the images. The smoothness energy is non-local and uses a robust penalty in order to preserve motion discontinuities. The model is associated with a fast and parallelizable minimization procedure based on the projected proximal point algorithm. The experiments confirm the strength of this model and implicitly demonstrate the correctness of our solution. The results demonstrate that the involved data term is very robust with respect to changes in illumination, especially where large illumination changes exist.

ETPL DIP-274 Single Image Dehazing by Multi-Scale Fusion
Abstract: Haze is an atmospheric phenomenon that significantly degrades the visibility of outdoor scenes, mainly because atmospheric particles absorb and scatter light. This paper introduces a novel single-image approach that enhances the visibility of such degraded images. Our method is a fusion-based strategy that derives two inputs from the original hazy image by applying a white balance and a contrast enhancing procedure. To blend the information of the derived inputs effectively, preserving the regions with good visibility, we filter their important features by computing three measures (weight maps): luminance, chromaticity, and saliency. To minimize artifacts introduced by the weight maps, our approach is designed in a multiscale fashion, using a Laplacian pyramid representation. We are the first to demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single degraded image. The method operates in a per-pixel fashion and is straightforward to implement. The experimental results demonstrate that the method yields results comparable to, and even better than, more complex state-of-the-art techniques, while having the advantage of being suitable for real-time applications.
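The core fusion step in DIP-274 is a per-pixel weighted blend of the derived inputs under normalized weight maps. A single-scale sketch of just that step; the paper performs it per Laplacian pyramid level to avoid halo artifacts, and all names here are ours:

```python
import numpy as np

def fuse(inputs, weight_maps, eps=1e-8):
    # inputs: list of HxW arrays derived from the hazy image;
    # weight_maps: matching list of per-pixel weights (products of the
    # luminance, chromaticity, and saliency maps in the paper's scheme).
    w = np.stack(weight_maps) + eps
    w = w / w.sum(axis=0, keepdims=True)    # weights sum to 1 at each pixel
    return (np.stack(inputs) * w).sum(axis=0)
```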
ETPL DIP-275 Joint Sparse Learning for 3-D Facial Expression Generation
Abstract: 3-D facial expression generation, including synthesis and retargeting, has received intensive attention in recent years, because it is important to produce realistic 3-D faces with specific expressions in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between high-dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of 3-D faces demonstrate the effectiveness of the proposed approach compared with representative methods in terms of quality, time cost, and robustness.

ETPL DIP-276 Robust Model for Segmenting Images With/Without Intensity Inhomogeneities
Abstract: Intensity inhomogeneities and different types/levels of image noise are the two major obstacles to accurate image segmentation by region-based level set models. To provide a more general solution to these challenges, we propose a novel segmentation model that considers global and local image statistics to eliminate the influence of image noise and to compensate for intensity inhomogeneities. In our model, the global energy, derived from a Gaussian model, estimates the intensity distribution of the target object and background; the local energy, derived from the mutual influences of neighboring pixels, can eliminate the impact of image noise and intensity inhomogeneities. The robustness of our method is validated on segmenting synthetic images with and without intensity inhomogeneities and with different types/levels of noise, including Gaussian noise, speckle noise, and salt-and-pepper noise, as well as images from different medical imaging modalities. Quantitative experimental comparisons demonstrate that our method is more robust and more accurate in segmenting images with intensity inhomogeneities than the local binary fitting technique and its more recent systematic model. Our technique also outperforms the region-based Chan-Vese model when dealing with images without intensity inhomogeneities, and produces better segmentation results than graph-based algorithms, including graph cuts and random walker, when segmenting noisy images.

ETPL DIP-277 Learning Prototype Hyperplanes for Face Verification in the Wild
Abstract: In this paper, we propose a new scheme called Prototype Hyperplane Learning (PHL) for face verification in the wild using only weakly labeled training samples (i.e., we only know whether each pair of samples is from the same class or from different classes, without knowing the class label of each sample) by leveraging a large number of unlabeled samples in a generic data set. Our scheme represents each sample in the weakly labeled data set as a mid-level feature, with each entry being the corresponding decision value from the classification hyperplane (referred to as the prototype hyperplane) of one Support Vector Machine (SVM) model, in which a sparse set of support vectors is selected from the unlabeled generic data set based on the learnt combination coefficients. To learn the optimal prototype hyperplanes for the extraction of mid-level features, we propose a Fisher's Linear Discriminant-like (FLD-like) objective function that maximizes the discriminability on the weakly labeled data set with a constraint enforcing sparsity on the combination coefficients of each SVM model; this is solved using an alternating optimization method. Then, we use the recent Side-Information based Linear Discriminant (SILD) analysis for dimensionality reduction and a cosine similarity measure for final face verification. Comprehensive experiments on two data sets, Labeled Faces in the Wild (LFW) and YouTube Faces, demonstrate the effectiveness of our scheme.
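At test time the DIP-277 pipeline reduces to scoring mid-level features, i.e., decision values against a bank of hyperplanes, with cosine similarity. A hedged sketch of just that scoring step; the hyperplane bank below stands in for the learned prototype hyperplanes, and the threshold and names are ours:

```python
import numpy as np

def midlevel_feature(x, W, b):
    # Decision values of sample x against a bank of hyperplanes
    # (rows of W with offsets b); each entry is one SVM-style score.
    return W @ x + b

def verify(x1, x2, W, b, thresh=0.8):
    # Same-person decision from the cosine similarity of mid-level features.
    f1 = midlevel_feature(x1, W, b)
    f2 = midlevel_feature(x2, W, b)
    cos = f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12)
    return cos >= thresh, cos
```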
