
GENFAM ALGORITHM DETAILS DEC 7

Hashim continued from the last meeting and completed his presentation on the various properties used in evaluating gene clusters for synteny. He also presented a corrected version of the formula from the previous meeting. The group discussed some issues with the new algorithm, and Lars pointed out certain alternatives to it. The four algorithms discussed in the meeting are detailed below. In the formulas, $n(x)$ is the set of neighbors of gene $x$ and $NC(a, b)$ is the NC score between genes $a$ and $b$.

(1) Max-hit only method

This was our first approach to finding SNC (Syntenic Neighborhood Correlation) scores. The idea is to find the one pair, made of a neighbor of a gene g and a neighbor of its NC-hit (Neighborhood Correlation hit) nc, that has the maximum weighted NC score. This maximum is then weighted and used as the SNC score.

    for each gene g in Dataset
        for each NC-hit nc of g

$$SNC(g, nc) = \Big[\max\{\, NC(a, b)\, w_1(a, b) : a \in n(g),\ b \in n(nc) \,\}\Big]\, w_2(g, nc)$$

(2) Sum up every hit method

The idea is to compute a weighted synteny score by adding up every weighted NC-hit score that we can find between the neighbors of gene g and the neighbors of its NC-hit nc.

    for each gene g in Dataset
        for each NC-hit nc of g

$$SNC(g, nc) = \Big[\sum_{a \in n(g)} \sum_{b \in n(nc)} NC(a, b)\, w_1(a, b)\Big]\, w_2(g, nc)$$
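Methods (1) and (2) can be illustrated with a minimal Python sketch. The data layout is an assumption made only for this example: `neighbors` maps each gene to a list of its neighbors, `nc_scores` maps a pair `(a, b)` to its NC score, and a missing pair is treated as a score of 0. Since the weights w1 and w2 have not been defined yet, a hypothetical `unit_weight` placeholder returning 1.0 stands in for both.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Weight = Callable[[str, str], float]


def unit_weight(x: str, y: str) -> float:
    """Hypothetical placeholder: the notes leave w1 and w2 undefined."""
    return 1.0


def weighted_pairs(g: str, nc: str, neighbors: Dict[str, List[str]],
                   nc_scores: Dict[Pair, float], w1: Weight) -> List[float]:
    """NC(a, b) * w1(a, b) for every a in n(g) and b in n(nc).

    A pair without an NC score counts as 0 (an assumption of this sketch).
    """
    return [nc_scores.get((a, b), 0.0) * w1(a, b)
            for a in neighbors.get(g, [])
            for b in neighbors.get(nc, [])]


def snc_max_hit(g, nc, neighbors, nc_scores,
                w1: Weight = unit_weight, w2: Weight = unit_weight) -> float:
    """Method (1): the single best weighted NC score, scaled by w2."""
    scores = weighted_pairs(g, nc, neighbors, nc_scores, w1)
    return max(scores, default=0.0) * w2(g, nc)


def snc_sum_all(g, nc, neighbors, nc_scores,
                w1: Weight = unit_weight, w2: Weight = unit_weight) -> float:
    """Method (2): the sum of all weighted NC scores, scaled by w2."""
    scores = weighted_pairs(g, nc, neighbors, nc_scores, w1)
    return sum(scores) * w2(g, nc)


# Toy data, invented for illustration only.
toy_neighbors = {"g": ["a1", "a2"], "nc": ["b1", "b2"]}
toy_scores = {("a1", "b1"): 0.75, ("a1", "b2"): 0.25, ("a2", "b2"): 0.5}

print(snc_max_hit("g", "nc", toy_neighbors, toy_scores))  # 0.75
print(snc_sum_all("g", "nc", toy_neighbors, toy_scores))  # 1.5
```

On the toy data, method (1) keeps only the strongest pair, while method (2) rewards every scoring pair, so it grows with neighborhood size even before any weighting is applied.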

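(3) Sum one-way best hit method

The idea is to find, for each neighbor of g, its best weighted NC-hit score among the neighbors of nc, and to sum these best scores to compute the SNC score.

    for each gene g in Dataset
        for each NC-hit nc of g

$$h = \operatorname{argmax}_{b \in n(nc)} \{\, NC(a, b)\, w_1(a, b) : a \in n(g) \,\}$$

$$SNC(g, nc) = \Big[\sum_{a \in n(g)} NC(a, h)\, w_1(a, h)\Big]\, w_2(g, nc)$$

Here h is chosen separately for each $a \in n(g)$: it is the neighbor of nc that gives a its best weighted NC score.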
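A minimal Python sketch of method (3) follows. The data layout is an assumption made for this example: `neighbors` maps a gene to a list of its neighbors, `nc_scores` maps a pair `(a, b)` to its NC score, and a missing pair is treated as 0. The weights w1 and w2 are not yet defined in these notes, so a hypothetical `unit_weight` placeholder is used for both.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Weight = Callable[[str, str], float]


def unit_weight(x: str, y: str) -> float:
    """Hypothetical placeholder: the notes leave w1 and w2 undefined."""
    return 1.0


def snc_one_way_best(g: str, nc: str, neighbors: Dict[str, List[str]],
                     nc_scores: Dict[Pair, float],
                     w1: Weight = unit_weight,
                     w2: Weight = unit_weight) -> float:
    """Method (3): for each a in n(g), take its best weighted hit in n(nc)."""
    total = 0.0
    for a in neighbors.get(g, []):
        # Best weighted score of a against any neighbor b of nc;
        # this is NC(a, h) * w1(a, h) for the h picked for this a.
        best = max((nc_scores.get((a, b), 0.0) * w1(a, b)
                    for b in neighbors.get(nc, [])), default=0.0)
        total += best
    return total * w2(g, nc)


# Toy data, invented for illustration only.
toy_neighbors = {"g": ["a1", "a2"], "nc": ["b1", "b2"]}
toy_scores = {("a1", "b1"): 0.75, ("a1", "b2"): 0.25, ("a2", "b2"): 0.5}

print(snc_one_way_best("g", "nc", toy_neighbors, toy_scores))  # 1.25
```

In the toy data, a1 contributes its best hit (0.75 against b1) and a2 contributes 0.5 against b2, so each neighbor of g is counted at most once regardless of how many hits it has.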


(4) Sum both one-way best hit method

The idea is to compute (3) and then add the sum of the best weighted NC-hit scores for the neighbors of nc among the neighbors of g.

    for each gene g in Dataset
        for each NC-hit nc of g

$$h = \operatorname{argmax}_{b \in n(nc)} \{\, NC(a, b)\, w_1(a, b) : a \in n(g) \,\}$$

$$k = \operatorname{argmax}_{a \in n(g)} \{\, NC(a, b)\, w_1(a, b) : b \in n(nc) \,\}$$

$$SNC(g, nc) = \Big[\sum_{a \in n(g)} NC(a, h)\, w_1(a, h) + \sum_{b \in n(nc)} NC(k, b)\, w_1(k, b)\Big]\, w_2(g, nc)$$

Here h is chosen separately for each $a \in n(g)$, and k is chosen separately for each $b \in n(nc)$.

Auwn suggested normalizing these scores, and Mehmood explained the standard technique of normalization. The idea of weighting SNC scores with respect to properties important for clustering also came under discussion, but we could not conclude this part due to time constraints. One agenda item for the next meeting is therefore to discuss the important gene team parameters and the weighting mechanism for these algorithms, denoted in this document by $w_1(a, b)$ and $w_2(g, nc)$.
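As a rough illustration of method (4), the sketch below sums the best hits in both directions. Everything about the data layout is an assumption made for this example: `neighbors` maps a gene to a list of its neighbors, `nc_scores` maps a pair `(a, b)` with a in n(g) and b in n(nc) to its NC score, and a missing pair counts as 0. Because w1 and w2 are still to be decided, a hypothetical `unit_weight` placeholder stands in for both, and no normalization is applied.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]
Weight = Callable[[str, str], float]


def unit_weight(x: str, y: str) -> float:
    """Hypothetical placeholder: the notes leave w1 and w2 undefined."""
    return 1.0


def best_hit_sum(sources: Iterable[str], targets: List[str],
                 score: Callable[[str, str], float]) -> float:
    """Sum, over each source, of its best score against any target."""
    return sum(max((score(s, t) for t in targets), default=0.0)
               for s in sources)


def snc_both_one_way_best(g: str, nc: str, neighbors: Dict[str, List[str]],
                          nc_scores: Dict[Pair, float],
                          w1: Weight = unit_weight,
                          w2: Weight = unit_weight) -> float:
    """Method (4): best hits of n(g) in n(nc) plus best hits of n(nc) in n(g)."""
    n_g = neighbors.get(g, [])
    n_nc = neighbors.get(nc, [])

    def weighted(a: str, b: str) -> float:
        # NC(a, b) * w1(a, b); missing pairs score 0 (assumption).
        return nc_scores.get((a, b), 0.0) * w1(a, b)

    # Sum over a in n(g) of NC(a, h) * w1(a, h).
    forward = best_hit_sum(n_g, n_nc, weighted)
    # Sum over b in n(nc) of NC(k, b) * w1(k, b).
    backward = best_hit_sum(n_nc, n_g, lambda b, a: weighted(a, b))
    return (forward + backward) * w2(g, nc)


# Toy data, invented for illustration only.
toy_neighbors = {"g": ["a1", "a2"], "nc": ["b1", "b2"]}
toy_scores = {("a1", "b1"): 0.75, ("a1", "b2"): 0.25, ("a2", "b2"): 0.5}

print(snc_both_one_way_best("g", "nc", toy_neighbors, toy_scores))  # 2.5
```

Note that, with the formula taken literally, a pair that is the best hit in both directions (such as a1 and b1 in the toy data) contributes to both sums. Whether that double counting is desirable may be worth settling together with the normalization question.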
