INDEX
NEGATIVE LOGITS
Wise            0.66
dhatu           0.66
WISE            0.63
Neighbor        0.63
Transient       0.62
WISE            0.61
Neighborhood    0.59
भारत            0.59
neighborhood    0.58
Northwestern    0.58
POSITIVE LOGITS
accusations     0.77
criticism       0.68
criticise       0.68
hypocrisy       0.68
rumours         0.66
conspiracies    0.65
allegations     0.64
homosexuality   0.64
talento         0.64
sarcasm         0.64
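The card does not say how these scores were produced. A common convention in sparse-autoencoder feature dashboards, assumed here rather than confirmed by the card, is to take the feature's decoder direction, dot it with each token's unembedding vector, and list the tokens with the largest and smallest results. The sketch below implements that ranking in plain Python. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom_logits` are hypothetical, and the toy vectors are illustrative only, not the model behind this card. The sketch returns signed values, whereas the card prints its negative-logit entries without a minus sign.

```python
import heapq
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom_logits(effects: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring and the k lowest-scoring (token, value) pairs."""
    items = list(effects.items())
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; the numbers are made up for illustration.
    decoder_dir = [1.0, -1.0]
    unembed = {
        "accusations": [0.75, 0.0],
        "criticism": [0.5, 0.0],
        "Wise": [0.0, 0.5],
        "Neighbor": [0.25, 0.5],
    }
    pos, neg = top_and_bottom_logits(logit_effects(decoder_dir, unembed), k=2)
    print("POSITIVE LOGITS", pos)  # [('accusations', 0.75), ('criticism', 0.5)]
    print("NEGATIVE LOGITS", neg)  # [('Wise', -0.5), ('Neighbor', -0.25)]
```

Using `heapq.nlargest` and `heapq.nsmallest` avoids sorting the whole vocabulary when only the top and bottom k entries are needed.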
Activation Density  0.000%
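Activation density is usually read as the fraction of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the `threshold` parameter and both function names are hypothetical) and that the card prints the fraction as a percentage with three decimals:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    # Percentage with three decimals, matching the card's "0.000%" style (assumed formatting).
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # One active position out of 1,000 -> 0.1%.
    print(format_density(activation_density([0.0] * 999 + [1.2])))      # 0.100%
    # One active position out of 1,000,000 rounds down to zero at this precision.
    print(format_density(activation_density([0.0] * 999_999 + [0.3])))  # 0.000%
```

Under that formatting, any density below roughly 0.0005% prints as 0.000%, so the figure on the card alone does not separate a feature that never fires from one that fires very rarely.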