INDEX
Explanations
comparisons between different items or entities
comparative phrases expressing contrast or opposition
Negative Logits
collar    -0.81
ERN       -0.80
overed    -0.76
eur       -0.76
abet      -0.75
red       -0.75
mberg     -0.75
olog      -0.74
lene      -0.72
yright    -0.72
Positive Logits
mindset       0.66
scissors      0.65
apples        0.65
nesday        0.64
nil           0.63
bandits       0.63
averages      0.63
cannabin      0.62
monkeys       0.62
situational   0.61
Activations Density 0.014%