INDEX
NEGATIVE LOGITS
constit       -0.63
agra          -0.60
quin          -0.59
swick         -0.59
arial         -0.57
aced          -0.56
center        -0.56
prompted      -0.56
altogether    -0.56
intrig        -0.55
POSITIVE LOGITS
beware        1.11
Beware        1.03
Always        1.00
Always        0.99
Avoid         0.90
Eat           0.88
Avoid         0.86
Keep          0.85
Stay          0.84
Keep          0.83
Activations Density: 0.385%