INDEX
Negative Logits
idth        -0.80
yip         -0.70
ramid       -0.69
rium        -0.69
ogly        -0.63
letters     -0.59
weeney      -0.58
gow         -0.58
rongh       -0.57
attering    -0.56
Positive Logits
satisf       1.21
peacefully   1.04
administr    1.03
surg         0.96
promptly     0.94
by           0.93
diplom       0.91
via          0.90
swiftly      0.87
manually     0.85
Activations Density: 0.173%