INDEX
Explanations
phrases expressing disappointment or adversity
Negative Logits
Token       Logit
odd         -0.15
ussen       -0.15
rig         -0.15
ozilla      -0.14
zdy         -0.14
wonder      -0.14
ffa         -0.14
Ñĩем        -0.14
IfNeeded    -0.14
_equiv      -0.14
Positive Logits
Token       Logit
ids         0.17
cannot      0.16
ickle       0.15
omon        0.15
Barrier     0.15
ër          0.15
(?          0.14
none        0.14
(?          0.14
predecess   0.14
Activations Density 0.067%