Explanations
phrases indicating ongoing existence or duration
Negative Logits
ansen    -0.16
sing     -0.16
<!--↵    -0.15
ickey    -0.14
imson    -0.14
ensis    -0.14
gua      -0.14
ookie    -0.14
llib     -0.14
sein     -0.14
Positive Logits
ÅĻez          0.15
theless       0.15
ATAL          0.14
rier          0.13
nin           0.13
been          0.13
ed            0.13
orno          0.13
marg          0.13
="../../../   0.13
Activations Density 0.020%
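
The density figure can be read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; it assumes density means "fraction of tokens with activation above a threshold", which the dashboard itself does not state. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The dashboard does not spell out its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which only 2 activate the feature.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.020%"
```

Under this reading, a density of 0.020% corresponds to roughly 2 active tokens per 10,000, so the feature is very sparse.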