INDEX
Explanations
personal pronouns and references to self and others
Negative Logits
amba      -0.17
ime       -0.16
uff       -0.15
dies      -0.15
wie       -0.14
auf       -0.14
472       -0.14
åģ¥       -0.14
estroy    -0.14
weis      -0.13
Positive Logits
Conte     0.13
_simps    0.13
ovsky     0.13
Ùħز       0.13
ibar      0.13
íĭ°       0.13
ymes      0.13
èĮĤ       0.13
å»        0.13
äºŃ       0.13
Activations Density 0.645%