Explanations
references to historical events or significant narratives involving consequences
Negative Logits
ovaly      -0.14
obus       -0.14
mouth      -0.14
urnal      -0.14
ousse      -0.14
959        -0.13
론         -0.13
apat       -0.13
Dawson     -0.13
initially  -0.13
Positive Logits
success    0.19
isol       0.17
succès     0.17
成功       0.17
성공       0.17
isolated   0.17
úspě       0.16
success    0.16
itler      0.16
sucesso    0.15
Activations Density 0.001%