Explanations
phrases indicating conditionality or dependency
NEGATIVE LOGITS
_ASM      -0.17
twig      -0.16
istra     -0.15
έρα       -0.14
_asm      -0.14
commod    -0.14
obra      -0.14
pokus     -0.14
IMA       -0.14
zes       -0.14
POSITIVE LOGITS
nst       0.17
Latina    0.16
bí        0.16
ate       0.15
lé        0.15
ποτε      0.15
bar       0.15
obar      0.14
ace       0.14
ew        0.14
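The page does not say how the NEGATIVE LOGITS and POSITIVE LOGITS scores are computed. A common approach in sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the most negative and most positive results. The sketch below assumes that approach. The functions `logit_effects` and `top_and_bottom`, and the toy two-dimensional vectors in the usage note, are hypothetical stand-ins for a real model's weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption: the dashboard scores tokens as decoder_direction @ W_U,
    with W_U given here as one column (list of floats) per token.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return positive, negative
```

For example, with a hypothetical direction `[1.0, -0.5]` and columns `{"nst": [0.2, 0.1], "twig": [-0.1, 0.2], "bar": [0.1, 0.0]}`, the scores are roughly 0.15, -0.2, and 0.1, so `top_and_bottom(effects, 1)` puts `nst` on the positive side and `twig` on the negative side.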
Activations Density 0.014%
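Activation density is usually read as the share of tokens on which the feature fires. Under that assumption, 0.014% means roughly 14 active tokens per 100,000. The hypothetical `activation_density` helper below computes the percentage by treating any activation above a threshold (assumed here to be 0) as "firing". The page does not state the threshold or the token sample it used.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (default 0.0).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# Usage: 14 firing tokens out of 100,000 gives a density of about 0.014%.
sample = [0.0] * 99_986 + [0.5] * 14
print(activation_density(sample))
```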