INDEX
Explanations
phrases that indicate alternatives or different options
the word else
Negative Logits
boyunca      -0.42
obtenido     -0.42
leeftijd     -0.41
těch         -0.40
zejména      -0.40
jednotliv    -0.40
muertes      -0.39
rodillas     -0.38
pokud        -0.38
jäm          -0.38
Positive Logits
another      1.11
another      1.09
Another      0.99
Another      0.94
ANOTHER      0.93
Otro         0.79
Otra         0.79
Otra         0.77
別の         0.73
otra         0.71
Activations Density: 0.010%