INDEX
Explanations
prepositions with specific words
Negative Logits
rediscovered    0.48
devenue         0.45
습니다          0.42
characters      0.41
effectivement   0.41
Pferde          0.41
personajes      0.40
کردار           0.40
égard           0.40
ходит           0.39
Positive Logits
from      0.84
with      0.79
by        0.77
in        0.77
في        0.76
via       0.72
through   0.72
on        0.71
على       0.69
från      0.67
Activations Density 0.019%