INDEX
Explanations
dates and events
punctuation and conjunctions
Negative Logits
and    0.42
an     0.41
t      0.39
u      0.39
↵      0.36
ar     0.35
al     0.33
er     0.32
it     0.32
-      0.32
Positive Logits
étaient      0.32
íamos        0.31
impuestos    0.30
ataques      0.30
ešte         0.30
؟            0.29
kucing       0.29
erano        0.29
jakiś        0.29
idk          0.29
Activations Density 0.000%