INDEX
Explanations
the presence of specific characters or sequences within words
Negative Logits
ager      -0.17
ce        -0.16
lane      -0.16
lag       -0.16
ne        -0.16
anna      -0.16
loff      -0.16
itage     -0.16
soever    -0.15
enced     -0.15
Positive Logits
upal      0.17
letic     0.17
erif      0.16
ãi        0.15
través    0.14
alet      0.14
oog       0.14
resi      0.14
partir    0.14
éro       0.14
Activations Density: 0.022%