INDEX
Explanations
the word "Variation" or its variants, indicating a focus on diversity or differences
Negative Logits
ilt       -0.20
ining     -0.17
ilit      -0.17
ra        -0.16
RACT      -0.16
med       -0.15
ematik    -0.15
ri        -0.15
د         -0.15
mes       -0.15
Positive Logits
i         0.25
ETY       0.22
eties     0.21
uos       0.21
ety       0.21
età       0.20
ums       0.20
été       0.19
juan      0.19
amente    0.18
Activations Density 0.045%