INDEX
Explanations
The word "a" followed by descriptive words
Negative Logits
лары            1.06
Wh              1.05
-               1.04
领域的          1.03
ajer            1.02
departmental    0.99
ایت             0.98
orld            0.97
acch            0.97
azz             0.97
POSITIVE LOGITS
guter           1.80
moins           1.74
tenho           1.71
posso           1.69
joli            1.64
nessuna         1.62
inget           1.59
estou           1.59
ottimo          1.58
nincs           1.58
Activations Density 0.539%