Explanations
Italian affirmative or referring phrases
Negative Logits
Token      Logit
si         -0.19
SI         -0.18
SI         -0.16
jar        -0.15
carg       -0.15
si         -0.15
rest       -0.15
Extras     -0.15
¸ı         -0.15
Extra      -0.15
Positive Logits
Token      Logit
amo        0.19
oux        0.17
esta       0.17
ÄįÃŃ       0.16
profil     0.16
impse      0.15
Buffers    0.15
amina      0.15
possono    0.14
agg        0.14
Activation Density: 0.001%