Explanations
phrases indicating non-standard or atypical conditions
Negative Logits
Token       Logit
italienne   -0.69
imprimée    -0.69
scattata    -0.61
préféré     -0.61
gjerne      -0.60
literaria   -0.60
picha       -0.60
avance      -0.59
innamor     -0.58
brainly     -0.58
Positive Logits
Token       Logit
non         1.18
NON         1.17
Non         1.16
Non         1.04
non         1.02
非          1.01
Nong        0.95
NON         0.94
nons        0.94
Nons        0.91
Activations Density: 0.096%
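
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that "activations density" means the fraction of token positions on which the feature has a non-zero activation. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, and any activation
    strictly above `threshold` counts as the feature being active.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 96 of 100,000 token positions.
sample = [1.0] * 96 + [0.0] * 99_904
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.096%"
```

Under this reading, a density of 0.096% would mean the feature is active on roughly one token in a thousand, which is what one might expect of a feature tied to a narrow family of tokens.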