INDEX
Explanations
negative phrases or expressions of denial
Negative Logits
Portail
-0.69
culturelles
-0.69
}\
-0.69
IONI
-0.68
Datuak
-0.67
respectively
-0.66
{}\
-0.66
Charming
-0.65
متعلقه
-0.65
jeter
-0.65
Positive Logits
No
1.49
no
1.41
No
1.38
NO
1.31
NO
1.10
no
1.04
Noyes
1.01
sno
0.98
nof
0.97
Noor
0.95
Activations Density 0.150%