INDEX
Explanations
negations and expressions of impossibility or absence
Text containing "no"
no followed by a word
Negative Logits
<<<<<<<<<<<<<<    -0.76
ValueStyle        -0.74
IsMutable         -0.69
Personensuche     -0.68
Efq               -0.68
culturelles       -0.67
AsUp              -0.66
esternos          -0.66
aceea             -0.65
قایناقلار         -0.65
Positive Logits
such              0.70
better            0.69
reason            0.58
no                0.58
way               0.55
igno              0.55
so                0.53
denying           0.51
easier            0.51
worse             0.51
Activations Density 0.105%