INDEX
Explanations
negations or refusal expressions in sentences
Negative Logits
CI            -0.76
è£ıè          -0.67
å½            -0.67
ItemImage     -0.66
referen       -0.63
WithNo        -0.62
velop         -0.62
grounds       -0.62
jected        -0.60
beginnings    -0.60
Positive Logits
necessarily   1.08
lose          1.03
bother        1.00
compete       1.00
decide        0.98
seem          0.98
get           0.96
hurry         0.93
quit          0.92
gotta         0.92
Activations Density: 0.040%