INDEX
Explanations
negative statements about a situation or a claim
phrases that emphasize negation or contrast
Negative Logits
luaj      -0.75
named     -0.69
Named     -0.60
ukong     -0.59
regate    -0.59
maiden    -0.58
ierce     -0.58
Repair    -0.56
miah      -0.56
isively   -0.55
Positive Logits
irrelevant     0.87
nowhere        0.75
nonetheless    0.74
untrue         0.73
hardly         0.70
unlikely       0.69
unavoidable    0.69
alright        0.67
nevertheless   0.67
outwe          0.66
Activations Density: 0.151%