INDEX
Explanations
negative statements or sentiments
negative expressions regarding personal rights and beliefs
Negative Logits
Token         Logit
oided         -0.75
legged        -0.63
RTX           -0.63
Redditor      -0.63
AFP           -0.62
assorted      -0.62
Presence      -0.61
nonetheless   -0.58
Alas          -0.57
anan          -0.57
Positive Logits
Token         Logit
necessarily   1.21
gonna         0.96
anymore       0.96
condone       0.91
wanna         0.89
really        0.86
anybody       0.85
ĨĴ            0.82
hesitate      0.79
kidding       0.77
Activations Density: 0.324%
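
The density figure can be read as the share of token positions on which the feature fires. The page itself does not define it, so the sketch below assumes that reading: a feature "fires" when its activation is above zero. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active; the source page does not state its exact definition.
    """
    values = list(activations)
    if not values:
        return 0.0  # no tokens observed, so report zero density
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


# Hypothetical sample: 1000 token positions, feature active on 3 of them.
sample = [0.0] * 997 + [0.8, 1.5, 0.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.324% would mean the feature is active on roughly 3 of every 1000 tokens.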