Explanations
verbs or phrases related to expressing strong emotions or criticisms
words related to actions of condemnation or rejection
Negative Logits
Token         Logit
zo            -0.68
TEXT          -0.64
.}            -0.63
til           -0.63
zon           -0.59
lyss          -0.59
WITHOUT       -0.59
entertained   -0.59
suppose       -0.58
poral         -0.58
Positive Logits
Token         Logit
discredit     0.84
igated        0.79
aside         0.78
blame         0.78
Patel         0.76
scorn         0.76
havoc         0.73
out           0.73
ocrats        0.70
igate         0.69
Activations Density: 0.123%