Explanations
words related to making decisions or evaluating options
expressions of subjective opinion or preference
Negative Logits
ylum           -0.70
estern         -0.69
Synopsis       -0.69
iannopoulos    -0.68
Democratic     -0.66
ÂŃ             -0.66
[blank]        -0.65
whistleblower  -0.64
ön             -0.63
billion        -0.63
Positive Logits
thats          1.15
I              0.99
haha           0.87
anyways        0.86
it             0.86
alot           0.85
honestly       0.83
XD             0.83
;)             0.81
doesnt         0.80
Activations Density: 0.520%