Explanations
phrases related to strong opinions or beliefs
instances of the word "think" and its variations
Negative Logits
yna           -0.80
EStreamFrame  -0.77
iona          -0.77
anca          -0.75
avez          -0.72
oulos         -0.72
ulia          -0.69
Redditor      -0.65
byn           -0.65
çĦ            -0.64
Positive Logits
differently   1.08
twice         0.90
wrongly       0.85
incorrectly   0.84
lessly        0.81
otherwise     0.80
about         0.79
ingly         0.77
alike         0.75
aloud         0.75
Activations Density: 0.093%