Explanations
words and phrases that indicate conflict or controversy
Negative Logits
imony     -0.16
reau      -0.15
nier      -0.15
enance    -0.14
ern       -0.14
impuls    -0.14
bai       -0.14
agua      -0.14
mony      -0.14
succ      -0.14
Positive Logits
dormant    0.28
reaction   0.23
responses  0.23
debate     0.23
response   0.23
reactions  0.22
-response  0.19
反应       0.19
Dorm       0.18
latent     0.18
Activations Density: 0.098%