Explanations
the presence of strong emotional or impactful language
NEGATIVE LOGITS
Token    Logit
andon    -0.15
$MESS    -0.15
ellar    -0.15
.om      -0.14
Omni     -0.14
omi      -0.14
ç¿Ķ      -0.14
Bands    -0.14
emo      -0.14
654      -0.14
POSITIVE LOGITS
Token    Logit
orte     0.17
nat      0.15
ected    0.15
els      0.15
nat      0.15
Baths    0.15
ague     0.14
ense     0.14
Nat      0.14
Nat      0.14
Activations Density: 0.027%