INDEX
Explanations
dialogue that involves conflict or threats
Negative Logits
edl          -0.15
"$           -0.14
chal         -0.13
uien         -0.13
emm          -0.13
otor         -0.13
character    -0.13
ettel        -0.13
_TestCase    -0.13
"`           -0.12
Positive Logits
alian        0.17
978          0.16
›            0.14
kre          0.14
군           0.13
ilder        0.13
roma         0.13
fait         0.13
ži           0.12
-Compatible  0.12
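Dashboards of this kind usually derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result: the most negative entries are the tokens the feature suppresses, the most positive the tokens it promotes. The following is a minimal sketch under stated assumptions. The decoder vector `decoder_dir`, the unembedding matrix `W_U` of shape (d_model, vocab_size), the `vocab` list, and the helper `top_logit_tokens` are all hypothetical names. The pipeline that produced the numbers above may also apply extra steps such as normalization.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the direct logit effect
    of a feature's decoder direction (assumed shapes: (d_model,) and
    (d_model, vocab_size))."""
    # Direct effect of the feature direction on every output logit.
    logit_effect = np.asarray(decoder_dir) @ np.asarray(W_U)  # (vocab_size,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_tokens(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.4), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other.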
Activation Density: 0.928%
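Activation density is the share of sampled tokens on which the feature is active, so 0.928% works out to roughly 9 tokens in every 1,000. The sketch below assumes that "active" means an activation strictly above zero. The threshold and the helper name `activation_density` are assumptions, not details taken from the dashboard.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose feature activation
    exceeds `threshold` (assumed to be 0 for a ReLU-style SAE feature)."""
    acts = np.asarray(activations, dtype=float)
    # Boolean mask of "firing" tokens; its mean is the firing fraction.
    return float(np.mean(acts > threshold))


# Eight tokens, two of which activate the feature.
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 0.25
```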