Explanations
instances of emotional outbursts and tantrums
Negative Logits
Token       Logit
OUCH        -0.18
aland       -0.16
ainless     -0.15
wink        -0.15
gangbang    -0.15
pain        -0.14
è§          -0.14
osy         -0.14
habi        -0.14
painfully   -0.14
Positive Logits
Token       Logit
sul         0.29
tantr       0.29
stom        0.27
stamp       0.25
storm       0.25
stamped     0.22
vent        0.22
stormed     0.22
throwing    0.21
storm       0.21
Activations Density: 0.283%