INDEX
Explanations
narratives of criminal activities or violent actions
Negative Logits
merce         -0.93
ratulations   -0.89
cule          -0.82
video         -0.81
monary        -0.81
endiary       -0.80
perty         -0.77
idav          -0.75
xual          -0.74
itudinal      -0.71
Positive Logits
afar          1.19
whence        1.16
scratch       1.00
abroad        0.99
thence        0.94
somewhere     0.83
anywhere      0.81
Palest        0.75
underneath    0.75
beneath       0.71
Activations Density: 0.177%