Explanations
Terms associated with violence and brutality
Negative Logits
Token      Logit
amd        -0.17
brush      -0.16
tra        -0.16
grund      -0.16
latter     -0.16
strav      -0.15
brute      -0.14
grace      -0.14
exion      -0.14
-thirds    -0.14
Positive Logits
Token      Logit
shaw        0.20
以为        0.19
zeitig      0.17
ly          0.17
ulent       0.16
acht        0.15
ummer       0.15
bite        0.15
uffles      0.15
imal        0.15
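The two lists look like the bottom and top of a single ranking of per-token logit values. The page does not say how those values are computed, so the sketch below simply assumes each token already has one scalar score. It then shows how such lists could be pulled out of that mapping. The helper name `top_logits` and the choice of k are hypothetical. Only a few of the listed values are reused as toy input.

```python
def top_logits(logit_effects, k=10):
    """Hypothetical helper: return the k most negative and k most positive tokens.

    Assumes `logit_effects` maps each token string to one scalar score
    (an assumption; the dashboard's exact computation is not stated).
    """
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest scores
    positive = ranked[::-1][:k]    # highest scores, largest first
    return negative, positive


# Toy input using a handful of values from the lists above.
scores = {"shaw": 0.20, "以为": 0.19, "amd": -0.17, "brush": -0.16, "brute": -0.14}
neg, pos = top_logits(scores, k=2)
print(neg)  # [('amd', -0.17), ('brush', -0.16)]
print(pos)  # [('shaw', 0.2), ('以为', 0.19)]
```

With ties, as in the many 0.15 and -0.16 entries above, `sorted` is stable. Tied tokens therefore keep their insertion order, and the real page may order them differently.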
Activation density: 0.553%
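Activation density is usually read as the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0, which is the page's likely but unstated definition. The function name and the toy activation list are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3%}")  # 0.300%
```

Under this reading, the listed 0.553% would mean the feature is active on roughly 5.5 of every 1,000 tokens in the sampled text.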