Explanations
graphic descriptions of violence and bodily harm
Negative Logits
tings       -0.15
unm         -0.15
azine       -0.15
ngu         -0.15
Bang        -0.15
Adaptive    -0.14
ÃŃs         -0.14
iglia       -0.14
DERP        -0.14
bang        -0.14
Positive Logits
avar        0.15
ammen       0.15
avers       0.14
cyan        0.14
;č↵         0.14
mercury     0.14
AVA         0.14
aned        0.13
avn         0.13
Greenwood   0.13
Activations Density: 0.082%