Explanations
mentions of lethal or dangerous items, actions, or situations
references to lethal substances or actions
Negative Logits
Token      Logit
oard       -0.83
ilage      -0.81
aliation   -0.76
resses     -0.75
bed        -0.72
æĦ         -0.72
roller     -0.72
roll       -0.72
baby       -0.71
acular     -0.69
Positive Logits
Token      Logit
cious      0.94
istics     0.93
uania      0.84
cius       0.80
lihood     0.75
SPONSORED  0.74
Luthor     0.74
istically  0.73
injection  0.72
ANGEL      0.71
Activations Density: 0.035%