INDEX
Explanations
phrases related to actions of self-harm or harm towards others
references to self-harm and suicide
Negative Logits
taboola         -0.86
soType          -0.77
eret            -0.74
Management      -0.72
Wide            -0.72
姫              -0.71
issance         -0.71
DragonMagazine  -0.70
UMP             -0.69
FML             -0.68
Positive Logits
unborn          0.90
whales          0.75
innocent        0.74
hated           0.74
terrorists      0.73
classmate       0.73
Mum             0.71
chickens        0.71
birds           0.70
zombies         0.70
Activations Density 0.183%