INDEX
Explanations
instances of bullying and related experiences or themes
Negative Logits
Token        Logit
ocaly        -0.16
POSE         -0.15
adio         -0.15
prostit      -0.14
prostitute   -0.14
bab          -0.14
treason      -0.14
oice         -0.14
Accessory    -0.14
UTO          -0.13
Positive Logits
Token        Logit
bullying     0.40
bully        0.35
bul          0.32
Bul          0.32
bullied      0.32
peer         0.28
bull         0.26
Peer         0.26
peers        0.25
Peer         0.24
Activations Density: 0.043%
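
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of tokens on which a feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard computes it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard's exact definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 43 of 100,000 tokens.
sample = [1.0] * 43 + [0.0] * (100_000 - 43)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.043% would correspond to roughly 43 active tokens per 100,000, which is typical of a sparse, topic-specific feature.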