INDEX
Explanations
references to bullying and its consequences
Negative Logits
Token    Logit
adio     -0.19
ap       -0.17
mistr    -0.15
misc     -0.15
plits    -0.15
beds     -0.15
ieg      -0.14
aley     -0.14
oppel    -0.14
fid      -0.14
Positive Logits
Token        Logit
workplace    0.19
Workplace    0.18
Reporting    0.17
teri         0.17
bullying     0.17
bul          0.16
Bul          0.16
victim       0.16
Reporting    0.16
bully        0.15
Activations Density: 0.018%