Explanations
instances of inappropriate sexual relationships or misconduct involving teachers and students
Negative Logits
æķ·      -0.17
undi     -0.15
hetto    -0.15
бÑĥ     -0.14
estruct  -0.14
ontrol   -0.14
ouston   -0.14
shiv     -0.14
baiser   -0.14
_embed   -0.14
Positive Logits
oun      0.17
harm     0.15
aser     0.15
ouns     0.14
407      0.14
951      0.14
vik      0.13
ÙħاÙħ   0.13
è¡       0.13
Ïĥκε    0.13
Activations Density 0.234%