INDEX
Explanations
negative phrases related to abuse, misconduct, and the protection of people in vulnerable situations
Negative Logits
hardly     -0.17
actually   -0.17
lus        -0.15
أك         -0.15
asons      -0.15
rất        -0.15
actually   -0.15
almost     -0.14
quite      -0.14
nearly     -0.14

Positive Logits
EVER        0.18
ictim       0.17
cave        0.17
lightly     0.17
succ        0.17
imbus       0.16
politic     0.16
cherry      0.15
AGAIN       0.15
ukkit       0.15
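The page does not say how these logit lists were computed. One common approach for sparse-autoencoder features is to project the feature's decoder direction onto the model's unembedding matrix and report the lowest- and highest-scoring vocabulary tokens. This page does not confirm that approach, so treat it as an assumption. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`) and a toy model with two hidden dimensions and three tokens. Real pipelines may also fold in layer normalization or other scaling.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding column.

    Assumption: unembed is laid out as [d_model][vocab_size], and no layer
    normalization or bias is applied (a simplification).
    """
    d_model = len(decoder_vec)
    scores = []
    for j, token in enumerate(vocab):
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ordered = sorted(scores, key=lambda ts: ts[1])
    negative = ordered[:k]
    positive = ordered[::-1][:k]
    return negative, positive


# Hypothetical toy values: a 2-dimensional model with a 3-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0],
           [0.1, 0.3, -0.2]]
vocab = ["a", "b", "c"]

neg, pos = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), 1)
print("negative:", [(t, round(s, 2)) for t, s in neg])
print("positive:", [(t, round(s, 2)) for t, s in pos])
```

In this toy case, token "b" receives the most negative score (-0.25) and token "a" the most positive (0.15). The lists above report the ten tokens at each end in the same way.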
Activation Density: 0.196%
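An activation density of 0.196% means the feature fires on roughly 196 of every 100,000 tokens, or about 2 in 1,000. Below is a minimal sketch of that calculation. It assumes a token counts as active when its activation is strictly above a threshold of zero. The `activation_density` helper and the threshold are illustrative and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 here).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 196 active tokens out of 100,000.
acts = [0.0] * 99_804 + [1.0] * 196
print(f"{activation_density(acts):.3f}%")  # prints 0.196%
```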