Explanations
words related to misconduct or inappropriate behavior
variations of the word "behavior."
Negative Logits
Rwanda       -0.74
Korean       -0.71
Panthers     -0.69
Panther      -0.68
Nordic       -0.66
Panzer       -0.66
Purg         -0.64
ãģĤ (あ)     -0.64
Kinnikuman   -0.64
Roof         -0.63
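The token ãģĤ is not corrupted text. It is how GPT-2-style byte-level BPE vocabularies display raw UTF-8 bytes: each byte maps to a printable character. The sketch below reverses that mapping to recover the character. It assumes the model uses the standard GPT-2 byte-to-unicode table. `decode_token` is a hypothetical helper name, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to the printable character GPT-2-style BPE shows for it."""
    # Printable bytes keep their own character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte (space, control characters, ...) is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed BPE token back into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģĤ"))             # あ  (bytes E3 81 82)
print(repr(decode_token("Ġbehavior")))  # ' behavior'  (Ġ encodes a leading space)
```

Because a leading space is encoded as Ġ, two entries that look identical once the space is stripped for display can still be distinct vocabulary items.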
Positive Logits
beh           1.45
aviour        1.29
Beh           0.96
terness       0.89
behavior      0.89
avin          0.89
Beh           0.84
abus          0.83
behav         0.81
ilib          0.81
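Dashboards like this one commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then keep the highest- and lowest-scoring tokens. The sketch below assumes that procedure. It uses tiny made-up weights rather than the model's real ones, and the helper names `logit_scores` and `top_and_bottom` are hypothetical.

```python
import heapq


def logit_scores(decoder_vec, unembed, vocab):
    """Score each token as the dot product of the feature's decoder vector
    with that token's unembedding column (unembed is d_model x vocab_size)."""
    return [
        (tok, sum(d * row[j] for d, row in zip(decoder_vec, unembed)))
        for j, tok in enumerate(vocab)
    ]


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    top = heapq.nlargest(k, scores, key=lambda p: p[1])
    bottom = heapq.nsmallest(k, scores, key=lambda p: p[1])
    return top, bottom


# Toy example: d_model = 2, vocabulary of 4 tokens (illustrative values only).
vocab = ["beh", "aviour", "Rwanda", "Korean"]
decoder = [1.0, 0.5]
unembed = [
    [2.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.0, -2.0],
]

top, bottom = top_and_bottom(logit_scores(decoder, unembed, vocab), k=2)
print(top)     # [('beh', 2.0), ('aviour', 0.5)]
print(bottom)  # [('Rwanda', -1.0), ('Korean', -0.5)]
```

A large positive score means that, in this assumed procedure, the feature pushes the model toward predicting that token. A negative score means it pushes away from it.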
Activation density: 0.007%
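Activation density is usually reported as the percentage of sampled tokens on which the feature fires. A density of 0.007% therefore corresponds to roughly 7 active tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero. The threshold, the sample size, and the helper name `activation_density` are illustrative assumptions, not the dashboard's documented settings.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 7 firing tokens among 100,000.
sample = [1.2] * 7 + [0.0] * 99_993
print(activation_density(sample))
```

A density this low marks a highly sparse feature, which fits the narrow "behavior" theme in the explanations above.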