INDEX
Negative Logits
eps           -0.09
depart        -0.08
comparing     -0.08
compared      -0.08
attendants    -0.08
accompany     -0.08
伴            -0.08
ajj           -0.08
accompanies   -0.08
accompagner   -0.08
Positive Logits
interfering   0.08
prohibited    0.08
interfere     0.08
.proto        0.08
interference  0.08
nib           0.08
PRO           0.07
troublesome   0.07
/template     0.07
misuse        0.07
Activations Density 0.005%