INDEX
Negative Logits
homosex        -0.07
intimidated    -0.07
SPORT          -0.06
cstdint        -0.06
_requested     -0.06
Jehovah        -0.06
馬             -0.06
_density       -0.06
american       -0.06
Spin           -0.06
Positive Logits
.Man           0.07
carrot         0.07
occurring      0.07
="/            0.06
lời            0.06
CodeGen        0.06
.optional      0.06
izen           0.06
hattan         0.06
affen          0.06
Activation Density: 0.035%