NEGATIVE LOGITS
Citizen      -0.07
amateur      -0.07
moderator    -0.07
49           -0.07
(compact     -0.07
Test         -0.06
択           -0.06
surrogate    -0.06
reporters    -0.06
Constructed  -0.06
POSITIVE LOGITS
spanking     0.07
ška          0.07
blender      0.07
intimidate   0.07
期間         0.07
KE           0.06
firepower    0.06
ắn           0.06
seviy        0.06
appropri     0.06
Activations Density: 0.004%