Negative Logits
utter         -0.06
Attachment    -0.06
surround      -0.06
.Require      -0.06
admissions    -0.06
ifton         -0.06
щее           -0.06
Interracial   -0.06
brilliant     -0.06
арам          -0.06
Positive Logits
Policy        0.08
ï             0.08
policy        0.08
ovic          0.07
.tax          0.07
превыш        0.07
hij           0.07
policy        0.07
impover       0.07
дап           0.07
Activations Density 0.005%