NEGATIVE LOGITS
(dist           -0.08
CATEGORY        -0.07
BOUND           -0.07
_required       -0.06
complications   -0.06
porch           -0.06
crossword       -0.06
dalla           -0.06
месте           -0.06
constructive    -0.06
POSITIVE LOGITS
naive           0.16
naï             0.13
paranoia        0.07
Bere            0.07
ignorance       0.07
paranoid        0.07
모르            0.07
菲              0.06
Naomi           0.06
taxpayers       0.06
Activations Density: 0.002%