Negative Logits
PRESS        -0.06
truthful     -0.06
907          -0.06
|↵           -0.06
Selectors    -0.06
ass          -0.06
fold         -0.06
Families     -0.06
Expense      -0.06
Fast         -0.06
Positive Logits
.Down        0.08
adverts      0.07
“She         0.07
epad         0.06
panels       0.06
Bullet       0.06
'y           0.06
Accent       0.06
’y           0.06
_marshaled   0.06
Activations Density: 0.224%