Negative Logits
Token           Logit
varsa           -0.07
hobbies         -0.07
erwähnt         -0.07
ingers          -0.07
resonate        -0.07
.Setter         -0.07
>,↵             -0.07
взаимодейств    -0.07
avoid           -0.07
isempty         -0.07
Positive Logits
Token           Logit
fooled          0.11
looph           0.11
fraudulent      0.10
quantità        0.10
counterfeit     0.10
deceptive       0.10
dishonest       0.10
deceit          0.10
misleading      0.10
claimed         0.09
Activations Density: 0.022%