NEGATIVE LOGITS
keep    -0.07
בטיח    -0.07
ග    -0.07
쐉    -0.06
In    -0.06
might    -0.06
ETING    -0.06
pool    -0.06
炉    -0.06
Gold    -0.06
POSITIVE LOGITS
unjust    0.08
Impossible    0.08
airborne    0.08
_scores    0.08
crumbling    0.08
Nazi    0.08
fraudulent    0.08
propositions    0.07
cosas    0.07
mostr    0.07
Activations Density: 0.003%