Negative Logits
largely          -0.08
depiction        -0.08
homeland         -0.08
produkt          -0.08
hơn              -0.08
迹               -0.07
Harrison         -0.07
uninterrupted    -0.07
Freedom          -0.07
Rhe              -0.07
Positive Logits
symmetry         0.08
اتف              0.08
symmetric        0.08
Swap             0.08
swapping         0.08
statutes         0.08
                 0.07
flip             0.07
exch             0.07
_flip            0.07
Activations Density: 0.027%