NEGATIVE LOGITS
spion       -0.98
frau        -0.96
underpin    -0.93
psychiat    -0.93
Márquez     -0.92
sezonu      -0.91
horloge     -0.91
bombar      -0.91
绡          -0.90
Insights    -0.90
POSITIVE LOGITS
out         1.76
up          1.67
through     1.63
it          1.52
with        1.49
on          1.47
from        1.29
backwards   1.23
Working     1.23
things      1.20
ACTIVATIONS DENSITY
0.025%