INDEX
Negative Logits
yd        -0.07
ossed     -0.07
s         -0.07
hype      -0.07
d         -0.07
4         -0.07
med       -0.06
read      -0.06
y         -0.06
3         -0.06
Positive Logits
cannot    0.14
cannot    0.12
Cannot    0.12
Cannot    0.10
ANNOT     0.10
not       0.09
amon      0.09
NOT       0.08
_CANNOT   0.08
annot     0.08
Activations Density 0.015%