NEGATIVE LOGITS
ibr      -0.75
Tweet    -0.70
akra     -0.65
jar      -0.64
je       -0.64
deb      -0.63
paper    -0.63
umar     -0.63
tal      -0.63
ib       -0.63
POSITIVE LOGITS
all           1.47
all           1.38
ALL           1.38
All           1.20
All           1.18
ALL           1.13
none          1.06
everything    1.05
entirety      1.01
EVERY         1.00
Activations Density: 0.260%