Explanations
contrasts and comparisons
Negative Logits
fw            -0.75
whe           -0.68
Limited       -0.66
iverse        -0.66
Became        -0.65
Published     -0.63
erity         -0.63
bard          -0.63
oji           -0.62
iol           -0.62
Positive Logits
previous      1.04
traditional   1.02
typical       1.01
usual         0.98
conventional  0.96
counterparts  0.93
ours          0.89
predecessors  0.89
other         0.83
lihood        0.83
Activations Density: 1.009%