INDEX
Explanations
comparisons in performance or rankings
Negative Logits
notwithstanding                   -0.63
********************************  -0.54
WATCHED                           -0.54
reperto                           -0.54
meet                              -0.53
¯¯¯¯¯¯¯¯                          -0.52
ASAP                              -0.52
word                              -0.52
although                          -0.51
Dial                              -0.51
Positive Logits
others     0.95
ours       0.92
elsewhere  0.87
theirs     0.85
actual     0.83
other      0.82
yours      0.78
hers       0.76
rest       0.71
actual     0.70
Activations Density: 2.711%