Explanations
durations and time-related phrases
Negative Logits
either    -0.16
EITHER    -0.16
Norris    -0.16
plain     -0.15
asso      -0.14
610       -0.14
699       -0.14
бор       -0.14
ensi      -0.14
91        -0.13
Positive Logits
longer    0.36
Longer    0.30
lifetime  0.18
Lifetime  0.18
lifetime  0.18
shorter   0.18
乃        0.17
longest   0.17
thậm      0.17
lif       0.16
Activations Density 0.085%