Explanations
references to time periods and changes over time
Negative Logits
Token      Logit
yet        -0.21
yet        -0.19
due        -0.19
Yet        -0.18
due        -0.18
ERM        -0.17
Yet        -0.16
eya        -0.16
anie       -0.15
compared   -0.15

Positive Logits
Token      Logit
though      0.29
however     0.28
though      0.21
Though      0.20
Though      0.19
denn        0.19
because     0.18
aber        0.18
however     0.18
Because     0.17

Activation Density: 0.022%