INDEX
Explanations
instances of contrasting options or choices
contrasts or alternatives presented in a narrative
Negative Logits
isable         -0.69
uble           -0.65
inguishable    -0.65
uming          -0.62
describ        -0.62
nat            -0.61
lesi           -0.60
onomy          -0.60
izable         -0.60
umed           -0.59
Positive Logits
alas           1.12
opted          0.92
Instead        0.92
Instead        0.89
Nope           0.89
instead        0.87
chose          0.86
postponed      0.80
failed         0.75
instead        0.73
Activations Density: 0.347%