Explanations
The word "instead" and its variants, indicating a preference for an alternative or a shift in perspective.
Negative Logits
Token       Logit
"));        -0.78
StatusOK    -0.74
Cahill      -0.73
Hollis      -0.73
Kass        -0.69
"));        -0.67
Bil         -0.67
er          -0.67
Rik         -0.66
`{.         -0.66
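Positive Logits
Token       Logit
Instead      1.84
Instead      1.82
instead      1.78
instead      1.70
Rather       1.23
uttosto      1.22   (fragment of Italian "piuttosto", "rather")
Rather       1.17
rather       1.08
tdessen      1.05   (fragment of German "stattdessen", "instead")
Statt        1.04   (German "statt", "instead of")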
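The logit lists above appear to rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Assuming, as on common sparse-autoencoder dashboards, that each value is the dot product of the feature's decoder direction with a token's unembedding vector, the ranking can be sketched in Python as below. This is a minimal sketch under that assumption, not the dashboard's actual code: `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy numbers are invented for illustration.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature.
# ASSUMPTION: logit effect = decoder direction . unembedding column per token.
# None of these names come from a real library.


def logit_effects(decoder_dir, unembed):
    """Return one logit effect per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: list of d_model rows, each a list of vocab_size floats.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return positive, negative


# Toy example with d_model = 2 and a 4-token vocabulary (invented values).
vocab = ["instead", "rather", "Cahill", "Kass"]
decoder_dir = [1.0, -0.5]
unembed = [
    [2.0, 1.0, -0.5, 0.0],
    [0.5, 0.0, 0.5, 1.0],
]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, 2)
print(positive)  # [('instead', 1.75), ('rather', 1.0)]
print(negative)  # [('Cahill', -0.75), ('Kass', -0.5)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.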
Activation Density: 0.180%
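An activation density of 0.180% means the feature fires on roughly 1.8 of every 1,000 tokens. Assuming density is defined as the percentage of sampled tokens on which the feature's activation is strictly positive (the threshold and the input format are assumptions, not stated on this page), it can be computed as follows; `activation_density` is a hypothetical helper name.

```python
# Hypothetical sketch: activation density as the percentage of tokens where
# the feature is active. ASSUMPTION: "active" means activation > 0.


def activation_density(activations):
    """Return the percentage of activations that are strictly positive."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Invented example: the feature fires on 9 of 5,000 tokens.
sample = [0.0] * 4991 + [1.3] * 9
print(f"{activation_density(sample):.3f}%")  # 0.180%
```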