INDEX
Explanations
instances of the word "instead" and its variations, indicating a preference for alternatives or changes in perspective
Negative Logits
"));         -0.94
             -0.84
GAO          -0.79
Schengen     -0.79
"));         -0.78
."));        -0.76
."),         -0.76
Rik          -0.76
SAE          -0.76
protoimpl    -0.75
Positive Logits
Instead      1.23
Instead      1.19
instead      1.08
instead      0.96
uttosto      0.88
Rather       0.84
Rather       0.82
Statt        0.75
katapos      0.74
rather       0.73
Activations Density 0.156%