INDEX
Explanations
Concepts related to decision-making and agency
Preceding a contrastive conjunction
Negative Logits
UnusedPrivate    -0.61
исленность    -0.59
λών    -0.54
pinulongan    -0.53
насељу    -0.53
pedes    -0.53
urman    -0.53
falen    -0.52
κης    -0.52
epik    -0.51
Positive Logits
بلکه    1.81
而是    1.54
sondern    1.52
melainkan    1.24
אלא    1.13
vaan    1.10
hanem    0.99
Rather    0.81
Instead    0.76
necessarily    0.74
Activations Density 0.308%