Explanations
conditional phrases that pose questions about various scenarios or choices
Negative Logits
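Token       Logit
niet        -0.20   (Dutch: "not")
both        -0.20
neither     -0.20
nicht       -0.20   (German: "not")
not         -0.19
tidak       -0.18   (Indonesian/Malay: "not")
не          -0.17   (Russian: "not")
ikke        -0.17   (Danish/Norwegian: "not")
değil       -0.17   (Turkish: "not")
nejen       -0.17   (Czech: "not only")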
Positive Logits
Token       Logit
whether      0.20
whether      0.18
Whether      0.18
ultimately   0.17
WHETHER      0.17
Whether      0.17
是否          0.16   (Chinese: "whether")
zda          0.16   (Czech: "whether")
simple       0.15
simply       0.15
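Dashboards of this kind commonly build the Negative and Positive Logits lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The sketch below assumes that definition. The function `logit_effects`, the toy two-dimensional weights, and the four-token vocabulary are hypothetical and do not reproduce this feature's actual values.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_row: Sequence[float],          # assumed: the feature's decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # assumed: unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    d_model = len(decoder_row)
    n_vocab = len(vocab)
    # Direct logit effect of the feature: logits[j] = sum_i decoder_row[i] * unembed[i][j]
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(n_vocab), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Hypothetical toy weights, for illustration only.
vocab = ["not", "whether", "simply", "the"]
unembed = [
    [-0.2, 0.2, 0.15, 0.0],
    [5.0, 5.0, 5.0, 5.0],   # ignored here because decoder_row[1] == 0
]
neg, pos = logit_effects([1.0, 0.0], unembed, vocab, k=2)
print(neg)  # [('not', -0.2), ('the', 0.0)]
print(pos)  # [('whether', 0.2), ('simply', 0.15)]
```

Because the projection is linear, these lists describe only the feature's direct effect on the next-token logits; they do not account for any downstream layers.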
Activation Density: 0.068%
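Activation density is usually reported as the share of token positions on which the feature's activation exceeds zero. The following is a minimal sketch under that assumption. The function `activation_density` and the synthetic activations (68 firing positions out of 100,000) are hypothetical, chosen only so that the result matches the figure shown.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires above `threshold`.

    Assumption: "active" means activation > threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 68 nonzero activations spread over 100,000 token positions.
acts = [0.0] * 100_000
for i in range(68):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.068%
```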