Explanations
the word "while" or its variations, indicating temporal transitions or contrasts in text
Negative Logits
erap        -0.17
but         -0.17
też         -0.16
another     -0.15
whereas     -0.15
orsi        -0.15
oise        -0.15
ancak       -0.14
hatta       -0.14
另外        -0.14
Positive Logits
initially   0.22
yes         0.20
s           0.20
yes         0.19
Initially   0.17
certainly   0.17
most        0.17
not         0.17
Initially   0.17
there       0.16
Activations Density 0.023%