Explanations
Instances of the word "but," signaling contrasts or exceptions in discourse.
Negative Logits
/memory    -0.16
κÏĮ        -0.15
Ïĩο        -0.14
acs        -0.14
ourselves  -0.14
ızı        -0.14
pts        -0.14
elin       -0.13
.sax       -0.13
구         -0.13
Positive Logits
LER        0.15
forth      0.14
że         0.14
lbrace     0.14
/www       0.14
ipeg       0.14
oldt       0.14
tery       0.14
ERG        0.14
adan       0.14
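Logit lists like these are usually read as the feature's direct effect on the output vocabulary. Below is a minimal sketch of how such lists could be produced. It assumes that each token's score is the dot product of the feature's decoder vector with that token's column of the unembedding matrix W_U, and it ignores any final LayerNorm scaling the dashboard may apply. The function names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the decoder direction onto W_U.

    decoder_dir: the feature's decoder vector, length d_model (assumed input).
    unembed: W_U laid out as d_model rows x vocab_size columns (assumed layout).
    Returns scores[t] = sum_i decoder_dir[i] * unembed[i][t].
    """
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for i, weight in enumerate(decoder_dir):
        row = unembed[i]
        for t in range(vocab_size):
            scores[t] += weight * row[t]
    return scores


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    # Stable ascending sort of token ids by score; ties keep vocabulary order.
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    # Largest scores first for the positive list.
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, 0.5]
    unembed = [[0.5, -1.0, 0.25, 0.0],
               [1.0, 0.5, -0.5, 0.0]]
    scores = logit_effects(decoder_dir, unembed)
    neg, pos = top_and_bottom(scores, vocab, k=1)
    print("Negative:", neg)  # [('b', -0.75)]
    print("Positive:", pos)  # [('a', 1.0)]
```

Because the score is linear in the decoder vector, the lists describe only the feature's direct path to the logits. They say nothing about effects mediated by later layers.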
Activation Density: 0.134%
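Activation density is the share of sampled tokens on which the feature fires. A density of 0.134% corresponds to roughly 134 firing tokens per 100,000 sampled. The sketch below assumes that "fires" means an activation strictly above a threshold of 0.0, as for a ReLU-style SAE feature. The dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 134 non-zero activations spread over 100,000 tokens.
    sample = [0.0] * 100_000
    for i in range(134):
        sample[i * 700] = 1.0
    print(f"{activation_density(sample):.3f}%")  # 0.134%
```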