Explanations
the word "but," indicating a contrast or exception in the text
Negative Logits
ught      -0.17
ën        -0.16
iah       -0.15
ullan     -0.14
ption     -0.14
ziel      -0.14
urs       -0.13
(blank)   -0.13
اذا       -0.13
him       -0.13
Positive Logits
wait      0.23
tery      0.20
cher      0.20
Wait      0.19
tach      0.18
wait      0.17
WAIT      0.16
tern      0.16
Wait      0.16
why       0.16
Activations Density 0.085%