INDEX
Explanations
instances of the word "surprisingly"
Negative Logits
Kurt        -0.15
tha         -0.15
ge          -0.15
Middleton   -0.14
lec         -0.14
op          -0.14
hea         -0.13
etur        -0.13
upert       -0.13
vur         -0.13
Positive Logits
echan       0.18
razil       0.17
æķĪ         0.17
umlu        0.16
çļĦå°ı      0.15
ẩm          0.15
achi        0.15
nia         0.15
šak         0.15
DBG         0.14
Activations Density: 0.003%