INDEX
Explanations
instances of the word "twice" or its variations in text
Negative Logits
reg       -0.17
oa        -0.15
rell      -0.15
.uc       -0.15
hausen    -0.15
ον        -0.14
eg        -0.14
och       -0.14
note      -0.14
ожд       -0.14
Positive Logits
/th       0.20
-thirds   0.19
dozen     0.17
ër        0.16
krom      0.16
/single   0.15
ldkf      0.15
-week     0.15
abal      0.15
idlo      0.15
Activations Density 0.011%