INDEX
Explanations
occurrences of the letter 'D' in various forms
NEGATIVE LOGITS
ocking     -0.19
avis       -0.18
addy       -0.18
elay       -0.17
ummy       -0.17
etermin    -0.17
oes        -0.16
emand      -0.16
airy       -0.16
rug        -0.16
POSITIVE LOGITS
alm        0.19
zer        0.18
ighton     0.16
jer        0.15
ahoo       0.14
alk        0.14
endale     0.14
ido        0.14
elf        0.14
end        0.14
Activations Density 0.047%
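
A minimal sketch of how a density figure like the one above could be computed, assuming that "activations density" means the fraction of token positions on which the feature's activation is nonzero. The page does not state that definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and purely illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: density = (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data, not taken from the page: 100,000 token positions,
# 47 of which have a nonzero activation.
sample = [0.0] * 99_953 + [1.0] * 47
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.047% would correspond to the feature firing on roughly 47 out of every 100,000 token positions.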