INDEX
Explanations
occurrences of the letter 'D' in various contexts
Negative Logits
addy     -0.17
аÑİ      -0.17
elta     -0.16
ši       -0.16
awn      -0.16
emo      -0.15
ayo      -0.15
OWN      -0.15
ansa     -0.15
ÑĢÑĥг    -0.15
Positive Logits
yer      0.25
zure     0.24
lug      0.24
rey      0.24
ill      0.23
alg      0.22
oh       0.22
zial     0.22
uf       0.21
sou      0.21
Activations Density 0.028%