INDEX
Explanations
references to the letter "Y" or its variations across different contexts
Negative Logits
ieg      -0.16
iero     -0.15
ief      -0.15
leston   -0.15
390      -0.15
adece    -0.14
adors    -0.14
ße       -0.14
icher    -0.14
мос      -0.14
Positive Logits
achts    0.23
ea       0.23
eh       0.21
ves      0.21
ez       0.19
acht     0.19
ields    0.19
asmine   0.19
atra     0.19
psilon   0.19
Activations Density 0.042%