INDEX
Explanations
instances of the letter 'p' in various contexts
Negative Logits
arent     -0.17
pant      -0.16
KA        -0.16
ose       -0.15
sis       -0.15
ending    -0.15
SP        -0.15
ifton     -0.15
si        -0.15
vanity    -0.15
Positive Logits
esto      0.23
ears      0.21
imiento   0.19
ome       0.19
есто      0.18
anko      0.18
ast       0.18
umper     0.17
ean       0.17
imento    0.17
Activations Density 0.010%