INDEX
Explanations
instances of the letter "s" or the possessive form "s'"
Negative Logits
ngr      -0.18
cles     -0.17
ampa     -0.16
lse      -0.16
olas     -0.15
imest    -0.15
izr      -0.14
inx      -0.14
ERRU     -0.14
isay     -0.14
Positive Logits
ãĥªãĤ¢   0.16
ere      0.15
Bench    0.15
ear      0.15
Bach     0.15
bern     0.14
ria      0.14
eler     0.14
iele     0.14
Ģ        0.14
Activations Density 0.028%