INDEX
Explanations
instances of the word "beg" and its variations
Negative Logits
eres     -0.17
aes      -0.16
ifers    -0.16
llib     -0.15
ifer     -0.15
apas     -0.15
ering    -0.15
rax      -0.14
ERING    -0.14
ANDOM    -0.14
Positive Logits
gar      0.27
gars     0.24
Beg      0.21
otten    0.20
beg      0.20
gin      0.19
ging     0.19
begs     0.17
gs       0.17
ingroup  0.17
Activations Density 0.006%