INDEX
Explanations
the letters "au" either as part of a word or on their own
occurrences of the substring "au"
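As a rough illustration of what these explanations describe, the sketch below lists where the substring "au" occurs in a piece of text. The function name `find_au` is hypothetical, and case-insensitive matching is an assumption, since the explanations do not say whether case matters. It only mimics the textual pattern and is not how the feature itself is computed.

```python
import re


def find_au(text: str) -> list[int]:
    """Return the start offset of every "au" in text (case-insensitive by assumption)."""
    return [m.start() for m in re.finditer("au", text, flags=re.IGNORECASE)]


# "au" as a standalone word and as part of a longer word
print(find_au("Au revoir, Laura"))  # [0, 12]
```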
Negative Logits
selves    -0.63
Wallet    -0.63
flares    -0.62
APS       -0.61
appers    -0.61
STATE     -0.60
orative   -0.60
bread     -0.57
Io        -0.56
OB        -0.56
Positive Logits
llah      1.24
gment     1.08
ction     1.04
qua       0.99
lette     0.97
fman      0.93
cel       0.90
ctions    0.88
clair     0.88
lly       0.86
Activations Density: 0.020%