Explanations
instances of the word "too" indicating excess or degree
Negative Logits
nào     -0.19
toch    -0.18
ron     -0.17
st      -0.16
reat    -0.15
cer     -0.15
mp      -0.15
wow     -0.14
ando    -0.14
alone   -0.14
Positive Logits
ledo    0.22
led     0.21
thers   0.16
eker    0.16
kest    0.16
gether  0.15
eten    0.15
/from   0.15
kees    0.15
っと    0.15
Activations Density 0.036%