Explanations
instances of the word "too" indicating excessiveness or intensity
Negative Logits
Token     Logit
st        -0.19
pu        -0.19
toch      -0.18
map       -0.18
tron      -0.17
do        -0.17
ry        -0.17
noon      -0.17
post      -0.16
termin    -0.16
Positive Logits
Token       Logit
led         0.28
boot        0.27
ledo        0.24
Boot        0.23
o           0.21
gether      0.21
thers       0.20
oot         0.20
oooooooo    0.20
oooo        0.19
Activations Density: 0.022%