INDEX
Explanations
phrases indicating difficulty or struggle
Negative Logits
Token     Logit
akin      -0.20
åĬĽçļĦ    -0.14
hei       -0.13
Tai       -0.13
nen       -0.13
ÐĬ        -0.13
nada      -0.12
pecific   -0.12
suz       -0.12
Each      -0.12
Positive Logits
Token     Logit
many      0.33
many      0.24
us        0.23
both      0.23
anyone    0.23
everyone  0.22
MANY      0.21
sure      0.21
Many      0.20
most      0.20
Activations Density 0.171%