INDEX
Explanations
phrases indicating doubt or uncertainty
phrases indicating uncertainty or doubt
Negative Logits
Token       Logit
士          -0.74
ipeg        -0.74
gencies     -0.73
ounters     -0.72
IRE         -0.72
tesy        -0.67
urses       -0.67
ridges      -0.65
hess        -0.64
licks       -0.64
Positive Logits
Token        Logit
sure         1.26
kidding      1.24
ashamed      1.21
joking       1.14
advocating   1.14
surprised    1.13
exagger      1.10
saying       1.06
complaining  1.04
suggesting   1.04
Activations Density 0.070%