Explanations
phrases indicating curiosity or doubt
expressions of curiosity or contemplation
Negative logits
Token         Logit
onica         -0.56
rites         -0.55
ONG           -0.55
itual         -0.55
oided         -0.54
ortunately    -0.54
orest         -0.53
legates       -0.51
uctions       -0.51
absor         -0.50

Positive logits
Token         Logit
aloud          1.42
why            1.41
whether        1.31
why            1.29
WHY            1.29
how            1.25
if             1.12
whether        1.09
HOW            1.06
how            1.02

Activation density: 0.043%