Explanations
Interactions involving requests or questions directed at individuals
Negative Logits
Token      Logit
oola       -0.16
viso       -0.15
orget      -0.15
asmus      -0.15
uve        -0.15
oky        -0.15
imers      -0.15
abble      -0.14
ece        -0.14
ฤ          -0.14
Positive Logits
Token      Logit
questions  0.32
whether    0.32
about      0.31
why        0.31
how        0.29
what       0.26
Questions  0.24
why        0.23
whether    0.23
questions  0.23
Activation Density: 0.037%
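An activation density of 0.037% means the feature fires on roughly 37 of every 100,000 tokens. The sketch below shows one way such a figure could be computed. It assumes density is defined as the fraction of tokens whose activation is strictly above zero. That threshold and the helper name `activation_density` are illustrative assumptions, not the dashboard's documented method.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption (not confirmed by the dashboard): density is
    (# tokens with activation > threshold) / (# tokens), as a percentage.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: the feature is active on 37 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(37):
    acts[i] = 1.0
print(f"{activation_density(acts):.3f}%")  # prints 0.037%
```

A very low density like this is typical of a narrowly scoped feature. The exact value depends on the chosen threshold and the token sample it is measured over.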