Explanations
Questions that express uncertainty or seek information
Negative Logits
there      -0.22
THERE      -0.20
there      -0.19
sWith      -0.17
theres     -0.17
There      -0.16
cad        -0.16
amounts    -0.16
here       -0.15
version    -0.15
Positive Logits
nt         0.28
/do        0.25
anyone     0.22
anybody    0.20
actic      0.19
't         0.17
ンタ       0.17
’t         0.16
kommen     0.16
ñana       0.16
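Token strings in logit lists like these sometimes appear in the tokenizer's byte-level form, where each raw byte is shown as a stand-in character (for example `ãĥ³ãĤ¿` for the katakana ンタ). The sketch below assumes a GPT-2 style byte-to-unicode table; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard. Whether the underlying model uses exactly this mapping is an assumption.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string; return it unchanged if it is not byte-level."""
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token  # already readable text, e.g. it contains a character outside the table
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĥ³ãĤ¿"))
```

Tokens that contain characters outside the table are passed through untouched, since they are presumably already decoded.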
Activations Density: 0.037%
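For intuition about the density figure, here is a minimal sketch that assumes "activation density" means the fraction of token positions on which the feature fires (activation above zero). The function name `activation_density` and the synthetic activations are illustrative only and are not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic example: 100,000 token positions, 37 of them active.
    acts = [0.0] * 100_000
    for i in range(37):
        acts[i * 1000] = 1.5
    print(f"{activation_density(acts):.3%}")
```

Under that reading, a density of 0.037% corresponds to roughly 37 active positions per 100,000 tokens.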