INDEX
Explanations
instances of confusion or uncertainty related to various topics
NEGATIVE LOGITS
Token       Logit
unya        -0.16
tak         -0.16
Mirage      -0.15
zos         -0.15
ughs        -0.15
lé          -0.14
ンス        -0.14
irst        -0.14
manship     -0.14
Wy          -0.14
POSITIVE LOGITS
Token       Logit
/conf       0.32
confusion   0.27
confuse     0.25
confusing   0.23
confused    0.21
Conf        0.18
ingly       0.18
conf        0.17
ly          0.16
Conf        0.16
Activations Density 0.039%
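
For readers unfamiliar with the density figure, the sketch below shows one plausible way such a number could be computed. It assumes density means the percentage of sampled tokens on which the feature's activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 39 of 100,000 tokens.
sample = [1.5] * 39 + [0.0] * 99_961
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.039% would mean the feature is active on roughly 39 of every 100,000 tokens, which is consistent with a sparse feature that fires mainly around "conf"-prefixed words.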