INDEX
Explanations
questions or expressions of uncertainty
phrases that pose questions about understanding or explanations
Negative Logits
Token        Logit
Laughs       -0.72
otti         -0.68
OGR          -0.60
rive         -0.60
enza         -0.59
iva          -0.59
©¶æ¥µ        -0.58
ONG          -0.58
ey           -0.57
horn         -0.57
Positive Logits
Token        Logit
why          1.81
whether      1.61
WHY          1.59
how          1.49
why          1.42
whence       1.28
what         1.27
whether      1.21
HOW          1.14
whereabouts  1.12
Activations Density: 0.174%