INDEX
Explanations
phrases indicating uncertainty or doubt followed by questions or inquiries
phrases that indicate questioning or uncertainty
Negative Logits
otti      -0.70
DIV       -0.70
kee       -0.67
ONG       -0.65
velop     -0.65
IUM       -0.65
STDOUT    -0.64
ena       -0.62
ario      -0.61
iva       -0.61
Positive Logits
whether   1.95
why       1.66
whether   1.46
WHY       1.35
how       1.33
why       1.29
whence    1.23
what      1.17
Whether   1.12
Whether   1.05
Activations Density 0.257%
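
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is non-zero. The sketch below illustrates that reading only; the function name activation_density, the threshold parameter, and the sample counts are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 26 of which activate the feature.
sample = [0.0] * 9974 + [1.5] * 26
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.257% would mean the feature fires on roughly 1 in 390 sampled tokens, which fits a narrow feature that responds mainly to question words such as "whether" and "why".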