INDEX
Explanations
phrases indicating uncertainty or the need for future evaluation
Negative Logits
Token    Logit
ogui     -0.17
ctl      -0.15
OKIE     -0.14
plnÄĽ    -0.14
ASSES    -0.14
iosa     -0.14
raud     -0.14
aleb     -0.13
Fra      -0.13
½        -0.13
Positive Logits
Token        Logit
whether      0.29
jury         0.28
Whether      0.26
Hopefully    0.25
hopefully    0.24
WHETHER      0.24
Hopefully    0.23
whether      0.23
Whether      0.22
hope         0.22
Activations Density: 0.195%