INDEX
Explanations
uncertainty or lack of clarity in statements
expressions of uncertainty or ambiguity
Negative Logits
Token     Logit
nov       -0.67
Zone      -0.65
Exper     -0.63
bern      -0.62
Minor     -0.61
../       -0.60
nova      -0.60
INT       -0.58
throats   -0.57
upiter    -0.57
Positive Logits
Token     Logit
whether   1.38
why       1.19
how       1.16
whether   1.04
exactly   1.01
WHY       0.93
why       0.93
HOW       0.88
if        0.87
Whether   0.85
Activations Density: 0.058%