INDEX
Explanations
instances of phrases indicating uncertainty or possibility
phrases questioning the certainty or validity of statements or situations
Negative Logits
rote      -0.80
ãĥ´ãĤ¡    -0.71
"},       -0.64
rite      -0.61
wash      -0.59
andem     -0.58
oos       -0.58
"+        -0.58
nox       -0.56
arie      -0.55
Positive Logits
depends       1.43
varies        1.13
remains       1.12
depended      0.91
or            0.86
lies          0.75
hinges        0.73
Regardless    0.70
versus        0.68
differs       0.67
Activations Density 0.324%