Explanations
phrases that express uncertainty, opinions, or speculative thoughts
Negative Logits
opak          -0.17
.helpers      -0.15
µľ            -0.14
occasionally  -0.14
оказ          -0.14
omanip        -0.14
often         -0.14
izzle         -0.14
ometimes      -0.13
ä¸įå¾Ĺ        -0.13
Positive Logits
expect        0.35
odds          0.29
unless        0.28
will          0.27
given         0.27
Odds          0.27
expecting     0.26
Expect        0.25
expectation   0.24
expects       0.24
Activations Density 0.385%