INDEX
Explanations
modal verbs indicating preference or intention
Negative Logits
ittest    -0.18
flix      -0.18
evin      -0.15
itm       -0.15
duk       -0.14
ayne      -0.14
rane      -0.14
ittle     -0.14
ãģĴ       -0.14
spÄĽ      -0.14
Positive Logits
nt            0.21
say           0.19
rather        0.19
bet           0.17
bets          0.17
personally    0.16
.python       0.16
Bet           0.15
wager         0.15
Rather        0.15
Activations Density: 0.046%