INDEX
Explanations
instances of the word "to" indicating intentions or actions
Negative Logits
ombs         -0.17
ijd          -0.14
hor          -0.14
gain         -0.13
برÛĮ         -0.13
ongo         -0.13
esin         -0.13
_SECURITY    -0.13
ije          -0.13
ivan         -0.13
Positive Logits
balance      0.19
artz         0.15
antz         0.15
çĿ           0.15
Balance      0.14
Hammer       0.14
uggle        0.14
;            0.14
(            0.14
som          0.14
Activations Density: 0.036%