INDEX
Explanations
phrases indicating intent, action, or directive purposes
Negative Logits
artin       -0.15
pending     -0.14
fon         -0.14
赫          -0.14
raph        -0.14
Heb         -0.14
ursion      -0.13
successful  -0.13
rtl         -0.13
طي          -0.13
Positive Logits
asts        0.20
feature     0.17
erea        0.16
host        0.16
be          0.15
feature     0.15
asting      0.15
Host        0.15
Feature     0.14
soon        0.14
Activations Density 0.026%
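
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a value is commonly computed: the fraction of token positions on which the feature activates at all. The function name `activation_density`, the zero threshold, and the sample data are assumptions made for illustration; the page itself does not state how the 0.026% was obtained.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 26 of 100,000 tokens.
sample = [1.5] * 26 + [0.0] * 99_974
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.026%"
```

Under this reading, a density of 0.026% corresponds to roughly 26 active tokens per 100,000, which is a sparse feature.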