INDEX
Explanations
Instances of the word "to" in its various uses, suggesting a focus on infinitive verbs or directions.
Negative Logits
Ñıг        -0.17
843        -0.16
\App       -0.15
ãĥ¼ãĥ³     -0.15
elters     -0.15
839        -0.15
218        -0.15
bew        -0.15
STANCE     -0.14
radient    -0.14
Positive Logits
sometimes   0.16
YN          0.15
maybe       0.15
idata       0.15
fund        0.15
ads         0.14
average     0.14
ogan        0.14
WF          0.14
Correction  0.14
Activations Density 0.040%