INDEX
Explanations
verbs and actions that indicate decision-making or personal agency
Negative Logits
Token    Logit
lg       -0.51
lu       -0.51
FROM     -0.50
more     -0.49
From     -0.48
uns      -0.48
Between  -0.47
랜드     -0.47
无       -0.47
rs       -0.46
Positive Logits
Token            Logit
CreateTagHelper  1.07
"]];             0.94
)");             0.91
#+#              0.87
"}>              0.84
itſelf           0.84
("")]            0.83
"])              0.83
RenderAtEndOf    0.81
)";              0.81
Activations Density: 0.312%