INDEX
Explanations
references to decision-making and responses related to questions and engagement
Negative Logits
tvguidetime    -0.92
Efq            -0.87
виправивши     -0.84
ientôt         -0.83
archiviato     -0.81
་་             -0.81
$:$            -0.81
Jefus          -0.80
Theſe          -0.80
.")]           -0.79
Positive Logits
,       0.50
(       0.50
then    0.49
and     0.48
.       0.48
and     0.47
<       0.47
R       0.46
then    0.46
And     0.46
Activations Density: 0.523%