Explanations
whether or not questions
phrases questioning the validity of certain statements or conditions
Negative Logits
Token        Logit
ĸļ           -0.86
äºĶ          -0.83
¸            -0.79
comings      -0.77
SourceFile   -0.76
srfAttach    -0.72
eway         -0.72
083          -0.71
undrum       -0.70
etter        -0.69
Positive Logits
Token        Logit
they         0.94
you          0.86
there        0.84
someone      0.80
we           0.80
anyone       0.78
qualifies    0.75
technically  0.75
he           0.74
somebody     0.72
Activations Density 0.028%
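
The source does not define the density figure. If it is read as the fraction of sampled tokens on which the feature has a nonzero activation, 0.028% corresponds to roughly 28 active tokens per 100,000. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires. The source dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 28 active tokens out of 100,000.
sample = [1.5] * 28 + [0.0] * 99_972
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.028%"
```

A feature this sparse fires on only a handful of tokens in any given sample, which fits the narrow explanations listed above.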