INDEX
Explanations
statements or claims being questioned or challenged
instances of the word "that."
Negative Logits
pione       -0.87
ãĤĬ         -0.86
oufl        -0.82
izont       -0.81
umat        -0.81
ãĥīãĥ©      -0.79
å§«         -0.79
arest       -0.78
ãĤ´ãĥ³      -0.78
ãĥĥ         -0.78
Positive Logits
there        0.96
they         0.95
possibility  0.92
aspect       0.90
notion       0.86
distinction  0.85
fact         0.81
we           0.81
assertion    0.80
phrase       0.79
Activations Density: 0.188%