Explanations
questions and phrases related to understanding or defining concepts
Definition/explanation seeking
questions about meaning
Negative Logits
Token       Logit
Efq         -1.11
Shakspeare  -1.09
ſche        -1.06
ſtate       -1.04
reaſon      -1.04
ſmall       -1.03
Theſe       -1.02
greateſt    -1.01
Eſ          -1.01
ſever       -1.00
Positive Logits
Token       Logit
"           0.73
meant       0.67
'           0.65
mean        0.59
to          0.58
p           0.57
“           0.56
«           0.55
            0.54
a           0.52
Activations Density 0.135%