INDEX
Explanations
connections and relationships in reasoning or explanations
Negative Logits
myſelf      -1.03
purpoſe     -1.02
fubject     -1.00
Shakspeare  -0.98
ſtate       -0.96
Reſ         -0.94
Majefty     -0.93
itſelf      -0.91
greateſt    -0.90
ſever       -0.89
Positive Logits
the      0.89
a        0.77
:        0.70
an       0.69
several  0.63
adanya   0.57
two      0.56
faptul   0.56
that     0.56
some     0.56
Activations Density 0.732%