Explanations
phrases indicating relational or causal connections
Negative Logits
Token       Logit
itſelf      -1.05
purpoſe     -1.03
ſtate       -1.01
myſelf      -0.99
houſe       -0.97
ſche        -0.95
pleaſure    -0.91
coö         -0.89
Anſ         -0.87
Houſe       -0.86
Positive Logits
Token       Logit
the         1.39
a           0.99
an          0.98
<>();       0.75
their       0.72
them        0.72
these       0.72
some        0.71
:           0.71
several     0.68
Activations Density: 1.448%
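
The source does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The original page does not state this.
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up activations: 1000 token positions, 15 of them active.
    sample = [0.0] * 985 + [0.4] * 15
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 1.448% would mean the feature is active on roughly 1 in 69 sampled tokens.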