INDEX
Explanations
phrases emphasizing comparisons or similarities using the word "as."
Negative Logits
Token      Logit
pleaſure   -1.01
ſta        -1.00
raiſ       -0.99
Houſe      -0.98
Conſ       -0.98
houſe      -0.93
Jefus      -0.92
ſever      -0.91
itſelf     -0.90
ſche       -0.90
Positive Logits
Token      Logit
as         1.55
As         1.32
AS         1.20
As         1.15
readAs     1.03
a          0.92
as         0.91
an         0.82
part       0.78
ως         0.77
Activations Density 1.773%
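
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not something the dashboard states. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so the arithmetic reproduces 1.773%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumed definition: density = (# positions where the feature fires) / (# positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 1,773 of which activate the feature.
sample = [0.0] * 98227 + [0.5] * 1773
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 1.773% means the feature fires on roughly 1 in 56 tokens, which is plausible for a feature tied to a word as frequent as "as".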