INDEX
Explanations
phrases that indicate relationships, particularly in the context of implementation and effect
Negative Logits
Token       Logit
Cæsar       -0.82
Athenians   -0.72
Huguen      -0.71
Hadrian     -0.70
Mahomet     -0.69
Hopf        -0.68
Majefty     -0.66
Assyrian    -0.65
Phry        -0.65
Thebes      -0.64
Positive Logits
Token       Logit
the         1.58
a           1.10
these       1.05
)";         1.05
their       1.05
our         1.03
.}(         1.03
both        1.01
"):         1.00
an          0.96
Activations Density: 8.062%