Explanations
phrases that emphasize the impact or significance of specific actions or events
Negative Logits
Cæsar       -0.77
uſed        -0.71
himſelf     -0.67
pleaſure    -0.66
purpoſe     -0.65
raiſ        -0.65
Atiku       -0.64
eiffel      -0.64
fevere      -0.63
Addis       -0.62
Positive Logits
the          1.23
OF           1.10
.}(          1.01
Of           1.01
.)}          0.95
a            0.93
their        0.91
sorts        0.90
MessageOf    0.88
our          0.88
Activation Density: 1.430%
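The density figure can be read as the percentage of sampled tokens on which this feature's activation is positive. The following is a minimal sketch under that assumption. The function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact definition (sample size, threshold) may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: `activations` holds one feature's activation for every token
    in a sample; a threshold of 0.0 (i.e. "fires at all") is hypothetical.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    # Count the tokens on which the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    # Express the firing fraction as a percentage of all sampled tokens.
    return 100.0 * firing / len(activations)
```

For instance, a feature that fires on 143 of 10,000 sampled tokens would report a density of 1.430% under this reading.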