INDEX
Explanations
the words "red" and "blue", likely in the context of code or data
Negative Logits
Token               Logit
[no visible token]  -1.06
,                   -1.05
in                  -1.03
[no visible token]  -1.02
(                   -1.02
I                   -0.93
all                 -0.91
a                   -0.90
the                 -0.89
-                   -0.89
Positive Logits
Token     Logit
Monfieur  2.45
Theſe     2.19
Efq       2.14
itſelf    2.08
Majefty   2.06
Jefus     2.03
myſelf    2.02
Houſe     1.99
auffi     1.88
Anſ       1.88
Activations Density 5.770%