Explanations
seemingly random numbers or formatting characters within text
Negative Logits
Token         Logit
↵             -0.63
              -0.55
(             -0.52
              -0.52
V             -0.50
              -0.50
              -0.50
              -0.50
MethodImpl    -0.49
B             -0.48
Positive Logits
Token         Logit
myſelf        0.84
purpoſe       0.82
faſt          0.80
itſelf        0.77
ſche          0.77
<>",          0.77
Monfieur      0.76
juſ           0.74
ſtate         0.74
raiſ          0.73
Activations Density 1.624%