Explanations
repeated sequences of underscores or similar patterns
Negative Logits
Token   Logit
a       -0.69
I       -0.59
my      -0.58
it      -0.57
the     -0.56
all     -0.56
an      -0.55
S       -0.53
at      -0.53
i       -0.53
Positive Logits
Token        Logit
pleaſure     1.50
purpoſe      1.47
Monfieur     1.47
мәкал        1.46
itſelf       1.42
Majefty      1.39
themſelves   1.38
Reſ          1.37
raiſ         1.36
ſche         1.33
Activations Density 0.936%
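
The page does not say how the density figure is computed. A common reading is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 9 of which activate the feature.
sample = [0.0] * 991 + [0.4, 1.2, 0.7, 2.1, 0.3, 0.9, 1.5, 0.2, 0.6]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.936% would mean the feature fires on a little under 1 in 100 sampled tokens.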