Explanations
specific tokens that repeat in various contexts
Negative Logits
myſelf    -1.41
itſelf    -1.35
Theſe     -1.30
purpoſe   -1.27
ſeveral   -1.26
faſt      -1.25
perſon    -1.22
ſever     -1.22
Monfieur  -1.21
ſelf      -1.21
Positive Logits
h         1.78
r         1.73
b         1.70
p         1.67
m         1.66
c         1.64
d         1.61
s         1.59
k         1.58
f         1.58
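On dashboards of this kind, the positive and negative logit lists are commonly the feature's direct effect on the output vocabulary: the dot product of the feature's decoder direction with each token's unembedding vector, sorted to show the extremes. The sketch below assumes exactly that; the function name `logit_effects` and the toy vectors are hypothetical, and it is not necessarily how the values above were produced.

```python
from typing import Dict, List, Tuple


def logit_effects(
    feature_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects.

    Assumption: the effect on token t is the dot product of the feature's
    decoder direction with t's unembedding vector (hypothetical setup).
    """
    # Score every vocabulary token by its dot product with the feature direction.
    scores = {
        tok: sum(a * b for a, b in zip(feature_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional "model" and three tokens.
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.2]}
pos, neg = logit_effects([0.5, -1.0], unembed, k=2)
print(pos[0])  # ('a', 0.5)
print(neg[0])  # ('b', -1.0)
```

Under this reading, a feature whose negative list is dominated by long-s spellings (ſ) would, when active, push the model away from predicting those archaic tokens.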
Activation Density: 0.405%
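Activation density is usually reported as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name and the threshold parameter are hypothetical. Under that reading, 0.405% corresponds to about 405 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 405 active tokens out of 100,000 sampled tokens.
acts = [0.0] * (100_000 - 405) + [1.0] * 405
print(activation_density(acts))  # 0.405
```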