Explanations
occurrences of specific capital letters, likely referring to acronyms or titles
Negative Logits
Anſ        -1.72
Theſe      -1.66
Diſ        -1.59
faſt       -1.53
Reſ        -1.52
Houſe      -1.52
Monfieur   -1.48
Beſ        -1.46
itſelf     -1.46
ſeveral    -1.45
Positive Logits
G   2.11
M   2.10
D   2.09
P   2.08
S   2.07
B   2.05
R   2.04
L   2.04
C   2.03
W   2.03
Activations Density 0.571%