INDEX
Explanations
punctuation marks and their frequency in the text
Negative Logits
Token   Logit
1       -0.21
[       -0.21
x       -0.20
2       -0.20
4       -0.20
3       -0.20
10      -0.20
50      -0.20
.       -0.19
5       -0.19
Positive Logits
Token   Logit
J       0.35
L       0.34
M       0.34
C       0.33
E       0.33
R       0.32
S       0.32
D       0.32
G       0.32
F       0.32
Activations Density 0.020%