INDEX
Explanations
punctuation marks, specifically periods, indicating the end of sentences
Head Attr Weights
0:0.24
1:0.10
2:0.04
3:0.05
4:0.03
5:0.07
6:0.02
7:0.04
8:0.10
9:0.09
10:0.08
11:0.09
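
As a rough way to read the head attribution weights listed above, the short Python sketch below ranks the heads and reports the largest one. The variable names are illustrative, and treating the values as comparable shares of one total is an assumption; the listed values sum to about 0.95, presumably because of rounding.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.24, 1: 0.10, 2: 0.04, 3: 0.05, 4: 0.03, 5: 0.07,
    6: 0.02, 7: 0.04, 8: 0.10, 9: 0.09, 10: 0.08, 11: 0.09,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the listed weights and the top head's share of that sum.
total = sum(head_weights.values())
share = top_weight / total

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"top head share of listed total: {share:.2f}")
```

Under these assumptions, head 0 carries the largest weight, roughly a quarter of the listed total.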
Negative Logits
pse             -1.62
ILCS            -1.55
thumbnails      -1.43
�               -1.38
tremend         -1.35
�               -1.35
asers           -1.32
ataka           -1.29
══              -1.28
!/              -1.28
Positive Logits
admits          1.55
↵               1.55
Nug             1.54
Media           1.52
<|endoftext|>   1.50
uphem           1.46
laughs          1.42
Guest           1.37
Interview       1.37
Advisor         1.36
Activations Density 0.016%