Explanations
punctuation marks, specifically periods and quotations
Head Attr Weights
0:0.03
1:0.02
2:0.05
3:0.15
4:0.09
5:0.05
6:0.02
7:0.05
8:0.06
9:0.12
10:0.15
11:0.16
Negative Logits
decre     -1.50
Baal      -1.47
incre     -1.42
Britann   -1.40
forb      -1.40
Hath      -1.39
tram      -1.39
ashtra    -1.37
deleg     -1.36
rule      -1.34
Positive Logits
CNN                1.47
cerpt              1.44
cryptocurrencies   1.35
·                  1.35
yrics              1.35
BD                 1.30
usional            1.30
Transcript         1.29
trolling           1.28
onyms              1.27
Activations Density 0.026%