INDEX
Explanations
punctuation and common connectors in the text
Head Attr Weights
0:0.03
1:0.02
2:0.10
3:0.07
4:0.28
5:0.03
6:0.14
7:0.09
8:0.04
9:0.03
10:0.05
11:0.06
Negative Logits
capacity     -1.51
stood        -1.51
itte         -1.43
rank         -1.43
capacities   -1.34
purposes     -1.31
nude         -1.28
similarity   -1.27
?)           -1.27
SAP          -1.27
Positive Logits
�醒       1.69
exting    1.61
Debor     1.58
Zar       1.54
Sac       1.52
Conquer   1.51
Grant     1.47
Guth      1.46
Frie      1.45
tyr       1.42
Activations Density: 0.001%