INDEX
Explanations
instances of the word "the"
Head Attr Weights
0:0.06
1:0.01
2:0.12
3:0.12
4:0.03
5:0.12
6:0.05
7:0.07
8:0.07
9:0.08
10:0.14
11:0.07
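
The per-head weights above are easier to compare once ranked. The following is a minimal sketch, assuming each line is a `head:weight` pair with a zero-based head index; the names `RAW` and `parse_weights` are hypothetical and not part of the original page.

```python
# Head attribution weights copied verbatim from the listing above.
# Assumption: each line is "head_index:weight".
RAW = """0:0.06
1:0.01
2:0.12
3:0.12
4:0.03
5:0.12
6:0.05
7:0.07
8:0.07
9:0.08
10:0.14
11:0.07"""


def parse_weights(raw):
    """Parse "head:weight" lines into a {head_index: weight} dict."""
    weights = {}
    for line in raw.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_weights(RAW)

# Rank heads from largest to smallest attribution weight.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]
total = sum(weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Head 10 carries the largest listed weight (0.14), and the listed weights sum to 0.94 rather than 1.00, which may simply reflect rounding to two decimals.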
Negative Logits
[undecodable token]: -1.41
reach: -1.31
speak: -1.19
alloc: -1.13
antis: -1.07
[undecodable token]: -1.03
acts: -0.99
[undecodable token]: -0.98
[undecodable token]: -0.96
aband: -0.95
Positive Logits
liest: 1.40
iest: 1.27
Authors: 1.25
architect: 1.15
heirs: 1.13
cients: 1.10
oldest: 1.03
nearest: 1.02
portion: 1.01
architects: 0.99
Activations Density: 0.267%