INDEX
Explanations
common articles and determiners in sentences
Head Attr Weights
0:0.01
1:0.02
2:0.12
3:0.07
4:0.16
5:0.02
6:0.15
7:0.19
8:0.05
9:0.04
10:0.04
11:0.08
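
The head attribution weights above can be ranked to see which heads dominate. The following is a minimal sketch, not part of the original dashboard: the names `head_weights` and `top_heads` are hypothetical, and the values are copied from the list above, on the assumption that each entry is `head index:weight`.

```python
# Head attribution weights copied from the list above (head index -> weight).
# The dict name and helper below are illustrative, not from the source tool.
head_weights = {
    0: 0.01, 1: 0.02, 2: 0.12, 3: 0.07, 4: 0.16, 5: 0.02,
    6: 0.15, 7: 0.19, 8: 0.05, 9: 0.04, 10: 0.04, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights, k=4)
total = sum(head_weights.values())          # listed weights sum to about 0.95
share = sum(w for _, w in top) / total      # fraction carried by the top four

print(top)    # heads 7, 4, 6 and 2, in that order
print(round(share, 2))
```

On these numbers, heads 7, 4, 6 and 2 together carry 0.62 of the 0.95 total listed weight.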
Negative Logits
fram          -1.40
intentions    -1.34
zoning        -1.31
croft         -1.28
pressed       -1.27
Tsukuyomi     -1.26
restraining   -1.25
navigating    -1.25
pressing      -1.25
maneu         -1.25
Positive Logits
ascus         1.67
エル          1.66
)=(           1.55
REE           1.48
fruition      1.47
ワ            1.44
renches       1.42
strate        1.33
sale          1.28
ursday        1.24
Activations Density 0.001%