INDEX
Explanations
articles and prepositions in the text
Head Attr Weights
0:0.03
1:0.02
2:0.11
3:0.06
4:0.12
5:0.02
6:0.39
7:0.07
8:0.04
9:0.02
10:0.04
11:0.03
Negative Logits
wagen      -1.36
装         -1.28
��         -1.28
Leader     -1.25
veland     -1.20
ONSORED    -1.20
Salmon     -1.17
ゴン       -1.16
Tig        -1.16
rede       -1.15
Positive Logits
antics     1.75
angu       1.45
ventions   1.42
legal      1.39
votes      1.37
Citiz      1.37
intention  1.35
respons    1.33
mercial    1.31
eties      1.31
Activations Density: 0.006%