INDEX
Explanations
verbs indicating movement or action
Head Attr Weights
0:0.08
1:0.09
2:0.09
3:0.08
4:0.08
5:0.07
6:0.07
7:0.07
8:0.07
9:0.08
10:0.08
11:0.08
Negative Logits
-2.18
eur  -2.13
oided  -2.11
ullivan  -2.11
Lawyers  -2.10
authorized  -2.08
ioned  -1.97
endas  -1.96
ayson  -1.94
inge  -1.92
Positive Logits
Jou  2.24
Afee  2.12
gradient  1.98
Band  1.98
ドラ  1.95
proport  1.92
Tes  1.91
perty  1.91
spacing  1.91
tens  1.88
Activations Density 0.000%