INDEX
Explanations
articles or definite articles in the text
Head Attr Weights
Head  Weight
0     0.07
1     0.08
2     0.08
3     0.08
4     0.07
5     0.07
6     0.08
7     0.07
8     0.07
9     0.10
10    0.09
11    0.08
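This page does not say how the per-head weights are computed. The sketch below shows one plausible scheme, and it rests on several assumptions. First, the SAE is assumed to be trained on the concatenated outputs of 12 attention heads with d_head = 64. Second, each head's weight is taken to be the L2 norm of that head's slice of the feature's decoder vector, normalised across heads. The function name `head_attribution_weights` is hypothetical, and the decoder row here is random stand-in data, not this feature's weights.

```python
import numpy as np

def head_attribution_weights(w_dec_row: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical scheme: split one feature's decoder vector into equal
    per-head slices and return each slice's L2 norm as a fraction of the total."""
    slices = w_dec_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(slices, axis=1)    # magnitude of each head's slice
    return norms / norms.sum()                # normalise so the weights sum to 1

rng = np.random.default_rng(0)
n_heads, d_head = 12, 64                      # assumed GPT-2-small-like shape
w_dec_row = rng.normal(size=n_heads * d_head) # random stand-in for a real decoder row
weights = head_attribution_weights(w_dec_row, n_heads)
for head, w in enumerate(weights):
    print(f"{head}:{w:.2f}")
```

The normalisation step is itself an assumption. The dashboard's exact normalisation is not stated here.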
Negative Logits
Token      Logit
rahim      -2.42
ワン       -2.34
ullivan    -2.25
イト       -2.22
裏�        -2.17
igl        -2.16
ategories  -2.14
hari       -2.10
PLA        -2.09
akedown    -2.07
Positive Logits
Token      Logit
froze      2.22
revoked    2.18
skipped    2.10
discount   2.06
emitted    2.00
trailed    2.00
crunch     1.98
drifted    1.97
tend       1.97
tended     1.95
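Top and bottom logit lists like the two above are usually read as the tokens that a feature's output direction most promotes or suppresses when it is mapped through the unembedding. The sketch below uses random toy matrices and a hypothetical vocabulary. It shows that projection and the top-k/bottom-k selection. It is a simplified assumption: real dashboards may fold in LayerNorm scaling, and for an attention-output SAE the decoder direction would first pass through the output projection W_O. Both steps are omitted here.

```python
import numpy as np

def top_and_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction onto the unembedding and return
    the k most negative and k most positive (token, logit) pairs."""
    logits = decoder_dir @ W_U                       # shape (d_vocab,)
    order = np.argsort(logits)                       # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
W_U = rng.normal(size=(d_model, d_vocab))            # toy unembedding matrix
vocab = [f"tok{i}" for i in range(d_vocab)]          # hypothetical vocabulary
decoder_dir = rng.normal(size=d_model)               # toy decoder direction
neg, pos = top_and_bottom_logits(decoder_dir, W_U, vocab, k=3)
print("Negative:", neg)
print("Positive:", pos)
```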
Activation Density: 0.000%
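A displayed density of 0.000% does not necessarily mean the feature never fires. It may only mean that it fires too rarely to show at three decimal places. The sketch below assumes that density is the percentage of token positions with an activation strictly greater than zero. It uses made-up activations in which the feature fires once in one million tokens. The dataset and threshold behind the figure above are not stated.

```python
import numpy as np

def activation_density(acts: np.ndarray) -> float:
    """Percentage of token positions where the feature fires (activation > 0)."""
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

acts = np.zeros(1_000_000)     # made-up activations over one million tokens
acts[123] = 1.7                # a single firing
density = activation_density(acts)
print(f"Activation density {density:.3f}%")   # prints: Activation density 0.000%
```

Here the true density is 0.0001%, which rounds to 0.000% at this precision.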