Explanations
Attends to injury-related tokens in explanations or clarifications that appear earlier in the sequence.
Head Attribution Weights
Head 0: 0.18
Head 1: 0.19
Head 2: 0.17
Head 3: 0.09
Head 4: 0.08
Head 5: 0.09
Head 6: 0.05
Head 7: 0.12
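The dashboard does not state how these per-head weights are computed. One convention for SAEs trained on concatenated attention-head outputs is to split the feature's decoder vector into equal per-head slices and report each slice's L2 norm as a fraction of the summed norms. The sketch below assumes that convention; `head_attr_weights` and the toy vector are hypothetical, not taken from any library or from this feature.

```python
import math
from typing import List


def head_attr_weights(decoder_row: List[float], n_heads: int) -> List[float]:
    """Hypothetical helper (assumed convention): split the decoder vector into
    n_heads equal slices and return each slice's L2 norm divided by the sum
    of all slice norms."""
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy example: 8 heads with d_head = 2 (made-up values, not real weights).
toy_row = [3.0, 4.0] + [0.0, 0.0] * 6 + [0.0, 5.0]
weights = head_attr_weights(toy_row, n_heads=8)
print([round(w, 2) for w in weights])
```

Under this convention the weights sum to 1. The values listed above sum to 0.97, which is consistent with each of the eight values having been rounded to two decimals (maximum total rounding error 8 × 0.005 = 0.04).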
Negative Logits
-0.36  ویکیپدیا
-0.35  rungsseite
-0.35  ########.
-0.34  Rüyada
-0.34  '\\;'
-0.33  ſte
-0.32  ſta
-0.32  HideFlags
-0.32  SharedCtor
-0.32  Вес
Positive Logits
0.25  的她
0.24  tagHelperRunner
0.24  legd
0.23  ngang
0.23  awtextra
0.23  künftig
0.23  Kao
0.23  的他
0.22  Oviedo
0.22  gdyby
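The dashboard does not document how these logit scores were produced. A common approach, direct logit attribution, takes the dot product of the feature's output direction with each token's unembedding vector and lists the most negative and most positive results. The sketch below assumes `direction` is already expressed in the residual stream (for an attention SAE this would mean mapping the decoder vector through the output projection first) and ignores the final layer norm; `logit_effects`, `top_k`, and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's (assumed residual-stream) direction with each
    token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float], k: int) -> Tuple[Scored, Scored]:
    """Return (most positive first, most negative first), k tokens each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy 2-d example with made-up vectors.
direction = [2.0, -1.0]
unembed = {
    "a": [1.0, 0.0],   # effect  2.0
    "b": [-1.0, 2.0],  # effect -4.0
    "c": [1.0, -1.0],  # effect  3.0
    "d": [0.0, 1.0],   # effect -1.0
}
pos, neg = top_k(logit_effects(direction, unembed), k=2)
print(pos)  # [('c', 3.0), ('a', 2.0)]
print(neg)  # [('b', -4.0), ('d', -1.0)]
```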
Activation Density: 0.075%
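A density of 0.075% means the feature is active on roughly 3 of every 4,000 tokens (0.075 / 100 × 4,000 = 3). Assuming density is the percentage of token positions with an activation above zero (the threshold is an assumption; the dashboard does not state it), a minimal sketch with a hypothetical `activation_density` helper is:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (threshold of 0.0 is an assumed default)."""
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Toy data: the feature fires on 3 of 4,000 token positions.
acts = [0.0] * 4000
for i in (10, 500, 3999):
    acts[i] = 1.2
print(f"{activation_density(acts):.3f}%")  # 0.075%
```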