INDEX
Explanations
Attends to question marks that close interrogative sentences, from later tokens in the sentences
Head Attr Weights
0:0.06
1:0.08
2:0.07
3:0.11
4:0.11
5:0.05
6:0.38
7:0.09
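As a quick reading aid for the head attribution weights, the sketch below picks out the head with the largest weight and its share of the listed total. The dictionary values are copied from the listing; the helper name `dominant_head` and the choice to normalize by the sum are assumptions made for illustration, since the page does not say whether the weights are meant to sum to 1 (the listed values sum to 0.95, possibly because of rounding).

```python
# Head attribution weights copied from the listing (head index -> weight).
head_attr_weights = {
    0: 0.06, 1: 0.08, 2: 0.07, 3: 0.11,
    4: 0.11, 5: 0.05, 6: 0.38, 7: 0.09,
}


def dominant_head(weights):
    """Return (head, share): the head with the largest weight and that
    weight divided by the sum of all listed weights.

    Normalizing by the sum is an assumption for illustration; the source
    does not state how the weights are scaled.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head] / total


head, share = dominant_head(head_attr_weights)
print(f"head {head} carries {share:.0%} of the listed attribution weight")
```

Under these assumptions, head 6 stands out clearly: its weight is more than three times that of the next-largest heads (3 and 4, at 0.11 each).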
Negative Logits
Pautan    -0.41
Italijani    -0.41
лтамалар    -0.41
שוליים    -0.40
"..\..\..\    -0.40
متعلقه    -0.39
Waray    -0.38
liken    -0.38
Diweddarwch    -0.38
հղումներ    -0.37
Positive Logits
Vordergrund    0.36
öny    0.34
Tikang    0.34
excru    0.32
commenting    0.32
matsu    0.31
سب    0.31
dorfer    0.30
NotExist    0.30
marily    0.30
Activations Density 0.008%