INDEX
Explanations
Attends to tokens indicating determination, from relevant tokens indicating derivation or cause.
Head Attr Weights
Head   Weight
0      0.06
1      0.10
2      0.07
3      0.13
4      0.44
5      0.04
6      0.06
7      0.06
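The dashboard does not say how these per-head weights are computed. One common approach for attention-output features is to split the feature's decoder vector (over the concatenated per-head outputs) into one chunk per head and report each chunk's share of the total L2 norm. The sketch below assumes that method. The function name `head_attr_weights` and the concatenated-head layout are hypothetical. The eight values above sum to 0.96, which is consistent with a normalized distribution rounded to two decimals.

```python
import numpy as np

def head_attr_weights(decoder_row: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical per-head attribution: assumes decoder_row is the
    feature's decoder vector over the concatenated outputs of all heads."""
    chunks = decoder_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(chunks, axis=1)      # L2 norm of each head's chunk
    return norms / norms.sum()                  # shares that sum to 1

# Toy usage with 8 heads of width 2 (random data, not this feature's weights).
rng = np.random.default_rng(0)
weights = head_attr_weights(rng.normal(size=16), n_heads=8)
```

Under this assumption, a single dominant head (head 4 at 0.44 here) means most of the feature's decoder norm lives in that head's output subspace.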
Negative Logits
Token     Logit
ones      -0.23
_         -0.23
lite      -0.21
able      -0.20
k         -0.20
y         -0.19
kabul     -0.19
7         -0.19
ơi        -0.18
halb      -0.18
Positive Logits
Token           Logit
oredCriteria    0.60
تقاوى           0.54
()]);           0.53
'])->           0.53
']);            0.52
}));            0.50
betweenstory    0.50
")));           0.50
})));           0.49
'])){           0.48
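Positive and negative logit lists like the two above are typically obtained by projecting the feature's output direction in the residual stream onto the unembedding matrix and taking the top and bottom k tokens. The sketch below assumes that procedure. The names `direction`, `W_U`, and `vocab` are hypothetical, and it ignores any final LayerNorm scaling the dashboard may or may not apply.

```python
import numpy as np

def top_logits(direction: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Hypothetical direct logit effect: direction is (d_model,),
    W_U is (d_model, vocab_size); returns (positive, negative) token lists."""
    scores = direction @ W_U                 # logit contribution per token
    order = np.argsort(scores)               # ascending
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage: 3-token vocabulary with an identity unembedding.
pos, neg = top_logits(np.array([0.5, -0.2, 0.1]), np.eye(3), ["a", "b", "c"], k=1)
```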
Activation Density: 0.633%
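Activation density is usually reported as the percentage of tokens on which the feature fires (activation above zero). A density of 0.633% means the feature is active on roughly 1 in 158 tokens. The sketch below assumes a simple "activation > 0" threshold. The dashboard's exact threshold and token sample are not stated, and the function name is hypothetical.

```python
import numpy as np

def activation_density(acts) -> float:
    """Percentage of tokens with activation strictly above zero
    (assumed threshold; the dashboard may use a different one)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

density = activation_density([0.0, 0.0, 1.2, 0.0])
```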