INDEX
Explanations
Attends to the most recent closing double-asterisk token (`**`) from a series of tokens that appear later in the sentence.
Head Attr Weights
0:0.04
1:0.06
2:0.07
3:0.55
4:0.07
5:0.07
6:0.06
7:0.04
Negative Logits
MLLoader          -0.40
Италијани         -0.37
disambiguazione   -0.36
Rohy              -0.35
otomatig          -0.33
"..\..\..\        -0.33
Autoritní         -0.32
حياتها            -0.32
Earle             -0.31
урна              -0.31
Positive Logits
CURIAM            0.43
Rptr              0.37
umptive           0.37
verwijspagina     0.36
Imo               0.33
ggiare            0.32
Xie               0.32
/*---             0.31
RegressionTest    0.31
viedo             0.30
Activations Density 0.012%