Explanations
Attends to the last character of a token from a nearby token marked with specific labels.
Head Attr Weights
Head  Weight
0     0.13
1     0.14
2     0.08
3     0.13
4     0.13
5     0.15
6     0.11
7     0.10
Negative Logits
Token       Logit
ρώ          -0.37
Manbalar    -0.37
FIR         -0.34
foncé       -0.34
FIR         -0.34
pubblici    -0.34
européens   -0.33
дь          -0.33
OB          -0.33
ışık        -0.33
Positive Logits
Token             Logit
AddHtmlAttribute  0.37
expandindo        0.36
tonode            0.35
فريبيس            0.34
nestjs            0.33
Datuak            0.33
estimés           0.33
ParallelGroup     0.32
ErrIntOverflow    0.32
новниш            0.31
Activations Density: 0.003%