INDEX
Explanations
attends to tokens that signify types, focusing on particular pairs of tokens related to law or events
Head Attr Weights
0:0.15
1:0.07
2:0.51
3:0.03
4:0.03
5:0.02
6:0.09
7:0.06
Negative Logits
Charm            -0.32
PANY             -0.32
NewUrlParser     -0.31
CreateTagHelper  -0.30
Portail          -0.29
organizzazione   -0.29
벳               -0.29
Gier             -0.29
valently         -0.29
Hunter           -0.28
Positive Logits
TintMode         0.32
clusal           0.31
TextAppearance   0.30
CWE              0.30
السكان           0.29
ItemBackground   0.29
Rhestr           0.29
*);              0.29
}));             0.29
];               0.28
Activations Density 0.057%