Explanations
attends to tokens related to discrimination from tokens indicating specific categories of discrimination
Head Attribution Weights
Head 0: 0.11
Head 1: 0.15
Head 2: 0.12
Head 3: 0.11
Head 4: 0.11
Head 5: 0.04
Head 6: 0.12
Head 7: 0.19
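The head attribution weights show how much of the feature's decoder direction falls in each of the eight attention heads' slices. The dashboard does not say how they are computed. The following is a minimal sketch that assumes the SAE was trained on the concatenated per-head attention outputs and scores each head by its share of the summed per-slice L2 norms. The function name `head_attribution_weights` and the toy input are hypothetical. The weights listed above sum to 0.95 rather than 1, so the dashboard's normalization may differ from this one.

```python
import math


def head_attribution_weights(decoder_row, n_heads):
    """Assumed scheme: split a feature's decoder vector into equal per-head
    slices and return each slice's L2 norm as a fraction of the total."""
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        # Slice of the decoder row that writes to head h's output space.
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    if total == 0:
        # A zero decoder row attributes nothing to any head.
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy usage: 4 heads with d_head = 2 (illustrative numbers, not real weights).
weights = head_attribution_weights([3.0, 4.0, 0.0, 0.0, 6.0, 8.0, 0.0, 0.0], n_heads=4)
print(weights)
```

Normalizing by the total makes the weights comparable across features whose decoder vectors have different overall norms.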
Negative Logits
AddTagHelper: -0.40
PreferredItem: -0.38
addCriterion: -0.36
mappedBy: -0.36
InstrumentedTest: -0.34
FunctionFlags: -0.32
AssemblyTitle: -0.31
незавершена: -0.31
IsMutable: -0.30
lenker: -0.29
Positive Logits
ๆ: 0.32
Ruß: 0.28
modb: 0.28
ๆ: 0.26
もしれない: 0.26
참고: 0.26
Kommission: 0.26
aba: 0.26
もしれません: 0.25
СП: 0.25
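The two logit lists give the ten tokens whose output logits the feature most decreases and most increases. One common way to produce such lists, assumed here rather than confirmed by the dashboard, is direct logit attribution: project the feature's output direction in the residual stream onto the unembedding matrix and sort the per-token scores. For an attention-output feature, that direction would first have to be mapped through the heads' output weights; the sketch assumes this has already been done. The names `top_logit_effects`, `direction`, and `W_U` are illustrative, and the toy matrix stands in for a real model's unembedding.

```python
def top_logit_effects(direction, W_U, vocab, k=10):
    """Project a residual-stream direction onto the unembedding W_U
    (d_model rows, one column per vocab token) and return the k most
    negative and k most positive (token, score) pairs."""
    d_model = len(direction)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the direction with token j's unembedding column.
        s = sum(direction[i] * W_U[i][j] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda ts: ts[1])
    negative = ranked[:k]         # most suppressed tokens first
    positive = ranked[::-1][:k]   # most promoted tokens first
    return negative, positive


# Toy usage with d_model = 2 and a 3-token vocabulary (illustrative values).
negative, positive = top_logit_effects(
    [1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1
)
print(negative)  # [('b', -0.2)]
print(positive)  # [('a', 0.5)]
```

A real implementation would rank token IDs rather than decoded strings, since distinct IDs can decode to the same visible text.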
Activation Density: 0.129%
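Activation density is usually the percentage of tokens in an evaluation set on which the feature fires, so 0.129% corresponds to roughly 1.3 tokens per thousand. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose feature activation
    exceeds threshold (assumed definition of 'fires')."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```

A very low density like this one is typical of a narrow feature, which fits the specific explanation given above.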