INDEX
Explanations
assertive statements and claims about competence and capability
Negative Logits
ØŃت  -0.15
alm  -0.14
orda  -0.14
onet  -0.14
ึà¹ī  -0.14
/mit  -0.14
arella  -0.13
Fcn  -0.13
Ã  -0.13
ropp  -0.13
Positive Logits
right  1.05
correct  0.92
right  0.89
RIGHT  0.81
Right  0.76
Right  0.74
-right  0.74
_right  0.73
correct  0.71
wrong  0.67
Activations Density 0.322%