INDEX
Explanations
references to scientific citations or bibliographic references
Negative Logits
(blank)       -0.77
l             -0.71
classnames    -0.66
Sk            -0.65
ms            -0.65
Si            -0.64
p             -0.63
f             -0.63
(blank)       -0.63
р             -0.62
Positive Logits
[@     1.36
[@     1.09
/@     0.92
:@     0.92
("@    0.90
>@     0.87
="@    0.86
'@     0.85
('@    0.85
=@     0.85
Activations Density 0.720%