INDEX
Explanations
stringent requirements, comprehensive choice
Negative Logits
و        0.97
E        0.96
C        0.95
ATION    0.88
n        0.88
IVITY    0.87
ната     0.87
이니     0.85
ن        0.85
R        0.83
Positive Logits
.${      1.05
for      1.02
\".      0.98
.}$      0.95
.\"      0.94
.";      0.92
.].      0.91
.        0.91
.。      0.89
.        0.89
Activations Density 0.502%