INDEX
Explanations
numerical data and references in academic or technical contexts
Negative Logits
Token   Logit
227     -0.17
233     -0.16
235     -0.15
Bias    -0.15
175     -0.15
274     -0.15
243     -0.15
bias    -0.15
154     -0.15
ike     -0.15
POSITIVE LOGITS
Token   Logit
950     0.36
956     0.35
954     0.35
966     0.34
900     0.34
958     0.34
996     0.33
920     0.33
953     0.33
960     0.33
Activations Density: 0.153%