INDEX
Explanations
monetary values or dollar amounts
percentages and numerical values
Negative Logits
I        -0.39
E        -0.36
i        -0.34
↵        -0.33
         -0.33
G        -0.32
N        -0.31
W        -0.31
B        -0.31
g        -0.31
Positive Logits
<unused3>     1.28
<unused74>    1.28
<unused52>    1.28
<unused68>    1.28
<unused8>     1.28
[@BOS@]       1.28
<unused41>    1.28
<unused23>    1.27
<unused16>    1.27
<unused14>    1.27
Activations Density 0.004%