INDEX
Explanations
names of individuals involved in political disputes
political parties and figures
Negative Logits
<eos>    -0.30
,        -0.30
Con      -0.29
n        -0.29
.        -0.28
(        -0.28
“        -0.28
}        -0.27
len      -0.27
len      -0.27
Positive Logits
propOrder     1.06
<unused16>    1.00
<unused74>    0.99
<unused28>    0.99
ValueStyle    0.99
Weiſe         0.99
<unused17>    0.99
<unused3>     0.99
[@BOS@]       0.99
<pad>         0.99
Activations Density: 0.147%