INDEX
Explanations
patterns of curly braces or brackets
Negative Logits
ness      -0.80
—         -0.73
ers       -0.72
er        -0.69
(         -0.68
ment      -0.66
<sup>     -0.66
an        -0.66
ating     -0.65
ings      -0.64
Positive Logits
"]}       1.43
"}        1.43
']}       1.39
}}$}      1.39
")}       1.36
]")]      1.36
'}        1.34
).}       1.21
.)}       1.21
')}       1.20
Activations Density 0.288%
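The summary statistics on this page can be reproduced from raw feature data. What follows is a minimal, hypothetical sketch. It assumes that activation density is the share of token positions where the feature's activation is above zero, so 0.288% is roughly 2.9 firing tokens per 1,000. It also assumes that the logit lists are simply the k highest and k lowest entries of a per-token logit-effect table. The function names `activation_density` and `top_and_bottom_logits` are illustrative and are not part of any dashboard API.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires.

    Assumption: "firing" means activation > 0.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


def top_and_bottom_logits(logit_effects, k=10):
    """Return the k highest and k lowest (token, value) pairs.

    logit_effects is a hypothetical dict mapping token -> logit effect.
    """
    # Sort ascending by value: most negative first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]
    # Reverse the ascending order to get the highest values first.
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy data reusing a few token/value pairs from the lists above.
    effects = {"ness": -0.80, "—": -0.73, '"]}': 1.43, "']}": 1.39, "(": -0.68}
    pos, neg = top_and_bottom_logits(effects, k=2)
    print("positive:", pos)
    print("negative:", neg)
    print("density:", activation_density([0.0, 0.0, 1.2, 0.0]))
```

With k=2 the toy table yields `"]}` and `']}` as the top entries and `ness` and `—` as the bottom ones. When several tokens share a value, as `"]}` and `"}` do above, their relative order depends on the sort's tie handling.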