Explanations
curly braces and parentheses in the text
Negative Logits
isson    -0.85
(        -0.74
o        -0.69
-        -0.69
ous      -0.69
n        -0.68
real     -0.67
sub      -0.67
<sup>    -0.66
ce       -0.66
Positive Logits
]")]     1.66
}))      1.44
]})      1.41
'}       1.40
})}      1.39
)}       1.39
]}       1.38
))}      1.38
"}       1.36
.)}      1.34
Activations Density 0.228%