INDEX
Explanations
discussion of mistakes or errors
references to mistakes and errors
Negative Logits
population  -0.78
iture       -0.70
minent      -0.68
region      -0.67
uction      -0.67
amen        -0.67
orthy       -0.65
ighth       -0.65
otor        -0.64
metry       -0.64
Positive Logits
mistakes    1.27
dece        0.88
errors      0.87
黒          0.83
flaws       0.81
behavi      0.81
mistake     0.81
Malf        0.75
uggest      0.75
glitches    0.74
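The two lists above rank tokens by a per-token logit score for this feature: the most negative scores first in one list, the most positive first in the other. A minimal sketch of that ranking step, assuming the scores are already available as a token-to-score mapping. How the dashboard computes the scores is not stated here; projecting the feature's decoder direction through the model's unembedding is a common approach, but treat that as an assumption. The function name `top_logits` is hypothetical, and the sample values are taken from the lists above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logits(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, score) pairs.

    Hypothetical helper: assumes `scores` already holds one logit score per token.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]            # most negative scores first
    positive = ranked[::-1][:k]      # most positive scores first
    return positive, negative


if __name__ == "__main__":
    # A few token scores copied from the dashboard lists.
    sample = {
        "mistakes": 1.27,
        "dece": 0.88,
        "errors": 0.87,
        "population": -0.78,
        "iture": -0.70,
        "minent": -0.68,
    }
    pos, neg = top_logits(sample, 2)
    print(pos)  # [('mistakes', 1.27), ('dece', 0.88)]
    print(neg)  # [('population', -0.78), ('iture', -0.7)]
```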
Activation Density: 0.015%
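Activation density is the fraction of tokens on which the feature fires, so 0.015% corresponds to roughly 15 tokens in every 100,000. A minimal sketch, assuming that "fires" means a strictly positive activation (the dashboard's exact threshold is not given). `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens whose feature activation is strictly positive.

    Assumption: a zero threshold; the real dashboard may use a different one.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


if __name__ == "__main__":
    # 15 firing tokens out of 100,000 gives a density of 0.015%.
    acts = [1.0] * 15 + [0.0] * 99_985
    print(f"{activation_density(acts) * 100:.3f}%")  # 0.015%
```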