Explanations
technical formatting and code
Negative Logits
<unused544>     0.58
squarePos       0.55
<unused609>     0.55
𝚋               0.54
<unused291>     0.52
<unused301>     0.52
<unused657>     0.52
𒋼               0.52
<unused192>     0.52
<unused2013>    0.51
Positive Logits
[missing]       0.58
guide           0.51
medical         0.48
light           0.48
,               0.47
power           0.46
continuum       0.45
/               0.45
wilderness      0.44
history         0.44
Activations Density 0.000%