Explanations
something profound, unexpected, or more
Negative Logits
annoying        0.34
پرت             0.33
boring          0.31
funny           0.31
उदा             0.30
cute            0.29
ditth           0.29
obnoxious       0.29
intimidating    0.29
pesky           0.28
Positive Logits
bigger          0.38
others          0.36
greater         0.35
greater         0.35
bigger          0.34
ധികം            0.33
Others          0.33
beyond          0.32
extraordinary   0.32
nobody          0.31
Activations Density 0.017%