Explanations
reference codes, patches, and buttons
Negative Logits
@          0.79
@          0.79
Bob        0.78
Robert     0.78
Harold     0.76
<i>        0.73
David      0.73
Robert     0.72
Stanford   0.70
rather     0.70
Positive Logits
btns       1.18
hitbox     1.03
btn        0.98
闥         0.98
adihi      0.98
riamo      0.94
ार्या       0.94
button     0.94
ванный     0.94
್ರೀ        0.94
Activations Density: 0.001%