Explanations
code/technical documents
The neuron fires on the short floating-point score token (e.g. “3.82…” or “3.84…”) that prefixes the model’s “Yes”/“No” answer.
Negative Logits
FH         -0.08
Isle       -0.08
▍▍         -0.07
湿         -0.06
.tt        -0.06
Stefan     -0.06
_fft       -0.06
몰         -0.06
_VENDOR    -0.06
Vapor      -0.06
Positive Logits
Remaining       0.07
да              0.07
obot            0.06
aes             0.06
carbohydrate    0.06
om              0.06
computation     0.06
versión         0.06
.reduce         0.06
expanded        0.06
Activations Density 0.001%
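
As a rough illustration of how a density figure like the one above could be read, the sketch below computes the percentage of tokens on which a neuron is active. It assumes that "density" means the share of tokens whose activation exceeds a threshold; that definition, the function name `activation_density_percent`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density_percent(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens on which the neuron is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: one active token out of 100,000.
sample = [0.0] * 99_999 + [2.5]
density = activation_density_percent(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.001% corresponds to the neuron being active on roughly one token in every 100,000.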