INDEX
Explanations
The neuron activates on decimal probability values (floating-point numbers) in the text.
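As a rough illustration of the kind of text this explanation describes, the sketch below pulls decimal values between 0 and 1 out of a string. The pattern `PROB_RE` and the helper `find_probabilities` are hypothetical names introduced here; they approximate "decimal probability values" with a regular expression and are not the neuron's actual selection rule.

```python
import re

# Hypothetical approximation: a decimal number in [0, 1] written with a
# decimal point, e.g. "0.25", ".5", "1.0". Not the neuron's real criterion.
PROB_RE = re.compile(r"(?<![\w.])(?:0?\.\d+|1\.0+)(?!\w|\.\d)")


def find_probabilities(text: str) -> list[float]:
    """Return every probability-like decimal found in text, in order."""
    return [float(m.group(0)) for m in PROB_RE.finditer(text)]


if __name__ == "__main__":
    sample = "P(a)=0.25, P(b)=.5, total 1.0, ratio 12.5."
    print(find_probabilities(sample))
```

Values above 1 such as `12.5` are skipped on purpose, since they cannot be probabilities.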
Negative Logits
avi        -0.07
Odd        -0.07
iasm       -0.07
zo         -0.06
davidjl    -0.06
обав       -0.06
Sanford    -0.06
AIN        -0.06
цями       -0.06
зд         -0.06
Positive Logits
_DAYS      0.07
INDEX      0.07
ffff       0.06
WebSocket  0.06
_self      0.06
predecess  0.06
(commit    0.06
Inside     0.06
перест     0.06
“我        0.06
Activations Density 0.001%
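
The density figure can be read as the share of tokens on which the neuron fires. A minimal sketch of that calculation follows, assuming density means the fraction of token positions whose activation exceeds a threshold; the function name `activation_density` and the default threshold of 0 are assumptions made for illustration, not details taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds threshold.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # One active token out of 100,000 gives a density of 0.001%.
    acts = [0.0] * 99_999 + [2.5]
    print(f"{activation_density(acts):.3%}")
```

Under this reading, a density of 0.001% corresponds to about one active token in every 100,000.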