INDEX
Explanations
demonstrate
This neuron activates on numeric score tokens (decimal numbers) representing model confidence or sentiment scores.
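As a rough illustration of the kind of token this explanation describes, the sketch below flags decimal-number substrings in a piece of text. The regular expression and the function name `find_score_tokens` are assumptions made for illustration; they are not part of the dashboard or of the auto-interp procedure, and a real tokenizer may split such numbers into several tokens.

```python
import re

# Hypothetical pattern: optional sign, digits, a decimal point, more digits.
DECIMAL_RE = re.compile(r"[-+]?\d+\.\d+")

def find_score_tokens(text):
    """Return the decimal-number substrings found in `text`."""
    return DECIMAL_RE.findall(text)

# Example input resembling classifier output with confidence scores.
print(find_score_tokens("label: positive, score: 0.93, confidence -0.07"))
```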
Negative Logits
Token              Logit
deletion           -0.08
yazar              -0.07
cheered            -0.07
functionalities    -0.07
stellt             -0.06
sit                -0.06
Bank               -0.06
Mate               -0.06
,",                -0.06
Laurie             -0.06
Positive Logits
Token              Logit
.GL                0.07
�                  0.06
↵                  0.06
أم                 0.06
Le                 0.06
Restart            0.06
čtvrt              0.06
currentPosition    0.06
toContain          0.06
?>">↵              0.06
Activations Density 0.038%
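
For reference, a minimal sketch of how a density figure like this could be computed, assuming it means the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example: 38 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_962 + [1.5] * 38
print(f"{activation_density(sample):.3%}")
```

Under this assumed definition, a density of 0.038% corresponds to roughly 38 active tokens per 100,000.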