INDEX
Explanations
The neuron fires on the phrase “don’t need to” (and close variants), i.e. expressions indicating that something is not necessary.
Negative Logits
Hoff        -0.08
Pregn       -0.07
loggedIn    -0.07
_PANEL      -0.07
�           -0.07
safeg       -0.07
avn         -0.07
προς        -0.07
Truthy      -0.07
μβρίου      -0.06
Positive Logits
needing     0.07
(blank)     0.06
alogy       0.06
(blank)     0.05
/token      0.05
(blank)     0.05
(blank)     0.05
(blank)     0.05
<'          0.05
{:?}",      0.05
Activations Density 0.018%