INDEX
    Explanations

    The neuron fires on the phrase “don’t need to” (and close variants), i.e. expressions indicating that something is not necessary.

    Negative Logits
     Hoff
    -0.08
     Pregn
    -0.07
     loggedIn
    -0.07
    _PANEL
    -0.07
    -0.07
     safeg
    -0.07
    avn
    -0.07
     προς
    -0.07
    Truthy
    -0.07
    μβρίου
    -0.06
    Positive Logits
     needing
    0.07
       		
    0.06
    alogy
    0.06
    					    
    0.05
    /token
    0.05
        				
    0.05
    									
    0.05
    				      
    0.05
    <'
    0.05
     {:?}",
    0.05
    Act Density 0.018%

    No Known Activations