    Explanations

    appropriate

    This neuron detects the token “appropriate.”

    Negative Logits
    Token              Logit
    (not shown)        -0.08
    " totals"          -0.07
    "256"              -0.07
    " Neuroscience"    -0.07
    " &="              -0.07
    "985"              -0.07
    (not shown)        -0.07
    " Sanders"         -0.07
    " tests"           -0.07
    "880"              -0.06
    Positive Logits
    Token              Logit
    " appropriate"     0.16
    " appropriately"   0.12
    " inappropriate"   0.11
    "appropriate"      0.11
    "ighet"            0.09
    "ropriate"         0.09
    (whitespace)       0.08
    "APT"              0.08
    "opped"            0.08
    " οπο"             0.07
    Act Density 0.020%

    No Known Activations