    Explanations

    The neuron activates on occurrences of the word “racist” (and closely related forms like “racism”).

    Negative Logits
    Token           Logit
    "hdl"           -0.07
    " WAL"          -0.07
    " Holl"         -0.06
    " stranded"     -0.06
    "bol"           -0.06
    "_patches"      -0.06
    " всп"          -0.06
    "ateral"        -0.06
    "\tgetline"     -0.06
    "CLA"           -0.06

    Positive Logits
    Token           Logit
    " racism"       0.10
    " racist"       0.10
    "きな"          0.07
    " reason"       0.07
    (blank)         0.06
    "़ों"           0.06
    " endorsed"     0.06
    " carpet"       0.06
    "irector"       0.06
    " opening"      0.06

    Activation Density: 0.005%

    No Known Activations