INDEX
    Explanations

    The neuron fires on words that express apprehension or risk (e.g., “risk,” “afraid”).

    Negative Logits
        Token            Logit
        " exem"          -0.07
        "něl"            -0.07
        " spiked"        -0.06
        " Free"          -0.06
        " seize"         -0.06
        "(sz"            -0.06
        " rejects"       -0.06
        "safe"           -0.06
        " nailed"        -0.06
        " denounced"     -0.06
    Positive Logits
        Token            Logit
        "convertView"     0.07
        " Corinth"        0.06
        "/l"              0.06
        ":d"              0.06
        "={},"            0.06
        "\tof"            0.06
        "izzie"           0.06
        ".qml"            0.06
        " unpleasant"     0.06
        "JI"              0.06
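    The page does not say how these logit lists were produced. One common approach, assumed here, is to project the neuron's output weight vector through the model's unembedding matrix and read off the most positive and most negative vocabulary entries. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`, `w_out`, `w_u`), plain Python lists instead of real model weights, and ignores any final layer norm or bias the real model may apply.

```python
# Hypothetical sketch: rank vocabulary tokens by a neuron's direct logit effect.
# Assumes w_out is the neuron's output weight vector (length d_model) and w_u is
# an unembedding matrix stored as d_model rows of vocab_size columns.
# Layer norm and biases are deliberately ignored (an assumption).


def logit_effects(w_out, w_u):
    """Return one score per vocabulary token: the dot product of w_out with column j of w_u."""
    d_model = len(w_out)
    vocab_size = len(w_u[0])
    return [
        sum(w_out[i] * w_u[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return (negative, positive): the k lowest- and k highest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive
```

    With a toy two-dimensional model, `top_and_bottom(logit_effects(w_out, w_u), tokens, 2)` returns the two tokens the neuron most suppresses and the two it most promotes, mirroring the layout of the tables above.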
    Activation density: 0.013%
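    Activation density is read here as the fraction of token positions on which the neuron is active; 0.013% is 0.00013, or roughly 13 tokens in every 100,000. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the page does not state the threshold it uses), with `activation_density` as a hypothetical helper name:

```python
# Hypothetical sketch: activation density as the fraction of token positions
# whose activation exceeds a threshold. The default threshold of 0.0 is an
# assumption; the dashboard does not document the value it uses.


def activation_density(activations, threshold=0.0):
    """Return the fraction of entries in `activations` strictly greater than `threshold`."""
    if not activations:
        return 0.0  # avoid dividing by zero on an empty sample
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

    For example, 13 active positions out of 100,000 gives 0.00013, which is reported as 0.013%.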

    No Known Activations