    Explanations

    The neuron detects words expressing admission or confession (e.g., “admitted,” “admits”).

    Negative Logits
    " flood"       -0.07
    " followed"    -0.06
    "Fish"         -0.06
    " shape"       -0.06
    "_THREAD"      -0.06
    " peel"        -0.06
    "publish"      -0.06
    "(seconds"     -0.06
    " snake"       -0.06
    " curtain"     -0.06
    Positive Logits
    " admitted"    0.10
    " admits"      0.09
    " admitting"   0.09
    " admit"       0.09
    " admittedly"  0.08
    " confessed"   0.07
    " confess"     0.07
    " aos"         0.07
    "ategorie"     0.07
    "ório"         0.06
    Activation Density: 0.009%

    No Known Activations