    Explanations

    Hypothetical situations

    The neuron fires on words and phrases that signal hypothetical risk or potential negative outcomes (e.g. “could,” “theoretically,” “detrimental,” “dangerous”).

    Negative Logits
    Token               Logit
    " Sherlock"         -0.07
    "обще"              -0.07
    " Labels"           -0.07
    " Reconstruction"   -0.07
    " specifications"   -0.07
    "character"         -0.07
    "xford"             -0.06
    "={$"               -0.06
    " explanation"      -0.06
    "_selection"        -0.06

    Positive Logits
    Token               Logit
    " вий"              0.06
    " kil"              0.06
    " Async"            0.06
    "ΑΣ"                0.06
    "Matrix"            0.06
    " nej"              0.06
    " genesis"          0.06
    "_Event"            0.06
    " amigos"           0.06
    "Mes"               0.05

    Activation density: 0.106%

    No Known Activations