    Explanations

    This neuron detects mentions of mental and physical health descriptors (e.g. “mental,” “mentally,” “physically”).

    Negative Logits
     pop           -0.06
     извест        -0.06
    [not shown]    -0.06
    },{            -0.06
    :"             -0.06
     @"            -0.06
    “All           -0.06
     regularly     -0.05
    pieces         -0.05
     hammered      -0.05
    Positive Logits
    umar           0.07
     faithful      0.07
    ruc            0.07
     ADV           0.06
     Ups           0.06
    Smarty         0.06
    [not shown]    0.06
     νεφώσεις      0.06
    дя             0.06
     answered      0.06
    Activation Density: 0.016%

    No Known Activations