INDEX
    Explanations

    The neuron activates on occurrences of the word “safe,” i.e. it flags mentions of something being safe.

    Negative Logits
    " unreachable"    -0.07
    ".isPresent"      -0.07
    " вб"             -0.06
    " chipset"        -0.06
    " nale"           -0.06
    " نسمة"           -0.06
    "ponential"       -0.06
    (no token shown)  -0.06
    "nvarchar"        -0.06
    "EmptyEntries"    -0.06
    Positive Logits
    " safe"        0.07
    "ayi"          0.07
    " risk"        0.07
    " risky"       0.07
    " presidente"  0.06
    " blindly"     0.06
    " Volk"        0.06
    "тор"          0.06
    " Risk"        0.06
    " Georgia"     0.06
    Act Density 0.016%

    No Known Activations