    Explanations

    Common English phrases

    The neuron detects common high‐frequency function words (e.g. “and,” “of,” “for”).

    Negative Logits
     Fn           -0.07
    agem          -0.07
     remorse      -0.06
     Harr         -0.06
     MM           -0.06
    _PA           -0.06
     kas          -0.06
     sins         -0.06
    LT            -0.06
                  -0.06
    Positive Logits
     همچنین       0.07
     توسعه        0.07
     Francie      0.06
    []{"          0.06
    истра         0.06
    ъек           0.06
    .AllowGet     0.06
     меди         0.06
     Obviously    0.06
    .signIn       0.06
    Activation Density: 0.325%

    No Known Activations