    Explanations

    The neuron fires on modal and hedging words that express possibility or uncertainty (e.g., “possible,” “could,” “may,” “might”).

    Negative Logits
     clos
    -0.07
     However
    -0.06
     kw
    -0.06
     bump
    -0.06
     Manage
    -0.06
    However
    -0.06
    sense
    -0.06
     Ln
    -0.06
    _km
    -0.06
     tourism
    -0.06
    Positive Logits
     zih
    0.07
    Their
    0.07
    /archive
    0.07
     الوطني
    0.07
    Що
    0.07
    0.07
    0.07
    ?s
    0.07
    ظٹط
    0.06
     профилакти
    0.06
    Act Density 0.026%

    No Known Activations