    Explanations

    The neuron fires on editorial/discourse phrases that introduce simplifying assumptions (e.g. “for simplicity… we assume…”).

    Negative Logits
    Token             Logit
    'Dados'           -0.07
    ' věcí'           -0.07
    'ENTIAL'          -0.06
    'Dup'             -0.06
    (no token text)   -0.06
    'ичного'          -0.06
    ' vzdělávání'     -0.06
    '-to'             -0.06
    'धर'              -0.06
    (no token text)   -0.06
    Positive Logits
    Token             Logit
    ' друз'           0.07
    'artin'           0.07
    ' miles'          0.07
    '".$_'            0.06
    ' &,'             0.06
    ' Sakura'         0.06
    'recent'          0.06
    '(samples'        0.06
    'Expl'            0.06
    (no token text)   0.06
    Activation Density: 0.017%

    No Known Activations