    Explanations

    cause and effect

    The neuron fires on phrases that describe problems, risks, or negative outcomes, i.e. mentions of associated impacts, consequences, or severity in a research-style context.

    Negative Logits
    Token               Logit
    ' bekan'            -0.07
    ' bew'              -0.07
    'Interior'          -0.06
    ' benchmark'        -0.06
    '/documentation'    -0.06
    ':x'                -0.06
    ' Janet'            -0.06
    ' Required'         -0.06
    ' кв'               -0.06
    'ropical'           -0.06

    Positive Logits
    Token               Logit
    '...");↵↵'          0.07
    'ridor'             0.07
    'ुए'                0.07
    ' Increased'        0.07
    ' [])↵↵'            0.07
    ' ();↵'             0.06
    'PING'              0.06
    ' Friend'           0.06
    ' ediyor'           0.06
    ' alıyor'           0.06

    Activation Density: 0.080%

    No Known Activations