    Explanations

    the apostrophe character in English contractions.

    This neuron detects mentions of large language models and related training processes.

    Negative Logits
    Token                        Value
    " "                          0.47
    " is"                        0.43
    " a"                         0.42
    " ("                         0.35
    (token missing in source)    0.31
    " {"                         0.30
    " it"                        0.29
    " to"                        0.29
    "،"                          0.29
    " of"                        0.28
    Positive Logits
    Token    Value
    "and"    0.45
    "на"     0.45
    "ون"     0.44
    "z"      0.40
    "u"      0.38
    "f"      0.37
    "in"     0.36
    "ل"      0.36
    "b"      0.35
    "d"      0.34
    Act Density 16.856%

    No Known Activations