INDEX
    Explanations

    text related to articles or programming instructions

    the presence of articles and other grammatical elements in text

    Negative Logits
    ''.        -0.92
    ``         -0.86
    .}         -0.78
    "},        -0.74
    cffff      -0.72
    .''.       -0.71
    });        -0.71
    .''        -0.70
     mathemat  -0.69
    .",        -0.69
    Positive Logits
    1.48
     -
    1.25
     --
    1.14
    1.10
    1.06
    ・
    0.92
     ―
    0.88
    0.88
    --
    0.84
    )—
    0.78
    Act Density 0.206%

    No Known Activations