    Explanations

    the presence of specific structured phrases or articles, particularly "a" and "an"

    Negative Logits
    Token               Logit
    "OKIE"              -0.17
    "owo"               -0.16
    "ãĥ³ãĥĨãĤ£"         -0.14
    "StackNavigator"    -0.14
    "lla"               -0.14
    "lle"               -0.14
    "ληÏĤ"              -0.14
    "CKER"              -0.13
    "ssa"               -0.13
    "adio"              -0.13

    Positive Logits
    Token               Logit
    ".gwt"              0.17
    " recent"           0.16
    "edes"              0.16
    "Feat"              0.16
    " aim"              0.15
    "ecs"               0.14
    "tera"              0.14
    " majority"         0.14
    "ree"               0.14
    " failure"          0.14

    Activation Density: 0.112%

    No Known Activations