    Explanations

    phrases related to specific items or objects

    definite articles or determiners in various contexts

    Negative Logits
    Token               Logit
    "FILE"              -0.85
    "AGE"               -0.76
    "ade"               -0.73
    "Background"        -0.72
    "According"         -0.72
    "ees"               -0.72
    "Episode"           -0.72
    "aran"              -0.72
    " beforehand"       -0.71
    "Britain"           -0.70
    Positive Logits
    Token               Logit
    " occasional"       1.04
    " dreaded"          1.00
    " ones"             0.96
    " slightest"        0.91
    " obligatory"       0.88
    " aforementioned"   0.86
    " downright"        0.85
    " endless"          0.82
    " smallest"         0.82
    " latter"           0.81
    Activation Density: 0.238%

    No Known Activations