INDEX
    Explanations

    Instances of the word "a" followed by various nouns or descriptors

    Negative Logits
    "ĽĪ"               -0.15
    "ffa"              -0.15
    "شت"               -0.15
    "pires"            -0.15
    "TestCategory"     -0.15
    " dán"             -0.14
    "velt"             -0.14
    " addCriterion"    -0.14
    "pus"              -0.14
    "olas"             -0.14
    Positive Logits
    " beating"         0.29
    " liking"          0.29
    " cue"             0.29
    " step"            0.26
    " cues"            0.25
    " stance"          0.25
    " leap"            0.24
    " toll"            0.23
    " look"            0.23
    " stab"            0.23
    Activation Density: 0.057%

    No Known Activations