INDEX
    Explanations

    phrases indicating a specific quantity of items

    repeated phrases that start with "of"

    Negative Logits
    " surpr"        -0.70
    " ende"         -0.68
    "blance"        -0.67
    " disadvant"    -0.64
    " condem"       -0.64
    " agre"         -0.63
    " lapt"         -0.63
    " awa"          -0.63
    " rewriting"    -0.58
    "itute"         -0.58
    Positive Logits
    "ses"           0.72
    " these"        0.69
    " them"         0.67
    " us"           0.67
    " course"       0.65
    "Adams"         0.64
    "them"          0.64
    "icial"         0.63
    "these"         0.63
    " whom"         0.63
    Activation Density: 0.099%

    No Known Activations