INDEX
    Explanations

    mentions of different groups or numbers of items in a list

    key elements or metrics in contrasting situations

    Negative Logits
        " GOODMAN"       -0.98
        " Christy"       -0.70
        " PLUS"          -0.69
        " Zionism"       -0.67
        " Brigham"       -0.64
        " Franz"         -0.63
        " Regist"        -0.62
        " Clarks"        -0.62
        " Redemption"    -0.62
        " Wrong"         -0.61
    Positive Logits
        " others"         0.80
        "Others"          0.79
        "umber"           0.74
        "empl"            0.70
        "aughed"          0.69
        "phthal"          0.67
        "iliary"          0.66
        " rest"           0.65
        " twe"            0.64
        "thro"            0.64
    Activation Density: 0.218%

    No Known Activations