INDEX
    Explanations

    listing specific examples or categories

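    Negative Logits
    Token (quoted to show leading spaces)   Logit   Gloss
    "some"                                  1.08
    "both"                                  1.00
    "各种"                                  0.97    Chinese: "various"
    "those"                                 0.97
    " some"                                 0.94
    "Some"                                  0.93
    "Both"                                  0.91
    "something"                             0.90
    " Some"                                 0.90
    " några"                                0.87    Swedish: "some"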
    Positive Logits
    Token (quoted to show leading spaces)   Logit
    " people"                               1.03
    " instances"                            0.99
    " of"                                   0.88
    " important"                            0.87
    " things"                               0.87
    " aspects"                              0.87
    " notable"                              0.79
    " factors"                              0.79
    " places"                               0.78
    " other"                                0.78
    Act Density 0.406%

    No Known Activations
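
    For readers unfamiliar with the Act Density figure above, the following is a minimal sketch of how such a percentage is commonly computed, assuming it means the fraction of token positions at which the feature's activation is above zero. The page does not state its definition, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this reading, a value of 0.406% would mean the feature fires on roughly 4 of every 1000 tokens in whatever corpus was sampled.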