    Explanations

    pronouns that refer to groups of people, particularly in relation to actions or sentiments

    Negative Logits
    "was"       -0.22
    "(s"        -0.18
    "(es"       -0.18
    "Was"       -0.17
    " itself"   -0.16
    " Was"      -0.15
    "amp"       -0.15
    " باشد"     -0.14
    "nbsp"      -0.14
    "[s"        -0.13

    Positive Logits
    "’re"       0.69
    "'re"       0.63
    "’ve"       0.52
    " are"      0.52
    "'ve"       0.51
    "’ll"       0.41
    "'ll"       0.39
    " aren"     0.39
    "’d"        0.35
    " were"     0.35
    Activation Density: 0.822%

    No Known Activations