INDEX
    Explanations

    phrases that repeatedly reference specific groups or individuals as "those."

    Negative Logits
    enstein
    -0.17
    ayne
    -0.15
    主人
    -0.14
    aptops
    -0.14
    those
    -0.14
     outset
    -0.14
    Ë
    -0.14
    ुरस
    -0.14
    idan
    -0.13
    isode
    -0.13
    Positive Logits
     who
    0.42
    who
    0.32
     whom
    0.29
     same
    0.29
    curity
    0.28
     Who
    0.26
    Who
    0.24
     kinds
    0.23
     الذين
    0.23
     whose
    0.22
    Act Density 0.060%

    No Known Activations