INDEX
    Explanations

    references to the word "them" in various contexts

    Negative Logits
    Token           Logit
    " themſelves"   -0.96
    " Efq"          -0.91
    " ſeveral"      -0.90
    " himſelf"      -0.88
    " reaſon"       -0.88
    "AndEndTag"     -0.87
    " Họ"           -0.86
    " bershka"      -0.85
    " cauſe"        -0.85
    " ſtate"        -0.85
    Positive Logits
    Token           Logit
    "تم"            0.60
    " M"            0.59
    "hm"            0.59
    "↵↵"            0.58
    (blank)         0.57
    "bs"            0.56
    (blank)         0.56
    "m"             0.56
    "The"           0.54
    "EV"            0.54
    Activation Density: 0.046%

    No Known Activations