INDEX
    Explanations

    pronouns referring to a person

    Negative Logits
     for  0.51
    我們  0.48
     forgery  0.47
    我們要  0.47
    我们就  0.47
    ି  0.45
     ஒன்றை  0.45
    কিছু  0.44
     আমরা  0.44
    ோம்  0.44
    Positive Logits
     wrote  0.49
     flew  0.46
    ffes  0.46
     texted  0.45
     could  0.44
     wore  0.44
    mming  0.44
     can  0.44
    0.43
    czy  0.43
    Act Density 0.000%

    No Known Activations