INDEX
    Explanations

    references to conditions and relationships involving entities and attributes

    Negative Logits
        Token            Logit
        " ITS"           -0.56
        "Its"            -0.47
        "ITS"            -0.47
        " Its"           -0.47
        "OWA"            -0.45
        "its"            -0.44
        " холо"          -0.42
        "dl"             -0.41
        " Fils"          -0.41
        "اش"             -0.40

    Positive Logits
        Token            Logit
        "themselves"      1.24
        " themselves"     1.24
        " yourselves"     0.92
        " their"          0.91
        " they"           0.89
        "Their"           0.89
        "their"           0.85
        " Their"          0.83
        " którzy"         0.81
        "彼らの"           0.81

    Activation Density: 0.746%

    No Known Activations