INDEX
    Explanations

    Numerical data and references to academic papers or studies

    Citations with volume and page numbers

    Negative Logits
    Token              Logit
    " themselves"      -0.82
    "Their"            -0.77
    " their"           -0.77
    " Their"           -0.77
    " yourselves"      -0.76
    " collectively"    -0.71
    "themselves"       -0.71
    "their"            -0.71
    " eds"             -0.71
    " deres"           -0.71

    Positive Logits
    Token              Logit
    " himself"          0.87
    "himself"           0.70
    "一人で"            0.64
    " his"              0.61
    " himſelf"          0.60
    " själv"            0.60
    " herself"          0.60
    " alone"            0.57
    " OMITBAD"          0.55
    "IEWS"              0.55

    Activation Density: 0.127%

    No Known Activations