INDEX
    Explanations

    references to individuals and groups in a variety of contexts

    Negative Logits
    Token        Logit
    " itself"    -0.18
    "taire"      -0.18
    "cliffe"     -0.18
    "onaut"      -0.17
    "imary"      -0.16
    "liš"        -0.15
    "ress"       -0.15
    "resse"      -0.15
    "arah"       -0.15
    "ï"          -0.15
    Positive Logits
    Token        Logit
    "/us"        0.35
    "/her"       0.34
    "zelf"       0.28
    "-même"      0.24
    "SELF"       0.23
    "/th"        0.23
    "atically"   0.22
    "self"       0.21
    "/we"        0.21
    "etics"      0.19
    Activation Density: 0.150%

    No Known Activations