INDEX
    Explanations

    repeated symbols or phrases in various languages

    Negative Logits
    Token           Logit
    " herself"      -0.84
    " थी"           -0.65
    " ihre"         -0.64
    " peggio"       -0.63
    " ihrer"        -0.63
    " Aphrodite"    -0.63
    " kterou"       -0.61
    " która"        -0.57
    "rairie"        -0.57
    "dnn"           -0.56
    Positive Logits
    Token           Logit
    " himself"      1.33
    "himself"       1.17
    " Himself"      1.06
    " boyhood"      0.85
    " koji"         0.82
    "his"           0.82
    " his"          0.81
    " који"         0.81
    "rungsseite"    0.76
    " który"        0.73
    Activation Density: 0.182%

    No Known Activations