INDEX
    Explanations

    words indicating creation or production

    Negative Logits
    GIN       -0.17
    puted     -0.16
     же       -0.16
    ugin      -0.15
    inkle     -0.14
    ayın      -0.14
    wyn       -0.14
    rome      -0.14
     ateş     -0.14
    rado      -0.14
    POSITIVE LOGITS
     it       0.20
    ALLED     0.14
    #ad       0.14
    zew       0.14
     sense    0.13
    enberg    0.13
    iert      0.13
    riad      0.13
    amura     0.13
    ossa      0.13
    Act Density 0.067%

    No Known Activations