    Negative Logits
     大きい
    -1.05
     for
    -1.01
    жную
    -0.98
     here
    -0.98
    tetur
    -0.95
    triangleq
    -0.94
     zelfs
    -0.90
    也能
    -0.90
     spu
    -0.89
    tamina
    -0.89
    Positive Logits
     words
    1.09
     hjemmeside
    1.01
     ਦਾ
    1.01
     sounded
    1.00
    0.95
    你是不是
    0.93
    omato
    0.93
    当時
    0.93
    ratory
    0.92
    しかし
    0.91
    Activation Density: 0.004%

    No Known Activations