INDEX
    Explanations
    Negative Logits
     '          -3.47
     was        -3.17
    ar          -2.88
    d           -2.88
     has        -2.81
     *          -2.56
     "          -2.53
     You        -2.42
    a           -2.38
     )          -2.34
    Positive Logits
     diki       2.67
                2.61
     kollu      2.56
                2.56
    selben      2.50
                2.50
     清新       2.42
     kopling    2.39
                2.39
     triko      2.36
    Act Density 0.007%

    No Known Activations