    Negative Logits
    Token      Logit
    aat        -0.15
    vince      -0.14
    asmus      -0.14
    iffies     -0.14
     facts     -0.14
    ingo       -0.14
    ione       -0.14
    íst        -0.14
     знаком    -0.13
     Elm       -0.13
    Positive Logits
    Token      Logit
    rosse      0.32
    oste       0.28
    quer       0.27
    ustr       0.20
    kiye       0.19
    ritz       0.19
    nung       0.18
    uesta      0.18
    ROS        0.18
    unar       0.17
    Activation Density: 0.008%

    No Known Activations