INDEX
    Explanations
    Negative Logits
    Token          Logit
    " sitten"      -0.32
                   -0.31
    " fly"         -0.31
    " podstaw"     -0.30
    " Blatt"       -0.29
    "abs"          -0.29
    "ambilan"      -0.28
    " would"       -0.28
    " attempt"     -0.27
    "vyk"          -0.27
    Positive Logits
    Token          Logit
    " thanks"      1.39
    " graças"      1.32
    "thanks"       1.30
    "Благодаря"    1.28
    " благодаря"   1.28
    " grâce"       1.23
    " gracias"     1.20
    " THANKS"      1.20
    " dzięki"      1.20
    " Thanks"      1.20
    Activation Density: 0.157%

    No Known Activations