    Negative Logits
        "-"              1.00
        "p"              0.92
        "ak"             0.91
        "al"             0.88
        "i"              0.88
        "in"             0.87
        "u"              0.87
        "ie"             0.79
        "ach"            0.76
        "ia"             0.74
    Positive Logits
        " aprecia"       0.79
        (token missing)  0.78
        " sıcak"         0.74
        " Спо"           0.72
        " Erste"         0.71
        "ן"              0.68
        "5"              0.68
        " poświę"        0.67
        "<unused2199>"   0.65
        " scoprire"      0.64
    Activation density: 0.002%

    No known activations