    Negative Logits
        " tro"                  -0.08
        "older"                 -0.08
        "іг"                    -0.08
        " zwe"                  -0.08
        ".flag"                 -0.07
        " trac"                 -0.07
        " tran"                 -0.07
        " Jacobs"               -0.07
        (token not rendered)    -0.07
        ".rows"                 -0.07
    Positive Logits
        " долж"                 0.09
        " Mardi"                0.08
        "ρυθ"                   0.08
        " Goes"                 0.08
        " eti"                  0.08
        " Demi"                 0.08
        " 따라서"               0.07
        "alini"                 0.07
        (token not rendered)    0.07
        " metr"                 0.07
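Lists like the two above are commonly produced by taking a feature's decoder direction and projecting it onto each token's unembedding vector, then keeping the most positive and most negative scores. The sketch below illustrates that idea under that assumption; it is not confirmed to be the exact method behind this page. The names `logit_effects`, `top_logits`, `decoder_dir` and `unembed` are hypothetical, and the toy numbers in the usage note are not the real values listed above.

```python
# Hypothetical sketch: score each vocabulary token by the dot product of a
# feature's decoder direction with that token's unembedding vector, then
# take the k most positive and k most negative scores.
# Assumption: this approximates how "positive/negative logits" lists are
# often built; it is not presented as this dashboard's exact method.
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Return a mapping token -> dot(decoder_dir, unembedding vector)."""
    effects: Dict[str, float] = {}
    for token, column in unembed.items():
        if len(column) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {token!r}")
        effects[token] = sum(d * u for d, u in zip(decoder_dir, column))
    return effects


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative), each sorted by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return positive, negative
```

For example, with a toy two-dimensional direction `[1.0, -0.5]` and unembedding vectors `{"a": [0.1, 0.0], "b": [-0.1, 0.0], "c": [0.0, 0.0]}`, `top_logits(..., 1)` returns `"a"` as the top positive token and `"b"` as the top negative one.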
    Act Density 0.001%

    No Known Activations
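An activation density of 0.001% means the feature fires on roughly 1 token in every 100,000 (0.001 / 100 = 1e-5). A minimal sketch of that calculation follows, assuming density is the percentage of token positions whose activation exceeds a threshold of zero. The function name `activation_density` and the zero threshold are assumptions for illustration, not details taken from this page.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature's activation exceeds a threshold.
# Assumption: threshold 0.0 (any positive activation counts as "firing").
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Return the percentage of positions with activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

With one nonzero activation among 100,000 positions, for instance `[0.0] * 99_999 + [2.5]`, the function returns 0.001, matching the density shown above.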