INDEX
    Explanations
    Negative Logits
    "ースト"      -0.07
    " gifted"    -0.07
    " Wow"       -0.07
    "ovic"       -0.07
    "iPad"       -0.07
    " bowed"     -0.07
    "وده"        -0.07
    "488"        -0.07
    "244"        -0.06
    " SDL"       -0.06
    Positive Logits
    " на"        0.18
    " На"        0.13
    "На"         0.12
    " НА"        0.10
    " на"        0.09
    "на"         0.09
    " numar"     0.08
    "NA"         0.08
    "omba"       0.08
    0.08
    Act Density 0.018%

    No Known Activations