    Negative Logits
    Token         Value
    "cap"         0.67
    " വെ"         0.63
    " Blog"       0.62
                  0.62
    " powered"    0.60
    "グレ"        0.58
    " alike"      0.58
    "ctus"        0.58
    "лын"         0.58
    " vem"        0.57
    Positive Logits
    Token         Value
    " ў"          1.00
    "Ў"           0.98
    " бў"         0.97
    " yoki"       0.93
    "ў"           0.92
    " Ў"          0.90
    " bosh"       0.86
    " қў"         0.86
    " uchun"      0.86
    " ўз"         0.85
    Activation Density: 0.037%

    No Known Activations