INDEX
    Explanations

    describes states or qualities

    Negative Logits
    "КА"        0.88
    "০০"        0.73
    " Faça"     0.73
    "‌ها"        0.72
    "Ди"        0.71
    "ম্ভীর"      0.71
    "Dengan"    0.71
    "RAFT"      0.70
    "ხვევ"      0.70
    "БА"        0.70
    Positive Logits
    [token not shown]   0.71
    ","                 0.69
    " ("                0.65
    "ik"                0.65
    "ic"                0.64
    "יים"               0.63
    " deciding"         0.63
    " warranted"        0.62
    ";"                 0.61
    " necesit"          0.60
    Act Density 0.320%

    No Known Activations