INDEX
    Explanations

    names followed by comma

    Negative Logits
    " বহন"  0.38
    "itate"  0.38
    "ார்க்க"  0.38
    " kto"  0.37
    "ongyang"  0.36
    "겠죠"  0.35
    " utili"  0.35
    "util"  0.35
    "itution"  0.35
    " aDecoder"  0.35
    Positive Logits
    " ơi"  0.61
    (blank)  0.54
    " thank"  0.53
    " dear"  0.51
    " তুমি"  0.50
    " please"  0.48
    "!"  0.47
    " спасибо"  0.46
    (blank)  0.45
    "ရေ"  0.45
    Act Density 0.027%

    No Known Activations