    Explanations

    language learning and explanation

    Negative Logits
    " "                 0.32
    " viêm"             0.26
    " እና"               0.25
    " recordó"          0.25
    " น้ำ"               0.25
    "आई"                0.25
    " và"               0.25
    " joyería"          0.24
    " problemática"     0.24
    " rượu"             0.24
    Positive Logits
    "ות"                0.44
    "ون"                0.37
    (token not shown)   0.34
    "ва"                0.33
    "ل"                 0.31
    " pronouns"         0.30
    "ר"                 0.30
    "ת"                 0.30
    "il"                0.29
    "ната"              0.29
    Activation Density: 0.222%

    No Known Activations