INDEX
    Explanations

    something happened or changed

    Negative Logits
    ที่คุณ    0.40
    (no token text)    0.39
     पाहून    0.38
     που    0.37
    送到    0.36
    (no token text)    0.35
    Loc    0.35
     имају    0.34
    你能    0.34
     ಕಂಡ    0.34
    Positive Logits
     shifts    0.64
     shifted    0.60
     prevents    0.59
     compels    0.58
     feels    0.55
     interferes    0.50
     distinguishes    0.50
     Shifts    0.50
     inhibits    0.49
     seems    0.49
    Act Density 0.005%

    No Known Activations