INDEX
    Explanations

    pronouns and associated actions

    Negative Logits
    um          0.47
    animity     0.43
    }$.         0.43
     suunn      0.42
    Z           0.41
    uh          0.40
    _           0.40
    ap          0.40
    BUS         0.40
    aj          0.40
    Positive Logits
    мата        0.42
    ścia        0.42
                0.41
     взгля      0.40
    իկ          0.40
    кара        0.40
    вании       0.40
    нце         0.40
    ка          0.39
     karat      0.39
    Act Density 0.169%

    No Known Activations