    Negative Logits
    Token          Logit
    " is"          0.63
    "ne"           0.61
    "3"            0.57
    "raz"          0.54
    "."            0.53
    "j"            0.53
    " out"         0.51
    "ci"           0.51
    "theorem"      0.50
    (token not shown)  0.50
    Positive Logits
    Token          Logit
    " ROLE"        0.75
    "ین"           0.71
    " role"        0.63
    " tradicion"   0.59
    "Role"         0.58
    " peran"       0.57
    " निभाने"       0.56
    (token not shown)  0.56
    "を果た"        0.55
    "ن"            0.54
    Activation Density: 0.063%

    No Known Activations