INDEX
    Negative Logits
    Token         Logit
    "bara"        -0.07
    "DIM"         -0.07
    "altet"       -0.07
    " Kolkata"    -0.06
    ".dat"        -0.06
    "library"     -0.06
    " brib"       -0.06
    ".dc"         -0.06
    " nitelik"    -0.06
    "なん"        -0.06
    Positive Logits
    Token         Logit
    "_Static"     0.07
    " chall"      0.06
    " zby"        0.06
    "juries"      0.06
    "endant"      0.06
    "лина"        0.06
    " stud"       0.06
                  0.06
    " виход"      0.06
    " 생산"       0.06
    Act Density 0.003%

    No Known Activations