    NEGATIVE LOGITS
     Human          -0.06
     شهید           -0.06
                    -0.06
     LOG            -0.06
     Performance    -0.06
     Nicht          -0.06
     nat            -0.06
     flirting       -0.06
     Nowadays       -0.06
    rezent          -0.06
    POSITIVE LOGITS
    -full           0.07
    Include         0.07
    TC              0.06
     tick           0.06
    !",↵            0.06
     shimmer        0.06
    ترل             0.06
     місці          0.06
    टर              0.06
     Laboratory     0.06
    Activation Density 0.022%

    No Known Activations