    Negative Logits
    Token          Logit
    " Maintain"    -0.78
    " hic"         -0.76
    " maintain"    -0.72
    "Maintain"     -0.68
    " دخترانه"     -0.65
    " бар"         -0.63
    " Did"         -0.63
    " Add"         -0.63
    "ığında"       -0.61
    " 记"          -0.61
    Positive Logits
    Token          Logit
    " NEWS"        0.74
    "klärt"        0.71
    "ilang"        0.70
    " jede"        0.69
    " historii"    0.68
    "Instances"    0.67
    " PLAYER"      0.67
    " Coulter"     0.67
    " Supper"      0.66
    "UNK"          0.65
    Activation Density: 0.043%

    No Known Activations