INDEX
    Explanations
    NEGATIVE LOGITS
    ��
    -0.06
     evaluation
    -0.06
    									 
    -0.06
     approached
    -0.06
     criticism
    -0.06
     mistakenly
    -0.06
     mo
    -0.06
    _title
    -0.06
     praise
    -0.06
     unresolved
    -0.06
    POSITIVE LOGITS
     CSR
    0.07
    Reverse
    0.07
    仿
    0.07
     Derby
    0.07
    Geom
    0.07
     Kup
    0.07
    igma
    0.07
     مرکزی
    0.07
     Ability
    0.07
     ΑΝ
    0.06
    Act Density 0.002%

    No Known Activations