    Negative Logits
        " abbre"          -0.08
        " progressing"    -0.08
        " прин"           -0.08
        " tales"          -0.08
        [token missing]   -0.07
        " Fate"           -0.07
        " Chen"           -0.07
        "yet"             -0.07
        " questioned"     -0.07
        "ulde"            -0.07
    Positive Logits
        " concession"     0.08
        " concessions"    0.07
        [token missing]   0.07
        " mur"            0.07
        " equil"          0.07
        " males"          0.07
        "product"         0.07
        " estim"          0.07
        "rad"             0.07
        "_param"          0.07
    Activation Density: 0.003%

    No Known Activations