    Negative Logits
    " educational"  -0.07
    "%↵"            -0.07
    "Congress"      -0.07
    " careg"        -0.07
    "(Op"           -0.06
    "(primary"      -0.06
    "-learning"     -0.06
    "medical"       -0.06
    " Tổng"         -0.06
    " carve"        -0.06
    Positive Logits
    "ropolitan"   0.06
    " Laf"        0.06
    " cwd"        0.06
    "<dyn"        0.06
    "zier"        0.06
    " Tmax"       0.06
    " oa"         0.06
    " nutshell"   0.06
    " Blackburn"  0.06
    " insulting"  0.06
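A minimal sketch of how ranked logit lists like the two above are commonly produced. This page does not say how they were computed. The sketch assumes that each value is the feature's decoder direction projected through the model's unembedding matrix, and that the page shows the k most negative and k most positive entries. The names `logit_effects`, `top_logit_effects`, `decoder_direction`, and `W_U` are hypothetical.

```python
import numpy as np

def logit_effects(decoder_direction, W_U):
    """Project a feature's decoder direction onto the vocabulary.

    decoder_direction: shape (d_model,)  (assumed feature write direction)
    W_U: shape (d_model, d_vocab)         (assumed unembedding matrix)
    Returns one logit effect per vocabulary token, shape (d_vocab,).
    """
    return decoder_direction @ W_U

def top_logit_effects(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs."""
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[0.1, -0.2, 0.3, -0.05],
                [0.0,  0.0, 0.0,  0.0]])
effects = logit_effects(np.array([1.0, 0.0]), W_U)
neg, pos = top_logit_effects(effects, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once with `argsort` and reading from both ends yields both lists in a single pass over the vocabulary.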
    Activation Density: 0.004%
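Activation density is usually the fraction of sampled tokens on which the feature fires. A density of 0.004% (0.00004) therefore means roughly 4 of every 100,000 tokens. The following is a minimal sketch. It assumes "fires" means an activation strictly greater than a threshold of 0, which this page does not state. `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    activations: 1-D sequence with one activation value per sampled token.
    threshold: assumed firing cutoff (0.0, i.e. any positive activation).
    """
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# Toy usage: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 0.0, 1.5])
print(f"{100 * density:.3f}%")  # density reported as a percentage
```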

    No Known Activations