    Negative Logits
    "	mask"          -0.08
    " politician"   -0.07
    " franchises"   -0.07
    " visitor"      -0.07
    " Plain"        -0.06
    " resolutions"  -0.06
    " Census"       -0.06
    " imperative"   -0.06
    " protections"  -0.06
    "separator"     -0.06
    Positive Logits
    ",right"        0.07
    " versa"        0.07
    " завод"        0.07
                    0.06
    "_;"            0.06
    "kový"          0.06
    "_contr"        0.06
    "را"            0.06
    " 때문에"       0.06
    ",unsigned"     0.06
    Activation Density: 0.027%

    No Known Activations