INDEX
    Explanations
    Negative Logits
        ".Typed"          -0.08
        "Win"             -0.08
        "Rx"              -0.07
        " تلف"            -0.07
        "uu"              -0.06
        "PropTypes"       -0.06
        " nettsteder"     -0.06
        "[max"            -0.06
        "larındaki"       -0.06
        "\tmax"           -0.06
    Positive Logits
        "179"             0.07
        "178"             0.07
        "176"             0.07
        " delegated"      0.06
        "180"             0.06
        "bo"              0.06
        "182"             0.06
        " Jefferson"      0.06
        " adhere"         0.06
        "181"             0.06
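    The two lists above read as the ten lowest and ten highest entries of a per-token logit-effect vector. Assuming that is how they were selected (the page does not say), the ranking step can be sketched in Python as below. The function name top_logits and the toy dictionary, which copies a few token/value pairs from the lists, are hypothetical and used only for illustration.

```python
import heapq


def top_logits(effects: dict[str, float], k: int):
    """Return the k most negative and the k most positive (token, value) pairs.

    Hypothetical helper: ties keep their original insertion order, because
    heapq.nsmallest/nlargest behave like a stable sorted(...)[:k].
    """
    items = list(effects.items())
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


# Toy logit-effect vector: a subset of the token/value pairs listed above.
# Leading spaces are part of the token strings.
effects = {
    ".Typed": -0.08,
    "Win": -0.08,
    "Rx": -0.07,
    "179": 0.07,
    "178": 0.07,
    " delegated": 0.06,
    " Jefferson": 0.06,
    " adhere": 0.06,
}

negative, positive = top_logits(effects, 2)
print(negative)
print(positive)
```

    With a full vocabulary-sized vector and k = 10, the same call would yield lists shaped like the two above.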
    Activation density: 0.020%

    No Known Activations