INDEX
    Negative Logits
    Token               Logit
    "(dist"             -0.08
    "CATEGORY"          -0.07
    "BOUND"             -0.07
    "_required"         -0.06
    " complications"    -0.06
    " porch"            -0.06
    " crossword"        -0.06
    " dalla"            -0.06
    " месте"            -0.06
    " constructive"     -0.06
    Positive Logits
    Token               Logit
    " naive"            0.16
    " naï"              0.13
    " paranoia"         0.07
    " Bere"             0.07
    " ignorance"        0.07
    " paranoid"         0.07
    " 모르"             0.07
                        0.06
    " Naomi"            0.06
    " taxpayers"        0.06
    Activation Density: 0.002%

    No Known Activations