    Negative Logits
     willingly      -0.07
     dose           -0.07
     complication   -0.06
    gregar          -0.06
    textField       -0.06
    hetto           -0.06
    "},↵            -0.06
     sidl           -0.06
     Ты             -0.06
    ți              -0.06
    Positive Logits
    ibur            0.07
     francouz       0.07
    [item           0.07
     Louisville     0.06
    orget           0.06
     fase           0.06
    (REG            0.06
    _feat           0.06
    546             0.06
     King           0.06
    Activation Density: 0.000%

    No known activations.