INDEX
    Negative Logits
    " suffix"      -0.07
    "αλ"           -0.06
    "pu"           -0.06
    "keepers"      -0.06
    "jur"          -0.06
    "-left"        -0.06
    " TEMPLATE"    -0.06
    " facile"      -0.06
    "/black"       -0.06
    " wives"       -0.06
    Positive Logits
    " extravag"    0.06
    " hurricanes"  0.06
    " abstract"    0.06
    "به"           0.06
    "_tbl"         0.06
    " началь"      0.06
    " dla"         0.06
    " général"     0.06
    " castle"      0.06
    "ABL"          0.06
    Activation Density: 0.070%

    No Known Activations