    Negative Logits
    " facile"     -0.07
    "OLUM"        -0.07
    " COURT"      -0.06
    " Buckley"    -0.06
    " coy"        -0.06
    "_nick"       -0.06
    " vul"        -0.06
    "!';↵"        -0.06
    "と言"        -0.06
                  -0.06
    Positive Logits
    ".getParent"  0.08
    " wardrobe"   0.07
    "barang"      0.07
    "harga"       0.07
    " Ion"        0.06
    " trọng"      0.06
    "重伤"        0.06
    "ipline"      0.06
    "guna"        0.06
    " Sơn"        0.06
    Activation Density: 0.013%

    No Known Activations