INDEX
    Negative Logits
        " overlook"        -1.08
        "ValueStyle"       -0.81
        " overlooking"     -0.80
        " overlooks"       -0.79
        " متعلقه"          -0.72
        " descu"           -0.57
        "aspectj"          -0.57
        " neglect"         -0.56
        " disregard"       -0.56
        "WriteBarrier"     -0.53
    Positive Logits
        " the"                   0.82
        " something"             0.69
        " a"                     0.62
        "FTFY"                   0.61
        "<bos>"                  0.60
        "ướng"                   0.58
        "OLDS"                   0.57
        [token not rendered]     0.57
        ")$_"                    0.57
        "mbols"                  0.57
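The lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way such lists are produced, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the extremes. The sketch below does that with NumPy; the function name top_logit_effects, its parameters, and the array shapes are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Hypothetical shapes: w_dec_row is (d_model,), W_U is (d_model, vocab_size),
    and vocab is a list of vocab_size token strings.
    """
    logits = w_dec_row @ W_U          # (vocab_size,) effect on each token's logit
    order = np.argsort(logits)        # indices sorted from most negative upward
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 3-dimensional residual stream and a 3-token vocabulary.
W_U = np.eye(3)
neg, pos = top_logit_effects(np.array([1.0, -2.0, 0.5]), W_U, ["a", "b", "c"], k=2)
print(neg)  # [('b', -2.0), ('c', 0.5)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```

Real dashboards may additionally normalize the decoder row or exclude special tokens; those details are not stated here.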
    Activation Density: 0.035%

    No Known Activations
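Activation density is the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero (the threshold parameter and both function names are hypothetical). At 0.035%, the feature would be expected to fire on roughly 350 of every 1,000,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold` (assumed firing rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def expected_firings(density_percent, n_tokens):
    """Convert a density given in percent into an expected count of firing tokens."""
    return round(density_percent / 100 * n_tokens)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25
print(expected_firings(0.035, 1_000_000))        # 350
```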