    Negative Logits
    Token            Logit
    " pleaſure"      -0.78
    " contienen"     -0.68
    " contained"     -0.67
    "berdayakan"     -0.67
    " habet"         -0.66
    " itſelf"        -0.65
    " contain"       -0.65
    "contain"        -0.64
    " Efq"           -0.62
    " SafeMath"      -0.62
    Positive Logits
    Token                  Logit
    (token not rendered)   0.84
    " tartalomajánló"      0.57
    " about"               0.56
    "about"                0.54
    "windowFixed"          0.53
    "<bos>"                0.52
    " the"                 0.52
    " respeito"            0.51
    (token not rendered)   0.50
    " over"                0.50
    Act Density 0.080%

    No Known Activations