INDEX
    Negative Logits
    Token              Logit
    " فريبيس"          -0.63
    " anyone"          -0.54
    " saites"          -0.53
    " whatsoever"      -0.52
    " anybody"         -0.51
    "anyone"           -0.51
    " Anyone"          -0.48
    "<bos>"            -0.48
    " استنادى"         -0.47
    "Anyone"           -0.46
    Positive Logits
    Token              Logit
    " so"              0.64
    " refirió"         0.55
    "+#+#"             0.55
    "béco"             0.53
    "bilidad"          0.53
    " GenerationType"  0.51
    "upol"             0.51
    " very"            0.51
    "weeted"           0.49
    "qiao"             0.49
    Activation Density: 0.002%

    No Known Activations