    Negative Logits
    "y"            0.70
    "on"           0.68
    "ográfico"     0.67
    " produz"      0.63
    " χρήση"       0.62
    " realizando"  0.59
    "er"           0.59
    "un"           0.59
    "ستخدم"        0.57
    "ó"            0.57
    Positive Logits
    " in"          0.72
    " be"          0.65
    "?"            0.61
    " a"           0.60
    " our"         0.59
    " sores"       0.59
    "۰"            0.59
    "{"            0.58
    "}{"           0.57
    "צ"            0.57
    Activation Density: 0.138%

    No Known Activations