    Negative Logits
    Token        Logit
    "'"          0.83
    "و"          0.82
    "ون"         0.68
    (not shown)  0.63
    "quele"      0.61
    "ла"         0.59
    (not shown)  0.56
    " económ"    0.54
    "وون"        0.54
    "1"          0.54
    Positive Logits
    Token        Logit
    " "          1.02
    " a"         0.97
    " is"        0.91
    " t"         0.91
    " i"         0.83
    " was"       0.81
    " an"        0.79
    " of"        0.77
    " the"       0.72
    " s"         0.69
    Activation Density: 0.000%

    No Known Activations