    Negative Logits
        (not shown)  0.68
        (not shown)  0.66
        " később"    0.65   (Hungarian, "later")
        " později"   0.63   (Czech, "later")
        "o"          0.60
        "ad"         0.59
        (not shown)  0.59
        "sin"        0.58
        "the"        0.57
        "ography"    0.57
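    Positive Logits
        " can"       0.91
        "ח"          0.89
        "ع"          0.81
        (not shown)  0.79
        "ال"         0.78   (Arabic definite article, "the")
        "א"          0.78
        " is"        0.77
        "માં"        0.77   (Gujarati postposition, "in")
        "ح"          0.77
        (not shown)  0.76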
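The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The page does not say how these scores were computed. A common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding column and keep the top and bottom k tokens. The minimal Python sketch below illustrates that assumption on toy data; `logit_effects`, `top_logits`, and the dictionary-of-columns unembedding are hypothetical stand-ins, not this dashboard's code.

```python
# Hypothetical sketch: assumes a feature's logit effect on token t is
# dot(decoder_direction, unembedding_column[t]). Toy data only.
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return (k most positive, k most negative) token effects, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]            # most negative effects come first
    positive = ranked[::-1][:k]      # reversed, so the most positive come first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a four-token vocabulary (hypothetical values).
    decoder_dir = [0.5, -1.0]
    unembed = {
        " can": [1.0, -0.5],
        " is": [0.4, 0.1],
        "the": [-0.2, 0.3],
        "ad": [0.0, 0.6],
    }
    pos, neg = top_logits(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", [tok for tok, _ in pos])  # [' can', ' is']
    print("negative:", [tok for tok, _ in neg])  # ['ad', 'the']
```

Ranking by the signed effect means the negative list starts at the most negative token. For a real model the same ranking would normally come from one matrix-vector product over the full vocabulary rather than a Python loop over tokens.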
    Activation Density: 0.020%

    No Known Activations
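Activation density is commonly defined as the fraction of sampled tokens on which a feature's activation is above zero. The page does not state how its 0.020% figure was obtained, so the Python sketch below assumes that definition and a zero threshold; `activation_density` is a hypothetical helper, and the toy sample (2 firing tokens out of 10,000) is chosen only to show how such a figure arises.

```python
# Hypothetical sketch: assumes density = share of sampled tokens whose
# activation exceeds a threshold (0.0 here). Toy data only.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of activations strictly above `threshold`; 0.0 for an empty sample."""
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


if __name__ == "__main__":
    # 10,000 sampled tokens, of which the feature fires on two (hypothetical values).
    acts = [0.0] * 10_000
    acts[17] = 1.3
    acts[4242] = 0.4
    print(f"{activation_density(acts):.3%}")  # 0.020%
```

A strictly-greater comparison is used so that exact zeros, which a ReLU-style feature produces whenever it is off, never count as firing.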