    Negative Logits
        (token not rendered)  0.77
        "ل"    0.58
        "ти"   0.42
        "αν"   0.41
        "l"    0.40
        "ме"   0.40
        "ל"    0.39
        "л"    0.37
        "ות"   0.37
        "ين"   0.36
    Positive Logits
        " be"   0.52
        " to"   0.39
        " t"    0.38
        "{"     0.33
        " was"  0.33
        " ت"    0.33
        "را"    0.33
        " at"   0.31
        " are"  0.31
        " it"   0.30
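Feature dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores: the largest scores form the positive list and the most negative form the negative list (whose values appear here without a sign). The sketch below assumes that method; `decoder_dir`, `w_unembed`, `logit_effects`, and `top_logits` are hypothetical names, not this dashboard's API, and the small plain-list matrices stand in for real model weights.

```python
def logit_effects(decoder_dir, w_unembed):
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: decoder_dir is the feature's (hypothetical) W_dec row of
    length d_model, and w_unembed is a d_model x vocab matrix (hypothetical
    W_U) stored as a list of rows.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects, tokens, k=10):
    """Return the k most positive and k most negative (token, score) pairs."""
    pairs = list(zip(tokens, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy usage with made-up weights (d_model = 2, vocab of 4 tokens).
tokens = [" be", " to", "ل", "ти"]
effects = logit_effects([1.0, 0.5], [[0.4, 0.2, -0.5, -0.3],
                                     [0.2, 0.3, -0.4, -0.2]])
positive, negative = top_logits(effects, tokens, k=2)
print(positive)
print(negative)
```

Sorting the full vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids the full sort.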
    Activation Density 5.396%
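Activation density is usually read as the share of token positions on which the feature fires; at 5.396%, the feature would be active on roughly 54 of every 1,000 tokens. A minimal sketch, assuming a flat list of per-token activations, a firing threshold of zero, and a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly above the
    threshold (zero by default); the dashboard's exact rule may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: one firing position out of four.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```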

    No Known Activations