    Negative Logits
    Token             Logit
    "ſelf"            -1.01
    " صوتيه"          -0.97
    "tvguidetime"     -0.94
    "ſelves"          -0.93
    " متعلقه"         -0.93
    " ſche"           -0.92
    " auffi"          -0.91
    " Efq"            -0.91
    " ſch"            -0.88
    " faſt"           -0.88
    Positive Logits
    Token             Logit
    "ized"            0.60
    " no"             0.56
    "ist"             0.54
                      0.47
    " is"             0.45
    " bad"            0.44
    "ists"            0.44
    " ur"             0.42
    ","               0.41
    " do"             0.41
    Activation Density: 0.076%

    No Known Activations