INDEX
    Explanations
    Negative Logits
    د    1.89
    ب    1.61
    ב    1.56
    д    1.55
    ح    1.46
    ่า    1.43
    ق    1.39
    ف    1.35
    л    1.24
    (not shown)    1.20
    Positive Logits
    -    1.17
    ib    1.09
     αλλά    1.02
    _    0.98
    ill    0.96
    ри    0.95
     και    0.95
     dhe    0.95
     все    0.94
    enting    0.86
    Activation Density: 0.001%

    No Known Activations