    Explanations

    until followed by a noun or pronoun

    Negative Logits
    ı          1.25
    ená        1.23
               1.23
    dır        1.20
     וא        1.17
    larında    1.14
    ın         1.13
    kaan       1.13
    ıyla       1.13
     اوقات     1.13
    Positive Logits
    ছেন        1.19
               1.19
    و          1.16
    ון         0.98
    ),         0.94
    ح          0.91
    ס          0.91
    с          0.90
    Основ      0.89
    ির         0.87
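    The page does not say how these token lists were produced. A common approach in sparse-autoencoder dashboards is to project the feature's decoder vector through the model's unembedding matrix and report the tokens whose logits it lowers most (negative) and raises most (positive). The sketch below assumes that approach; the function name `top_logit_effects` and the array shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by one feature's direct effect on the output logits.

    Hypothetical setup (an assumption, not taken from the dashboard):
      w_dec_row: decoder direction of the feature, shape (d_model,)
      w_unembed: unembedding matrix, shape (d_model, vocab_size)
      vocab:     list mapping token id -> token string
    """
    # Direct logit effect of the feature direction on every vocabulary token.
    logits = np.asarray(w_dec_row) @ np.asarray(w_unembed)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    w_dec_row=[1.0, 0.0],
    w_unembed=[[3.0, -2.0, 1.0], [0.0, 0.0, 0.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -2.0)] [('a', 3.0)]
```

    Sorting the full logit vector once and slicing both ends keeps the two lists consistent with each other; for very large vocabularies, `np.argpartition` would avoid a full sort.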
    Activation Density: 0.179%
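    Activation density is usually read as the share of tokens on which the feature fires. A density of 0.179% is a fraction of 0.00179, or roughly one token in every 560. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0; the helper name `activation_density` and the default threshold are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per token,
    collected over some evaluation corpus.
    """
    acts = np.asarray(activations, dtype=float)
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size


# 2 of 8 tokens are active, so the density is 25%.
print(activation_density([0, 0.5, 0, 0, 2.0, 0, 0, 0]))  # 25.0
```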

    No Known Activations