INDEX
    Explanations
    Negative Logits
    ك    1.92
    니다    1.55
     Örneğin    1.44
     Информация    1.40
    le    1.36
    daki    1.34
    Следу    1.34
    Օ    1.33
    كار    1.30
     Según    1.30
    Positive Logits
    ان    1.34
     giày    1.31
    ic    1.27
     catech    1.25
     cupcakes    1.23
     groceries    1.22
     transgress    1.22
    𝒈    1.21
    1.18
     брен    1.16
    Act Density 0.084%

    No Known Activations