    Explanations

    feminine nouns and suffixes

    Negative Logits
    Token      Value
    "I"        1.00
    (blank)    0.61
    "ı"        0.58
    "нков"     0.57
    "B"        0.57
    "ಲೆಂ"      0.56
    "я"        0.56
    (blank)    0.56
    "н"        0.55
    "İ"        0.54
    Positive Logits
    Token            Value
    " for"           0.55
    "lar"            0.53
    "dac"            0.53
    "larımız"        0.51
    "daki"           0.51
    " આપી"           0.50
    " anatomical"    0.49
    "るので"          0.49
    " fuer"          0.49
    "របស់"           0.48
    Activation Density: 0.001%

    No Known Activations