    Explanations

    female, character descriptions

    Negative Logits
    Token      Value
    "ה"        1.57
    "מ"        1.54
    "ن"        1.51
    "ない"     1.38
    "א"        1.36
    "м"        1.34
    "ları"     1.33
    "padă"     1.32
    "ي"        1.31
    "n"        1.30

    Positive Logits
    Token      Value
    "elling"   1.20
    " "        1.19
    "het"      1.05
    ")"        1.04
    " for"     0.96
    " male"    0.94
    "ess"      0.94
    "ized"     0.94
    "ash"      0.93
    "age"      0.91
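    The two token lists above can be read as a "logit lens" on the feature: the tokens whose output logits the feature pushes up most and pushes down most. A minimal sketch, assuming the lists come from projecting the feature's decoder direction through the model's unembedding matrix; the names `decoder_direction`, `W_U`, and `top_logit_tokens` are hypothetical, and the sketch returns signed values, whereas the negative list above shows its values without a sign.

```python
import numpy as np


def top_logit_tokens(decoder_direction, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumptions (hypothetical, for illustration only):
      decoder_direction: shape (d_model,), the feature's decoder vector.
      W_U: shape (d_model, vocab_size), the model's unembedding matrix.
      vocab: list of token strings indexed by token id.
    """
    # Project the feature direction onto every vocabulary token.
    logits = decoder_direction @ W_U  # shape (vocab_size,)
    order = np.argsort(logits)  # token ids sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, a four-token vocabulary.
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0, 0.0, 0.0, 2.0]])
direction = np.array([1.0, 0.2])
vocab = [" male", "ה", "ess", "age"]
pos, neg = top_logit_tokens(direction, W_U, vocab, k=2)
print(pos)
print(neg)
```

    In the toy example the projected logits are 1.0, -1.0, 0.5 and 0.4, so " male" and "ess" head the positive list while "ה" heads the negative one.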
    Activation Density 0.023%
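    Activation density is the share of tokens on which the feature fires, so 0.023% corresponds to roughly 23 activating tokens per 100,000. A minimal sketch of that arithmetic, assuming "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold are illustrative assumptions, not the dashboard's documented method.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    The zero threshold is an assumption for illustration.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# 23 activating tokens out of 100,000 gives a density of 0.023%.
acts = np.zeros(100_000)
acts[:23] = 1.0
density_percent = activation_density(acts) * 100
print(f"{density_percent:.3f}%")
```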

    No Known Activations