INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
     فريبيس           -0.63
    tagPool           -0.52
    arschijnlijk      -0.51
    ftagPool          -0.48
     تضيفلها          -0.47
     inconn           -0.46
    ніципалі          -0.46
    wahati            -0.46
     ſhould           -0.45
    StructEnd         -0.45
    POSITIVE LOGITS
    Token             Logit
    aya               3.09
    AYA               2.55
    ayas              1.67
    haya              1.38
    ayah              1.37
    ayan              1.34
    Aya               1.30
    aye               1.23
    raya              1.23
    ayam              1.20
    Activation Density: 0.004%

    No Known Activations