NEGATIVE LOGITS
    'ي'       1.15
    'י'       1.02
    'z'       1.00
    'ம்'       0.96
    'נ'       0.93
    'ק'       0.89
    'ವಾಗಿ'    0.85
    'ς'       0.82
    ' as'     0.78
    'تم'      0.77
POSITIVE LOGITS
    ' you'    0.98
    ' '       0.98
    '-'       0.88
    'SE'      0.80
    '0'       0.79
    '/'       0.75
    '"'       0.75
    (token not shown)    0.75
    'ate'     0.74
    ' näytt'  0.74
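Lists like the two above usually rank vocabulary tokens by how strongly a feature's output direction raises (positive) or lowers (negative) their logits. The following is a minimal sketch, assuming the effect is computed as the product of the feature's decoder vector with the model's unembedding matrix. The names `decoder_dir`, `w_u`, `logit_effects`, and `top_tokens`, and all toy numbers, are hypothetical. This page does not confirm the exact method or the sign convention it uses when displaying negative values.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder vector onto every vocabulary column of W_U.

    decoder_dir has d_model entries; w_u has d_model rows and vocab_size columns.
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_tokens(
    effects: List[float], tokens: List[str], k: int, positive: bool = True
) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or smallest (positive=False) effect."""
    order = sorted(range(len(effects)), key=lambda j: effects[j], reverse=positive)
    return [(tokens[j], effects[j]) for j in order[:k]]


# Toy example: d_model = 2, vocabulary of three tokens (all values made up).
decoder_dir = [1.0, -0.5]
w_u = [
    [2.0, 0.0, -1.0],
    [0.0, 2.0, 1.0],
]
tokens = ["a", "b", "c"]
effects = logit_effects(decoder_dir, w_u)
print(top_tokens(effects, tokens, 1, positive=True))   # [('a', 2.0)]
print(top_tokens(effects, tokens, 2, positive=False))  # [('c', -1.5), ('b', -1.0)]
```

Sorting the full effect vector once and slicing from either end is the simplest approach. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.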
    Act Density 0.111%
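If activation density is read as the percentage of sampled token positions on which the feature fires, 0.111% corresponds to roughly 1.1 tokens per thousand. The following is a minimal sketch under that assumption, treating "fires" as an activation strictly greater than zero. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature activation is > 0 (assumed definition)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical: the feature fires on 1 token out of 1,000.
acts = [0.0] * 999 + [3.2]
print(activation_density(acts))  # 0.1
```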

    No Known Activations