INDEX
    Explanations
    Negative Logits
    "ంగా"    1.16
    "s"    1.09
    "ف"    1.05
    "il"    1.03
    "ों"    0.97
    "ו"    0.96
    "т"    0.96
    "ap"    0.93
    (token not rendered)    0.91
    "ب"    0.91
    Positive Logits
    "</h2>"    1.13
    (token not rendered)    1.07
    " love"    1.01
    ":"    0.98
    " LOVE"    0.89
    " by"    0.88
    "</h4>"    0.81
    " to"    0.80
    "ää"    0.80
    (token not rendered)    0.80
    Activation Density: 0.042%

    No Known Activations