    NEGATIVE LOGITS
    "ید"                     0.90
    "بی"                     0.81
    "1"                      0.76
    "}\"                     0.73
    (token not rendered)     0.68
    " heart"                 0.68
    (token not rendered)     0.66
    " lukewarm"              0.66
    (token not rendered)     0.66
    " heartbreaking"         0.65
    POSITIVE LOGITS
    "k"                      1.08
    (token not rendered)     0.99
    "as"                     0.96
    (run of tab characters)  0.88
    "er"                     0.86
    (token not rendered)     0.82
    "supportsFocus"          0.76
    "ا"                      0.76
    "an"                     0.75
    (token not rendered)     0.75
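Logit lists like the two above are commonly derived from a feature's direct effect on the output vocabulary. The sketch below assumes that effect is the dot product of the feature's decoder vector with each token's unembedding vector, then ranks tokens in both directions. The function names `logit_effects` and `top_logits` and the toy two-dimensional vectors are hypothetical illustrations, not the dashboard's actual code or data.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Assumed definition: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example with made-up 2-d vectors (not real model weights).
decoder_vec = [1.0, -0.5]
unembed = {
    "k": [1.0, 0.0],
    "as": [0.5, 0.5],
    "heart": [-1.0, 0.0],
    "er": [0.0, 1.0],
}
positive, negative = top_logits(logit_effects(decoder_vec, unembed), k=2)
print(positive)  # [('k', 1.0), ('as', 0.25)]
print(negative)  # [('heart', -1.0), ('er', -0.5)]
```

Sorting the full dictionary twice keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.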
    Activation density: 0.000%

    No known activations.
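A density of 0.000% together with "no known activations" indicates that the feature did not fire on any sampled token. The following is a minimal sketch that assumes activation density is the percentage of sampled token positions whose feature activation is above zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`
    (assumed definition; hypothetical helper)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature that never fires reports 0.000%, as on this page.
print(f"{activation_density([0.0] * 1000):.3f}%")  # 0.000%
# Two of four positions active -> 50.000%.
print(f"{activation_density([0.0, 0.3, 0.0, 1.2]):.3f}%")  # 50.000%
```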