    Negative Logits
        Token             Logit
        "Is"              0.66
        "Okay"            0.66
        " وأ"             0.65
        "essoas"          0.65
        "It"              0.65
        "Mus"             0.64
        " والم"           0.64
        "Can"             0.64
        "A"               0.64
        "Or"              0.62
    Positive Logits
        Token             Logit
        " efficient"      0.85
        " detailed"       0.85
        " effective"      0.82
        " accurate"       0.82
        " motivation"     0.81
        " weighted"       0.80
        " comprehensive"  0.80
        " valid"          0.79
        " integrated"     0.79
        " rewarding"      0.78
    Activation Density: 0.005%

    No Known Activations