INDEX
    Explanations
    Negative Logits
    Token            Logit
    " decrease"      -1.43
    "decrease"       -1.30
    " Decrease"      -1.13
    " decreased"     -1.06
    "Decrease"       -1.02
    " dependence"    -1.02
    " decreases"     -1.00
    "decre"          -0.92
    " Dependence"    -0.92
    " Decreased"     -0.90
    Positive Logits
    Token            Logit
    "ly"             0.90
    "able"           0.79
    "ably"           0.76
    "ability"        0.73
    " للمعارف"       0.63
    ":+:"            0.63
    "ities"          0.60
    "tably"          0.59
    "aires"          0.59
    "LY"             0.58
    Activation Density: 0.444%

    No Known Activations