    Negative Logits
        Token            Logit
        " یک"            -0.08
        " Whit"          -0.07
        " golf"          -0.07
        " nh"            -0.07
        " Golf"          -0.07
        " Nir"           -0.07
        " licz"          -0.07
        " Tucson"        -0.06
        " titanium"      -0.06
        " основі"        -0.06
    Positive Logits
        Token                 Logit
        " desperate"           0.33
        " desperation"         0.22
        "perate"               0.17
        " desperately"         0.14
        " desper"              0.10
        " despair"             0.10
        "peration"             0.08
        " disparate"           0.07
        (token not rendered)   0.07
        "보고"                 0.07
    Activation density: 0.004%

    No known activations.