INDEX
    Explanations
    Negative Logits
    🏨
    -0.08
     trouve
    -0.07
    	Assert
    -0.07
    _fitness
    -0.07
     ENERGY
    -0.07
     полно
    -0.07
     Television
    -0.07
     poised
    -0.07
    -0.07
     encuentra
    -0.07
    Positive Logits
    herited
    0.07
    فات
    0.07
     Parm
    0.07
    İLİ
    0.07
    apis
    0.07
    شبه
    0.07
     DEF
    0.07
    .scal
    0.06
     male
    0.06
     object
    0.06
    Activation Density: 0.007%

    No Known Activations