    Negative Logits
    Token            Logit
    " Magnet"        -0.07
    "brane"          -0.07
    " regul"         -0.06
    "ultur"          -0.06
    "-instagram"     -0.06
    "اکم"            -0.06
    " tecn"          -0.06
    ".addr"          -0.06
    " insurance"     -0.06
    " cass"          -0.06
    Positive Logits
    Token            Logit
    " Boy"           0.29
    "Boy"            0.20
    "boy"            0.17
    "-boy"           0.10
    " Boys"          0.10
    "boys"           0.10
    " boy"           0.09
    " Playboy"       0.08
    " Dictionary"    0.07
    "AY"             0.07
    Activation Density: 0.005%

    No Known Activations