    Negative Logits
    Token             Logit
    " revelations"    -0.07
    " желуд"          -0.07
    " ноя"            -0.07
    "dfd"             -0.07
    "IFI"             -0.06
    "ollider"         -0.06
    "ोर"              -0.06
    "oyo"             -0.06
    "jon"             -0.06
    " освіти"         -0.06
    Positive Logits
    Token             Logit
    " capable"        0.12
    " able"           0.10
    " способ"         0.08
    " incapable"      0.08
    " composed"       0.07
    " Able"           0.07
                      0.07
    ",↵"              0.07
    " disposed"       0.07
    " susceptible"    0.06
    Act Density 0.014%

    No Known Activations